This is an automated email from the ASF dual-hosted git repository. xiazcy pushed a commit to branch tp4-future-update in repository https://gitbox.apache.org/repos/asf/tinkerpop.git
commit ffcc76d790fc97fe4a05a897c1048c386ef7e318
Author: Yang Xia <[email protected]>
AuthorDate: Wed May 8 12:16:44 2024 -0700

    Updating the future section for TP4 and adding a proposal page for all features
---
 docs/src/dev/future/index.asciidoc | 57 +++--
 .../future/proposal-5-tinkerpop4-features.asciidoc | 242 +++++++++++++++++++++
 2 files changed, 280 insertions(+), 19 deletions(-)

diff --git a/docs/src/dev/future/index.asciidoc b/docs/src/dev/future/index.asciidoc
index ce030878f4..647afa6afe 100644
--- a/docs/src/dev/future/index.asciidoc
+++ b/docs/src/dev/future/index.asciidoc
@@ -47,23 +47,26 @@ in each release line represent unreleased changes only. Once an official release
 are removed with new items taking their place as they are planned. The release line is removed from the roadmap completely when it is no longer maintained.
-== 3.7.x - Target 2023H1
-
-The development of the 3.7.x release line is currently under way with a target release date for the initial release of
-the line of 23H1.
+== TinkerPop 4.x - Target 2024H2
+The development of the 4.x release line is currently under way with a target release date for the initial release
+of the line of 24H2. Items listed below are breaking feature updates essential to TP4; additional features under discussion are listed in the Appendix.
+For additional details on each item listed, see the features doc link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-4-tinkerpop4-features[TinkerPop 4 Features].
+
+* *HTTP Support* - Server & GLVs
+** Replacement of WebSocket with HTTP/1.1 (link:https://lists.apache.org/thread/vfs1j9ycb8voxwc00gdzfmlg2gghx3n1[DISCUSS thread])
+* Make `gremlin-lang` script processing default
+* Bytecode Removal (link:https://lists.apache.org/thread/7m3govzsqtmmj224xs7k5vv1ycnmocjn[DISCUSS thread])
+* *Transactions* - Redesign the transaction model so that it is better suited for all graphs.
+** Ensure that TinkerPop has a native implementation for transactions in TinkerGraph so that all tests can run from it.
+** Ensure that there is no difference between remote and embedded transaction usage and that the API is less tangled
+than it is today.
+Features originally planned for 3.7.x.
 * Add support for traversals as parameters for `V()`, `is()`, and `has()` (includes `Traversal` arguments to `P`)
-* Geospatial support for TinkerPop (link:++https://lists.apache.org/[email protected]:2021-7:DISCUSS%20geo-spatial++[DISCUSS Thread])
-* Add mid-traversal `E()` support (link:https://issues.apache.org/jira/browse/TINKERPOP-2798[TINKERPOP-2798])
-* Allow properties on elements (as opposed to just references) for remote traversals
 * Add subgraph/tree structure in all GLVs
-* List functions (`concat()`/etc.)
 * Define semantics for query federation across Gremlin servers (depends on `call()` step)
 * Gremlin debug support
-* Date/Time manipulation functions (`dateAdd()`, `dateDiff()`, etc.)
-* Add string manipulation functions (`split()`, `substring()` etc.) (link:https://issues.apache.org/jira/browse/TINKERPOP-2672[TINKERPOP-2672])
 * Case-insensitive search (link:https://issues.apache.org/jira/browse/TINKERPOP-2673[TINKERPOP-2673])
-* Type conversion with `cast()` step
 * Mutation steps for `clone()` of an `Element` and for `moveE()` for edges.
 * Add a language element to merge `Map` objects more easily.
@@ -99,25 +102,41 @@ story.
 |link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-arrow-flight-2[Proposal 2] |Gremlin Arrow Flight.
|Future |N
 |link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-3-remove-closures[Proposal 3] |Removing the Need for Closures/Lambda in Gremlin |3.7.0 |Y
 |link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-transaction-4[Proposal 4] |TinkerGraph Transaction Support |3.7.0 |Y
+|link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-4-tinkerpop4-features[Proposal 5] |TinkerPop 4 Features |4.0.0 |N
 |=========================================================
 = Appendix
-== TinkerPop4
+== TinkerPop 4.x
+This section includes proposed features for TP4, which may or may not be planned into the release.
+For additional details on each item listed, see the features doc link:https://github.com/apache/tinkerpop/blob/master/docs/src/dev/future/proposal-4-tinkerpop4-features[TinkerPop 4 Features].
-This space is currently a bit of a scratchpad for ideas and changes that might not fit well into TinkerPop3 and
-therefore might be best left to TinkerPop4.
-
-* *Transactions* - Redesign the transaction model so that it is better suited for all graphs.
-** Ensure that TinkerPop has a native implementation for transactions in TinkerGraph so that all tests can run from it.
-** Ensure that there is no difference between remote and embedded transaction usage and that the API is less tangled
-than it is today.
 * *Groovy* - Reconsider all dependencies on Groovy throughout TinkerPop
 ** Remove Groovy support from Gremlin Server which should be possible now that `gremlin-language` and `call()` are available.
 ** Investigate options for using JShell as a replacement for `groovysh` in Gremlin Console.
 ** Investigate options for removing `ScriptEngine` support in general, which would include support from `gremlin-language`.
+* Type System (link:https://lists.apache.org/thread/rpdq3ywk6vqpyv512to36ot8yqvjo3dv[DISCUSS thread])
+* Schema Support
+* Multi-label, no label, mutable label support
+* Multi/meta properties on edges
+* Pluggable System for explain/profile()
+* `has()` accepting a traversal
+* Improve `local()` step
+* Type conversion with `cast()` step
+* New Gremlin language elements for geospatial (link:https://lists.apache.org/thread/mxg3kopgj9h9v8j299qjhdhopzpdkfow[DISCUSS Thread]), vector, and pattern matching
+* Rework `match()` step
+* Query status/query cancellation
+* Unify algorithm steps
+* Modernize IO for OLAP
+* Proxy implementation
+* Remove `neo4j-gremlin` (link:https://lists.apache.org/thread/lxn4s9fs8rzggm0jlnffnphfpqnpn3h8[DISCUSS thread])
+* Deprecate `sparql-gremlin`
+* `io()` step improvements
+* Documentation re-organization
+* Improved telemetry in driver/server
+* Matrix testing in driver/server
 === 4.x Branching Methodology
diff --git a/docs/src/dev/future/proposal-5-tinkerpop4-features.asciidoc b/docs/src/dev/future/proposal-5-tinkerpop4-features.asciidoc
new file mode 100644
index 0000000000..8b195d4574
--- /dev/null
+++ b/docs/src/dev/future/proposal-5-tinkerpop4-features.asciidoc
@@ -0,0 +1,242 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+image::apache-tinkerpop-logo.png[width=500,link="https://tinkerpop.apache.org"]
+
+*x.y.z - Proposal 5*
+
+== TinkerPop 4.x Features
+
+=== Status
+TinkerPop 4 implementation has begun with the switch from WebSocket to HTTP/1.1 for the underlying transport of Gremlin Server.
+This is one of the many features that have been proposed in the community for TP 4.x. In this doc we aim to provide a high-level
+overview of all features and ideas that were proposed for inclusion in TinkerPop 4.x. Some features are breaking, as with the
+server work; others are not.
+
+One reminder is that starting with TinkerPop 4, we will be moving to Semantic Versioning, as outlined in the link:https://lists.apache.org/thread/g85tbsocmpv5oksq0xs425cgrw8xkdnn[DISCUSS thread].
+
+=== Motivation
+TinkerPop 4 is a chance to introduce major breaking changes that can provide revolutionary improvements for providers and users.
+
+=== Overview
+This is a non-exhaustive list of all features that are in discussion, or have been proposed at some point for TP4.
+In the sections below, each item is placed into <<breaking-must-have>>, <<breaking-good-to-have>>, or <<non-breaking>>,
+with details explaining its implications.
+
+[width="100%",cols="10,^1,^1,^1,^3",options="header"]
+|=========================================================
+|Feature |Breaking |Size |Must-Have |Status
+| <<http-support>> |Y |XL |Y |In Progress
+| <<bytecode-removal>> |Y |L |Y |In Discussion
+| <<gremlin-lang-default>> |Y |L |Y |
+| <<tx-redesign>>|Y |XL|Y |In Discussion
+| <<groovy-removal>>|Y|XL||
+| <<type-system>>|Y|XL||
+| <<schema-support>>|Y|L||
+| <<multi-label>>|Y|L||
+| <<meta-props-on-edge>>|Y|M||
+| <<pluggable-explain>>|Y|M||
+| <<has-traversal>>|Y|M||
+| <<query-cancel>>|Y|M||
+| <<local-step-improve>>|Y|M||
+| <<type-casts>>|Y|L||
+| <<geo-vector-patterns>>|Y|XL||
+| <<match-step-improve>>|Y|L||
+| <<algorithm-steps>>|Y|M||
+| <<io-olap>>|Y|L||
+| <<neo4-removal>>|Y|S||
+| <<sparql-deprecate>>|N|S||
+| <<proxy>>|N|M||
+| <<io-step-improve>>|N|M||
+| <<docs-reorg>>|N|XL||
+| <<matrix-test>>|N|L||
+| <<telemerty>>|N|L||
+|=========================================================
+
+=== Breaking Changes - Must-Haves [[breaking-must-have]]
+This section lists all breaking changes that are essential to the TP4 release and cannot be implemented in minor version
+upgrades. Any changes not included in TP4 will need to wait for a later major version.
+
+==== HTTP support - Server & GLVs [[http-support]]
+Currently under development in the `master-http` branch. This body of work aims to replace the WebSocket protocol in Gremlin Server
+with HTTP/1.1 (link:https://lists.apache.org/thread/vfs1j9ycb8voxwc00gdzfmlg2gghx3n1[DISCUSS thread]).
+For API design, see link:https://issues.apache.org/jira/browse/TINKERPOP-3065[TINKERPOP-3065
+Implement a new HTTP API].
+
+Along with this work, connection settings should be minimized, server initiation should be simplified, and sessions will be removed.
+
+A bigger portion of this work includes switching out the WebSocket libraries in each GLV for HTTP.
In this rework, connection
+options should be simplified with HTTP compared to WebSocket, and should be unified across all GLVs to the best of each
+language's library availability. This will also include implementing an interface for a pluggable request interceptor for authentication,
+as raised in the link:https://lists.apache.org/thread/cpsdd7gjmr1yb6c5kkm6v2bcfpp6fqq5[DISCUSS thread].
+
+==== Bytecode removal [[bytecode-removal]]
+One of the purposes that bytecode served was to provide a universal way to translate a Traversal. However, with the introduction of
+the `gremlin-lang` parser this need can be fulfilled differently. Any Gremlin script can be converted into a Traversal in a uniform way, which reduces the
+need for bytecode. We are now left with two systems that serve a similar purpose, so it is probably time to remove one of them during a major
+version upgrade (see the link:https://lists.apache.org/thread/7m3govzsqtmmj224xs7k5vv1ycnmocjn[DISCUSS thread]).
+
+Before the full removal can be implemented, a few updates will be needed in `gremlin-lang` to ensure appropriate types are covered.
+Each GLV will also have to be updated to switch from bytecode-based to string-based traversal construction. A proposed plan includes:
+
+1. Extract an interface from Bytecode, and implement string-based traversals and request options
+2. Add support for missing types, such as UUID, Set, Edge, ByteBuffer, etc. in `gremlin-lang` (link:https://issues.apache.org/jira/browse/TINKERPOP-3023[TINKERPOP-3023])
+3. Add missing types to GLVs and rework traversal generation
+4.
Ensure Feature tests work properly
+
+*Type System update needed*
+
+One important note for this proposed plan is that currently `gremlin-lang` does not cover all types supported via Bytecode,
+which means either _all missing types need to be fully defined and implemented in the `gremlin-lang` parser for parity
+(related to the *Type System* section below)_, or _consensus has to be reached in the community on whether reduced type support
+is acceptable, and if so, which types can be omitted at this point._
+
+==== Make `gremlin-lang` script processing default [[gremlin-lang-default]]
+Switching the default script processing from `GremlinGroovyScriptEngine` to `gremlin-lang` is a step towards removing
+the dependency on Groovy in the Gremlin Server.
+
+*Gremlin Console rework*
+
+As a result of the removal of sessions and the switch to `gremlin-lang`, the Gremlin Console remote mode will be affected, and users
+may notice a difference in the interaction. Specific changes will need to be determined and communicated to the community.
+
+==== Transaction redesign [[tx-redesign]]
+As transactions will have to be implemented over HTTP, this is an opportunity to improve the usability of the transaction APIs.
+Such an API redesign will be a breaking change that needs to be introduced in the initial release of TP4, which can include
+stub implementations only, with full implementation added iteratively in minor releases.
+
+This is another large body of work that will be breaking for users and any providers relying on Groovy.
+
+=== Breaking Changes - Good-to-Haves [[breaking-good-to-have]]
+This section lists all breaking changes that would be beneficial to include in TP4, but can be implemented in future major versions.
+
+==== Groovy removal in Gremlin Server [[groovy-removal]]
+Removing Groovy from Gremlin Server is a major body of work that implies:
+
+1. revise the config system to avoid an init script
+2. deprecate `GremlinGroovyScriptEngine` in favor of `gremlin-language` for script processing
+3.
remove/replace all the Groovy-based plugin infrastructure from the server
+
+One main consequence of Groovy allowing arbitrary code to be executed on the server is exposure to security vulnerabilities.
+However, the removal of this system itself has far-reaching effects in the community that should be discussed.
+
+==== Type System [[type-system]]
+TinkerPop has not defined its own type system and has been relying on JVM types, which becomes a problem especially in
+GLVs that don't have corresponding types defined in their language (link:https://lists.apache.org/thread/rpdq3ywk6vqpyv512to36ot8yqvjo3dv[DISCUSS thread]).
+
+==== Schema support [[schema-support]]
+Schema support relies on a well-defined type system.
+
+==== Multi-label, no label, mutable label support [[multi-label]]
+TinkerPop only supports single, immutable labels for its Elements. Various providers have implemented their own mechanisms
+for multi-label, no label, and/or mutable label support. Neo4j also allows multiple labels in its graphs. It is time to consider
+bringing these functionalities into parity.
+
+==== Multi/meta properties on edges [[meta-props-on-edge]]
+Currently, meta-properties exist only on vertices; this item extends them to edges.
+
+==== Pluggable System for explain/profile() [[pluggable-explain]]
+While TinkerPop provides explain() and profile() steps, switching to a pluggable architecture would increase flexibility for
+providers who wish to customize the amount and format of information they return.
+
+An extension of this is for explain() to work remotely; see link:https://issues.apache.org/jira/browse/TINKERPOP-2128[TINKERPOP-2128]
+
+==== Improve `local()` step [[local-step-improve]]
+The concept and application of the `local()` step have been somewhat confusing to users, and the addition of the string and list
+manipulation steps in 3.7 further blurred some definitions of local execution in a traversal.
It is a good time to start considering
+a redesign or improved design of the `local()` step.
+
+==== Type conversion with `cast()` step [[type-casts]]
+We introduced `asString()` and `asDate()` in 3.7; this item would introduce additional casting steps like `toInt()`, which
+should rely on a well-defined type system.
+
+==== New Gremlin language elements for geospatial, vector, and pattern matching [[geo-vector-patterns]]
+Similar to how string and list manipulation steps were introduced, there is room for creating first-class steps for vector computation
+and geospatial operations (link:https://lists.apache.org/thread/mxg3kopgj9h9v8j299qjhdhopzpdkfow[DISCUSS Thread]). Pattern matching is another area that is long overdue for revision, which ties into the current
+implementation of the `match()` step.
+
+==== Rework `match()` step [[match-step-improve]]
+The `match()` step has been an attempt to introduce a declarative form of querying in TinkerPop based on pattern matching.
+Various issues exist with the step, and a rework is due.
+
+Unresolved issues related to current `match()`:
+
+* link:https://issues.apache.org/jira/browse/TINKERPOP-2961[TINKERPOP-2961 Missing exceptions for unsolvable match pattern]
+* link:https://issues.apache.org/jira/browse/TINKERPOP-2528[TINKERPOP-2528 Improve match() step to generate traversals that uses indexes]
+* link:https://issues.apache.org/jira/browse/TINKERPOP-2503[TINKERPOP-2503 Implement look-ahead on PathRetractionStrategy]
+* link:https://issues.apache.org/jira/browse/TINKERPOP-2340[TINKERPOP-2340 MatchStep with VertexStep Exceptions]
+* link:https://issues.apache.org/jira/browse/TINKERPOP-940[TINKERPOP-940 Convert LocalTraversals to MatchSteps in OLAP]
+* link:https://issues.apache.org/jira/browse/TINKERPOP-736[TINKERPOP-736 Automatic Traversal rewriting]
+
+==== `has()` accepting Traversal [[has-traversal]]
+This is a body of work that was on the roadmap for 3.7.x: adding support for traversals as parameters to `has()`,
+which should expand the usability of the Gremlin language.
+
+==== Query status/query cancellation [[query-cancel]]
+These are useful features for debugging and improved resource management that have been implemented by providers, and now would be
+a good time to bring parity into TinkerPop.
+
+* link:https://issues.apache.org/jira/browse/TINKERPOP-2210[TINKERPOP-2210 Support cancellation of remote traversals]
+
+==== Unify algorithm steps [[algorithm-steps]]
+Move the algorithm steps into the `call()` step or generify them in some way.
+
+==== Modernize IO for OLAP [[io-olap]]
+As the name suggests, we should remove old file serialization formats and introduce a more modern format for IO. One possible
+candidate is link:https://github.com/apache/incubator-graphar[GraphAR], which is a standard data file format for graph data
+storage and retrieval, currently an incubating Apache project.
+
+A potentially large extension of this work, which may not be included in this version, is revisiting OLAP in general to resolve
+link:https://issues.apache.org/jira/browse/TINKERPOP-1298?jql=project%20%3D%20TINKERPOP%20AND%20status%20%3D%20Open%20AND%20text%20~%20%22OLAP%22[open JIRA issues].
+
+==== Remove `neo4j-gremlin` [[neo4-removal]]
+As discussed in the link:https://lists.apache.org/thread/lxn4s9fs8rzggm0jlnffnphfpqnpn3h8[DISCUSS thread], `neo4j-gremlin` was deprecated in 3.7
+with the introduction of native transactions in TinkerGraph. TP4 would be the place to remove the module.
+
+=== Non-Breaking Changes [[non-breaking]]
+This section lists all changes that should not be breaking in terms of functionality and APIs, and can be implemented in minor version releases.
+
+==== Documentation reorganization [[docs-reorg]]
+In addition to the necessary documentation updates needed for new TP4 feature implementations, this entails a more substantial rework
+of the documentation structure.
+
+The current documentation is very thorough in certain areas, but lacking in many others. The accumulation of features and functionality
+over the past years likely means that certain information is outdated and/or should be reworded for clarity. While we have a generous
+amount of reference material, implementation guidelines for contributors and providers tend to be lacking. TP4 is an opportunity to rework
+the documentation to be more thorough, concise, clear, and easy to update when new features are implemented.
+
+Another implication of this is to revisit the current documentation generation process. We have a very complex scripting structure that we use to
+orchestrate the generation of documentation, combined with Maven plugins for language-specific docs. This process may be affected by
+any major alterations to documentation structure, which would need some effort to revise.
+
+==== Deprecate `sparql-gremlin` [[sparql-deprecate]]
+This module of TinkerPop has been largely unmaintained and likely unused for many years. Unless we receive fresh interest and contribution,
+now would be the time to deprecate it and remove it in a future version.
+
+==== Proxy implementation [[proxy]]
+Implementing a proxy for Gremlin Server might be a viable alternative to implementing clustering in the client, for
+orchestrating multiple Gremlin Server instances, and/or rerouting WebSocket/HTTP requests for compatibility.
+
+==== `io()` step improvements [[io-step-improve]]
+Simplify `io()` for data ingestion and export in both embedded and remote usage in some way, and add support for CSV format.
+
+==== Matrix testing [[matrix-test]]
+This aims to create an automated testing setup, which helps to ensure compatibility between drivers and server across minor releases,
+and to make sure API contracts are not broken unintentionally.
+
+==== Improved telemetry in driver/server [[telemerty]]
+This is a less well-defined area, aimed at improved metrics collection that can better aid debugging for users and providers.
+Work may include adding the ability to debug queries and traversals, adding OpenTelemetry support, etc.
\ No newline at end of file
