Hi Cole.

Thank you for sharing. We have reached agreement on all topics except the use of the RETURN statement.

We brainstormed within the team and came up with an interesting idea for handling the output of the match statement; thanks to Lev Sivashov, who suggested it. The idea is combined with another proposal of mine: treat an Optional.empty() returned by a Traverser as a trigger for the Traversal to execute the next step, while steps that do not process input values, such as addV(), treat it as no value. This will fix queries such as `g.addV(__.inject('x'))` and similar ones in Gremlin that accept a Traversal and need a fake Traverser with a value to work as expected.

So we propose not to support RETURN at all, as we already have a means to handle projections in Gremlin. Instead:

1. The match() step returns Optional.empty() as its result.

2. We specify which MATCH variables we need to fetch using the select() step.

So the query

g.V(1).property('friendWeight', match("MATCH (n{name:'Cole'})-[e:knows]->() RETURN sum(e.weight)"))

will look like

g.V(1).property('friendWeight', match("MATCH (n{name:'Cole'})-[e:knows]->()").select("e").values("weight").sum())

This approach is easy to optimize for execution: by analyzing the select() steps, we can give the GQL executor only the names of the variables that are really needed. It also looks elegant, prevents informational clutter, and offers a minimal and efficient pattern-matching method for Gremlin.

WDYT?

If you agree, I will wait a week to gather feedback from other participants. If no additions are provided, I will publish a summary here with a link to our design document for general information, and I will start implementing it at our own pace.

On Wed, Sep 3, 2025 at 5:27 AM Cole Greer <colegr...@apache.org> wrote:
>
> Hi Andrii,
>
> I've taken more time to think through your proposal.
>
> > I think we can transform the idea of introduction of new step, to the idea
> > of usage of `with` step and provide the following modulation rule for the
> > new `match` step: if name of the key in with step is passed in with "$"
> > prefix, this prefix is removed and the rest of the key is used as query
> > parameter. It is quite a common way of naming the parameters. As for
> > binding of parameters for server queries, if query parameters are not
> > provided explicitly, then we will perform an implicit lookup over the
> > bindings of those parameters.
>
> I like this. It gives good flexibility for localized "match parameters",
> while retaining some connection to the existing parameter bindings in the
> server.
>
> > There is a discrepancy between the naming of parameters between GQL and
> > Gremlin, but that is, IMHO, acceptable.
> > As one more alternative, probably even more appealing, we can wrap
> > parameters in "{}", as Kotlin does :-)
> > That will resemble GQL style and will not create a visual mess.
> >
> > So it will look like:
> > ` g.match("MATCH (src:Airport {code:srcCode}), (dest:Airport {code:destCode}) RETURN src")
> >     .addE("Route").to("dest")
> >     .property(T.id, format("%{_}-%{_}").by(constant("{srcCode}")).by(constant("{destCode}")))`
>
> We don't currently support any parameter replacement within a string
> literal; currently parameters can only be used to swap out the string
> literal in its entirety. It may be complicated to implement as that
> parameter resolution would need to be added to all steps which accept
> string arguments. It may be best to spin this into its own discussion if
> there is interest in pursuing this.
>
> > > I still haven't quite aligned myself regarding single non-element
> > > returns. I'll reply back on this topic soon.
> >
> > I'm curious to see what you think.
>
> I've worked through some examples here and my preference is not to wrap
> single returns in maps. I understand the desire to limit the possible return
> types from the match step to just Elements and Maps, but in my opinion this
> is outweighed by the convenience of directly using the results. For instance
> with map wrapping:
> g.match("MATCH (n{name:'Cole'}) RETURN n.birthday").select("n.birthday").dateDiff(datetime("2000-01-01"))
> compared to without maps:
> g.match("MATCH (n{name:'Cole'}) RETURN n.birthday").dateDiff(datetime("2000-01-01"))
>
> The map wrapping and associated select feels unnecessary to me and gets in
> the way. I feel similarly about the following examples:
>
> g.match("MATCH (n:person) RETURN n.age").select("n.age").order().limit(5) vs.
> g.match("MATCH (n:person) RETURN n.age").order().limit(5)
>
> g.V(1).property('friendWeight', match("MATCH (n{name:'Cole'})-[e:knows]->() RETURN sum(e.weight)").select("sum(e.weight)")) vs.
> g.V(1).property('friendWeight', match("MATCH (n{name:'Cole'})-[e:knows]->() RETURN sum(e.weight)"))
>
> I couldn't come up with examples where I wanted to retain the results in
> their maps so the select() always feels like an unnecessary chore to me.
> Without these maps, the possible return types of match() would grow to
> include any property type supported by the graph, as well as the return types
> of any functions included in the declarative language. This is more complex
> but not without precedent considering steps such as inject() and constant().
>
> Of course for any match query which returns multiple results, a map of all of
> them should be returned:
> g.match("MATCH (p:person)-[e:created]->(s:software) RETURN *")
> -> {"p": V[1], "e": E[9], "s": V[3]}
>
> In my mind this is mostly a matter of a small convenience. If you feel
> strongly that wrapping any non-element results into maps is preferable, I can
> accept that as well.
>
> Thanks,
> Cole
>
>
> On 2025/08/27 15:20:31 Andrii Lomakin wrote:
> > Good day.
> > > > >I suppose I'm approaching this one more from the perspective that I don't > > see why these parameters need to be isolated to just the match subquery. > > > > Thank you, Cole, for your feedback. > > While you paused further analysis, I investigated code a bit, and I think > > we can transform the idea of introduction of new step, to the idea of usage > > of `with` step and provide the following modulation rule for the new > > `match` step: if name of the key in with step is passed in with "$" prefix, > > this prefix is removed an the rest of the key is used as query parameter. > > It is quite a common way of naming the parameters. As for binding of > > parameters for server queries, if query parameters are not provided > > explicitly, then we will perform an implicit lookup over the bindings of > > those parameters. > > "Global" parameters can be applied in `with` Step in GraphTraversalSource > > using the same approach. > > > > In such case, your query example would look like: > > > > ` g.match("MATCH (src:Airport {code:srcCode}), (dest:Airport > > {code:destCode}) RETURN src") > > .addE("Route").to("dest") > > .property(T.id, > > format("%{_}-%{_}").by(constant("$srcCode")).by(constant("$destCode")))` > > > > There is a discrepancy between the naming of parameters between GQL and > > Gremlin, but that is, IMHO, acceptable. > > As one more alternative, probably even more appealing, we can wrap > > parameters in "{}", as Koltin does :-) > > That will resemble GQL style and will not create a visual mess. 
> > > > So it will look like: > > ` g.match("MATCH (src:Airport {code:srcCode}), (dest:Airport > > {code:destCode}) RETURN src") > > .addE("Route").to("dest") > > .property(T.id, > > format("%{_}-%{_}").by(constant("{srcCode}")).by(constant("{destCode}")))` > > > > Also, nobody prohibits keeping the policy of resolving parameter binding as > > it is right now for server queries, with the recommended way to use the new > > approach, so it will not be a breaking change and I doubt that many users > > use string literals wrapped {} as values. > > > > > I still haven't quite aligned myself regarding single non-element > > returns. I'll reply back on this topic soon. > > > > I'm curious to see what you think. > > > > > Thanks again for driving these discussions. In my opinion this will be > > one of the most exciting additions to gremlin in quite some time. > > > > Thank you, I am totally flattered :-) > > > > > > > > > > > > > > > > > > On Tue, Aug 26, 2025 at 12:13 AM Cole Greer <colegr...@apache.org> wrote: > > > > > Hi Andrii, > > > > > > It was great to see your response. I think we are mostly in agreement > > > here. > > > > > > > It would be even better, IMHO, if the TP project added an ANTLR4 parser > > > for GQL match statements > > > > > > Agreed, I've been loosely following LDBC's Open GQL project which has > > > produced an Apache 2 licensed GQL Antlr grammar which likely offers a good > > > starting point. > > > https://github.com/opengql/grammar > > > > > > > Except for obvious query injection cases, which, in the absence of query > > > parameters, should be handled by users themselves > > > > > > I mostly considered this in the remote context, in which reliance on > > > gremlin-server for parameters is not an issue. I suppose there may be > > > embedded use cases in which query injection is a concern, however this > > > seems much rarer than the remote case. 
> > > > > > > another important argument for the presence of query parameters is that > > > query parsing is quite a heavy process > > > > > > I definitely agree on this front. > > > > > > > >I would prefer to solve that problem at the broader gremlin level, > > > instead of isolating it to the match step. > > > > > > > > Would you happen to have any other applications in mind? > > > > > > I suppose I'm approaching this one more from the perspective that I don't > > > see why these parameters need to be isolated to just the match subquery. > > > > > > Parameters is already a bit overloaded and messy in TinkerPop and I hope > > > to reduce that complexity overtime. As already noted, remote gremlin > > > scripts already have the ability to use parameters via gremlin-server. > > > Bytecode requests currently have bindings which serve a similar purpose. > > > Internally we also have the Parameterizing interface which is more about > > > steps supporting things like `with()` modulation, and not related to query > > > parameters. > > > > > > I think it's easier for users if we simply have one set of query > > > parameters instead of fractured gremlin parameters and match parameters. I > > > expect there are some cases where it is useful to reference the same > > > parameter in both the gremlin and GQL portions of a query, although it is > > > admittedly not a common use case. 
The following query is a somewhat > > > contrived example where the same parameters are used to match 2 nodes, and > > > then the same parameters are concatenated together to form an id for a new > > > edge which is added between the nodes: > > > g.match("MATCH (src:Airport {code:srcCode}), (dest:Airport > > > {code:destCode}) RETURN src") > > > .addE("Route").to("dest") > > > .property(T.id, > > > format("%{_}-%{_}").by(constant(srcCode)).by(constant(destCode))) > > > > > > There may also be cases where it is useful to have multiple match steps in > > > a single traversal which reuse the same parameters. > > > > > > Taking the existing remote query parameters, reworking them to support the > > > embedded case as well, then making those parameters available to the new > > > match step would solve the query injection and parse cache problems > > > without > > > introducing an additional form of parameters for users to handle. > > > > > > > > I will take some time next week to work through some example queries > > > and get a better sense of how I feel on each option here. > > > > > > > > Looking forward to reading your conclusions. > > > > > > I still haven't quite aligned myself regarding single non-element returns. > > > I'll reply back on this topic soon. > > > > > > Thanks again for driving these discussions. In my opinion this will be one > > > of the most exciting additions to gremlin in quite some time. > > > > > > Regards, > > > Cole > > > > > > On 2025/08/23 14:00:51 Andrii Lomakin wrote: > > > > Good day, Cole. > > > > > > > > Glad to exchange more ideas with you in this thread. > > > > > > > > >I think it would make sense for TinkerPop to adopt a default language > > > for the new match step, which is some heavily restricted form of GQL > > > (read-only, limited to basic MATCH, WHERE, and RETURN statements). This > > > "standard" language could then be used in the new match step without a > > > language with-modulator. 
Providers would still be free to support their > > > own > > > languages via that modulator if they choose. > > > > > > > > That makes sense, I agree with you. > > > > It would be even better, IMHO, if the TP project added an ANTLR4 > > > > parser for GQL match statements (there is already at least one ANTLR > > > > spec in the public domain) that vendors can use to work on the AST > > > > level. We can talk about possible collaboration on this task. > > > > > > > > > I'd be interested if you have any examples where embedded parameters > > > present a clear advantage. > > > > > > > > I expected that this question would be raised :-) > > > > But decided to move the discussion to a follow-up thread to avoid > > > > polluting the main proposal. > > > > Except for obvious query injection cases, which, in the absence of > > > > query parameters, should be handled by users themselves, another > > > > important argument for the presence of query parameters is that query > > > > parsing is quite a heavy process, and the consumption of 20% of CPU > > > > resources on query parsing is not a rare exception. > > > > To avoid this overhead, query parsing results (likely ASTs) are cached > > > > by a simple string hash code (likely the only way, as they are not > > > > parsed in this phase). Of course, the absence of query parameters very > > > > often increases the variability of queries by several orders of > > > > magnitude and voids caching efforts. > > > > > > > > >I would prefer to solve that problem at the broader gremlin level, > > > instead of isolating it to the match step. > > > > > > > > Would you happen to have any other applications in mind? > > > > > > > > > I will take some time next week to work through some example queries > > > and get a better sense of how I feel on each option here. > > > > > > > > Looking forward to reading your conclusions. > > > > > > > > >. 
I think that all "variables" bound in the match query should be > > > stored such that they are later selectable. > > > > > > > > Yeah, cool idea! > > > > > > > > >Overall I think this would be a great change to gremlin. I look forward > > > to keeping this discussion going and ultimately seeing the changes land in > > > TinkerPop. > > > > > > > > Thank you, Cole! > > > > Once the discussion comes to a natural conclusion, I will summarize > > > > all the ideas again to ensure that we are all on the same page. Then, > > > > we will add it to our roadmap. > > > > > > > > On Sat, Aug 23, 2025 at 12:01 AM Cole Greer <colegr...@apache.org> > > > wrote: > > > > > > > > > > Hi Andrii, > > > > > > > > > > Thanks for starting this discussion and putting together this > > > proposal. I want to start by saying that overall, I'm massively in favour > > > of the proposed overhaul of match(). This is a topic that has come up many > > > times in the past, and taking advantage of an established declarative > > > language like GQL always seems to be the preferred solution. > > > > > > > > > > The idea of having the language configurable via something like > > > `.with(“language”, > > > > > “GQL”)` is quite interesting, and something I haven't seen in previous > > > discussions. There is clear value in allowing providers to support their > > > own preferred declarative languages here, but I also worry about the loss > > > of query portability if TinkerPop is too hands off on the choice of > > > declarative language. I believe the vast majority of usages here will be > > > seeing a traversal with a simple GQL-like match pattern. I think it would > > > make sense for TinkerPop to adopt a default language for the new match > > > step, which is some heavily restricted form of GQL (read-only, limited to > > > basic MATCH, WHERE, and RETURN statements). This "standard" language could > > > then be used in the new match step without a language with-modulator. 
> > > Providers would still be free to support their own languages via that > > > modulator if they choose. > > > > > > > > > > I will take a bit more time to consider the withParameter() proposal. > > > My initial reaction is that I prefer to tie it into the existing parameter > > > bindings included in remote requests to gremlin-server. I would like query > > > parameters to function in a unified manner across the entire traversal if > > > possible, instead of a separate detached system isolated to the new match > > > step. I understand the current limitation of only supporting parameters in > > > remote traversals. I'm not immediately seeing the need to support > > > parameters for embedded traversals here, I'd be interested if you have any > > > examples where embedded parameters present a clear advantage. If we do > > > decide there is a need for embedded parameters, I would prefer to solve > > > that problem at the broader gremlin level, instead of isolating it to the > > > match step. > > > > > > > > > > I totally agree that the start and mid-step behaviour of the new match > > > step should be modeled after V() and E(). > > > > > > > > > > I think the trickiest part of getting this right is the return types. > > > The most common use cases I expect is where the RETURN clause only > > > includes > > > a single node or edge. In this case I completely agree with returning the > > > element itself. I definitely want to support usages such as g.match("MATCH > > > (n{name:'Cole'}) RETURN n").out()... My main tenet here is that results > > > should naturally flow from the declarative match into the subsequent > > > gremlin and be easy to consume. If multiple objects are returned, I would > > > agree that it is necessary to return a Map<String, ?> as in g.match("MATCH > > > (p:person)-[e:created]->(s:software) RETURN *") -> {"p": V[1], "e": E[9], > > > "s": V[3]} ... > > > > > > > > > > I'm still on the fence for how to handle single returns of > > > non-elements. 
I see the value in your recommendation to return a map of > > > size 1, but I also see some convenience to directly returning the value > > > (usually a single property). I will take some time next week to work > > > through some example queries and get a better sense of how I feel on each > > > option here. > > > > > > > > > > There is one final item which I would like to see added to the > > > proposal. I think that all "variables" bound in the match query should be > > > stored such that they are later selectable. Essentially I think it's > > > important to support something like this: > > > > > > > > > > g.match("MATCH (n1{name:'Cole'})-[]->(n2) RETURN > > > n1").where(...)...select(n2).out()... > > > > > > > > > > The ability to select other bound variables later in the traversal > > > should greatly limit the number of times users are forced to return > > > multiple items at once, which reduces the amount of use cases where users > > > will be forced to break down maps in gremlin to complete their query. > > > > > > > > > > Overall I think this would be a great change to gremlin. I look > > > forward to keeping this discussion going and ultimately seeing the changes > > > land in TinkerPop. > > > > > > > > > > Thanks, > > > > > Cole > > > > > > > > > > On 2025/08/22 15:46:10 Andrii Lomakin wrote: > > > > > > Good day. > > > > > > > > > > > > I propose new semantics for the match step in Gremlin, which we > > > discussed > > > > > > briefly in the Discord chat. The current ideas listed partially > > > summarize > > > > > > ideas suggested by several discussion participants. > > > > > > > > > > > > The current semantics of the match step are complex to optimize, so > > > users > > > > > > do not use this step in practice, and DB vendors do not recommend > > > using > > > > > > match step in queries. > > > > > > > > > > > > Instead, what is proposed is to provide a new match step based on > > > > > > declarative semantics. 
> > > > > > > > > > > > Signature of this step is quite simple: Travervsal<S, E> > > > > > > match(String > > > > > > matchQuery). > > > > > > > > > > > > Where matchQuery is a match statement written in declarative query > > > language > > > > > > supported by the provider, I will use GQL as an example below. > > > > > > > > > > > > This step will require the language as a configuration parameter > > > provided > > > > > > using with the step. > > > > > > > > > > > > So the simplest query will look like: > > > > > > > > > > > > g.match(“MATCH > > > (person:Person)-[:knows]->(friend:Person)”).with(“language”, > > > > > > “GQL”) > > > > > > > > > > > > match step can accept query parameters, so if we provide a query > > > > > > like > > > > > > g.match(“MATCH > > > > > > (p:Person WHERE p.name = $personName)RETURN > > > p.email”).with(“language”, > > > > > > “GQL”) > > > > > > > > > > > > we may use parameter bindings, but it will work only for interaction > > > with > > > > > > Gremlin Server, so instead, I propose an additional modulator step: > > > > > > withParameter(String > > > > > > name, Object value) > > > > > > > > > > > > In such case final version will look like: g.match(“MATCH (p:Person > > > WHERE > > > > > > p.name = $personName) RETURN p.email”).with(“language”, > > > > > > “GQL”).withParameter(“personName”, “Stephen”) > > > > > > > > > > > > Alongside the version of withParameter step that provides the name > > > of the > > > > > > query parameter, a version with the following signature should also > > > be > > > > > > provided: withParameter(int index, Object value) for query languages > > > that > > > > > > support indexed parameters with/instead of named parameters. 
> > > > > > > > > > > > Because we already introduced one modulator step, it is reasonable > > > > > > to > > > > > > consider replacing it with step by more specific withQueryLanguage() > > > > > > modulator step that will allow us to add more expressiveness to the > > > > > > resulting queries. > > > > > > > > > > > > In such case final version will look like: g.match(“MATCH (p:Person > > > WHERE > > > > > > p.name = $personName) RETURN > > > > > > p.email”).withQueryLanguage(“GQL”).withParameter(“personName”, > > > “Stephen”) > > > > > > > > > > > > As for the scope of application of this step, I recommend making it > > > behave > > > > > > exactly as it is implemented for the V() and E() steps. It could be > > > added > > > > > > in the middle of GraphTraversal, but the execution result will be > > > the same > > > > > > pattern matching execution applied to the whole graph stored in the > > > > > > database (not to the item filtered/transformed by the previous > > > steps). > > > > > > > > > > > > It also means that match step will be added to the > > > GraphTraversalSource. > > > > > > > > > > > > As for the format of the output of the match step, I would recommend > > > the > > > > > > following: > > > > > > > > > > > > 1. If the match statement returns an Element instance, it is > > > returned as > > > > > > is. > > > > > > > > > > > > 2. Otherwise, it should return any value that is allowed to be a > > > property > > > > > > value in Element. > > > > > > > > > > > > 3. I would add an optional recommendation to return either Element > > > > > > or > > > > > > Map<String, > > > > > > ?> where the key of the map is the result a projection of the query > > > result > > > > > > which in case of query g.match(“MATCH (p:Person WHERE p.name = > > > > > > $personName) RETURN > > > > > > p.email”).withQueryLanguage(“GQL”).withParameter(“personName”, > > > “Stephen”) > > > > > > > > > > > > will look like {“p.email”: “s...@gmail.com”}. 
Following this > > > > > > optional > > > > > > recommendation will, IMHO, improve user experience. > > > > > > > > > > > > This step should be restricted to executing only idempotent queries. > > > > > > > > > > > > I would also recommend adding versions of withParameter() that > > > > > > accept > > > > > > Traversal as a value of the parameters, namely: > > > > > > 1. withParameter(String name, TraversalSource value) > > > > > > > > > > > > 2. withParameter(int index, TraversalSource value) > > > > > > > > > > > > > > > > > > > > > > > > The current version of the match step should be deprecated and then > > > removed. > > > > > > > > > > > > I want to thank Stephen Mallette, whose initial idea closely aligned > > > with > > > > > > ours and who actively contributed to our discussions. > > > > > > > > > > > > I'm looking forward to your thoughts, observations, and any other > > > feedback > > > > > > you may have. > > > > > > > > > > > > Best Regards, > > > > > > YouTrackDB development lead > > > > > > Andrii Lomakin > > > > > > > > > > > > > > >
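
For readers following the thread, here is a minimal sketch of the with()-key rule discussed above: a key passed with a "$" prefix becomes a match query parameter with the prefix removed, and any parameter not provided explicitly is looked up in the server bindings. It is plain Python that only models the rule and is not TinkerPop code. The function name `resolve_match_parameters`, its arguments, and the per-parameter fallback order are assumptions made for illustration; the thread does not fix these details.

```python
from typing import Any, Dict, Iterable, Optional

PARAM_PREFIX = "$"  # prefix that marks a with() key as a match query parameter


def resolve_match_parameters(
    with_options: Dict[str, Any],
    referenced_names: Iterable[str],
    server_bindings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the parameter map for one match query.

    with_options     -- key/value pairs given through with() modulators
    referenced_names -- parameter names that the match query refers to
    server_bindings  -- request bindings used as the implicit fallback
    """
    # Keys such as "$srcCode" become the parameter "srcCode";
    # other with() keys (for example "language") are not parameters.
    explicit = {
        key[len(PARAM_PREFIX):]: value
        for key, value in with_options.items()
        if key.startswith(PARAM_PREFIX) and len(key) > len(PARAM_PREFIX)
    }
    bindings = server_bindings or {}

    resolved: Dict[str, Any] = {}
    for name in referenced_names:
        if name in explicit:
            # An explicitly provided parameter wins (assumed precedence).
            resolved[name] = explicit[name]
        elif name in bindings:
            # Implicit lookup over the server bindings.
            resolved[name] = bindings[name]
        else:
            raise KeyError(f"no value for match parameter '{name}'")
    return resolved


if __name__ == "__main__":
    params = resolve_match_parameters(
        {"$srcCode": "AUS", "language": "GQL"},
        ["srcCode", "destCode"],
        {"srcCode": "ignored", "destCode": "SEA"},
    )
    print(params)
```

Keeping the resolved values out of the query string means the query text stays constant across calls, which is what lets a parse cache keyed on the query string be reused, as argued earlier in the thread.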