So, I've been doing some additional thinking about the ways that this could work based on the comments below and have put my comments inline.
Dave On Tue, Sep 22, 2020 at 6:05 AM Stephen Mallette <[email protected]> wrote: > I added some thoughts inline below: > > On Fri, Sep 18, 2020 at 3:51 PM David Bechberger <[email protected]> > wrote: > > > Thanks for the detailed comments Stephen. I have addressed them inline > > below. > > > > I did read the proposal from earlier and I think that we are in close > > agreement with what we are trying to accomplish. I also fully support > > Josh's comment on providing a mechanism for submitting a map of > properties > > as manually unrolling this all right now leads to a lot of potential for > > error and a long messy traversal. > > > > I'm looking forward to this discussion on how to merge these two > proposals. > > > > > > 1. How would multi/meta-properties fit into the API you've proposed? > > > > My first thought here is that multi-properties would be represented as > > lists in the map, e.g. > > > > {names: ['Dave', 'David']} > > > > and meta-properties would be represented as maps in the maps, e.g. > > > > {name: {first: 'Dave', last: 'Bechberger'}} > > > > I can't say I've thought through all the implications of this though so > it > > is an area we would need to explore. > > > > The first implication that comes to mind is that it makes the assumption > the user wants multi/meta-properties as opposed to a single cardinality of > List in the first case and a Map as a property value in the second case. I > suppose that graphs with a schema could resolve those assumptions but > graphs that are schemaless would have a problem. The issue could be > resolved by specialized configuration of "g" or per merge() step using a > with() modulator I suppose but that goes into a yet another level of > implications to consider. I've often wondered if the start point for > getting types/schema into TP3 without a full rewrite would be in this form > where Gremlin would be given hints as to what to expect as to the types of > data it might encounter while traversing. Anyway, I'd be hesitant to go > down paths that don't account for multi/metaproperties well. They are > first class citizens in TP3 (with those hoping for extension of at least > multiproperties to edges) and while I find them a constant annoyance for so > many reasons, we're kinda stuck with them. > I agree we need to account for multi/meta properties as 1st class citizens. This is my current thinking on the syntax for the situations we have laid out so far: *Merge* g.mergeV('name', {'name': 'marko'}, {'name': 'marko', 'age': 29}) *Merge with id* g.mergeV('name', {T.id: 1}, {T.id: 1, 'name': 'Marko'}) *Merge with Meta Properties* g.mergeV('name', {T.id: 1}, {T.id: 1, 'name': {'first': Marko', 'last': 'R'}) *Merge with Multi Properties * g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala']) *Merge with Single Cardinality* g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': 'java') *Merge with List Cardinality* g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala', 'java']) *Merge with Set Cardinality * g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala']) Since in a mergeV() scenario we are only ever adding whatever values are passed in there would be no need to specify the cardinality of the property being added. If they wanted to add a value to an existing property then the current property() method would still be available on the output traversal. e.g. g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala', 'java']).property(Cardinality.list, 'lang, 'java') *Merge with stream * g.inject([{id: 1, label: 'person', 'name': 'marko'}, {id: 2, label: 'person', 'name': 'josh'}]). mergeV(values('label'), valueMap('id'), valueMap()) *Merge with reporting added or updated side effect* g.mergeV('name', {'name': 'marko'}, {'name': 'marko', 'age': 29}). with(WithOptions.sideEffectLabel, 'a'). project('vertex', 'added'). by(). by(cap('a')) > > > > 2. How would users set the T.id on creation? would that T.id just be a > key > > in the first Map argument? > > > > Yes, I was thinking that T.id would be the key name, e.g.: > > > > g.mergeV('name', {T.id: 1}, {'name': 'Marko'}) > > > > ok - I can't say I see a problem with that atm. > > > > 3. I do like the general idea of a match on multiple properties for the > > first argument as a convenience but wonder about the specificity of this > > API a bit as it focuses heavily on equality - I suppose that's most cases > > for get-or-create, so perhaps that's ok. > > > > In most cases I've seen use exact matches on the vertex or edge. I think > > it might be best to keep this straightforward as any complex edge cases > > still can perform the same functionality using the coalesce() pattern. > > > > I played around with the idea to use modulators to apply additional > constraints: > > g.mergeV('person', {'state':'NY'}, {'active': true}). > by(has('age', gt(29)) > > I suppose that's neat but maybe not...we could easily get the same thing > from: > > g.V().has('person','age',gt(29)). > mergeV('person', {'state':'NY'}, {'active': true}). > > which is more readable though it does make it harder for providers to > optimize upsert operations if they could make use of the has() as part of > that. I think I like has() as part of the main query rather than in a by() > - just thinking out loud. > > Another potential option here would be to allow the second parameter of mergeV() accept a traversal. Not sure how much complexity that would add though g.V().mergeV('person', __.has('person','age',gt(29)).has('state', 'NY'}., {'active': true}). > > > 4. I think your suggestion points to one of the troubles Gremlin has > which > > we see with "algorithms" - extending the language with new steps that > > provides a form of "sugar" (e.g. in algorithms we end up with > > shortestPath() step) pollutes the core language a bit, hence my > > generalization of "merging" in my link above which fits into the core > > Gremlin language style. There is a bigger picture where we are missing > > something in Gremlin that lets us extend the language in ways that let us > > easily introduce new steps that aren't for general purpose. This issue is > > discussed in terms of "algorithms" here: > > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I think we > could > > see how there might be some "mutation" extension steps that would cover > > your suggested API, plus batch operations, etc. We need a way to add > > "sugar" without it interfering with the consistency of the core. > Obviously > > this is a bigger issue but perhaps important to solve to implement steps > in > > the fashion you describe. > > > > I agree that the "algorithm" steps do seem a bit odd in the core language > > but I think the need is there. I'd be interested in furthering this > > discussion but I think these "pattern" steps may or may not be the same > as > > the algorithm steps. I'll have to think on that. > > > > > 5. I suppose that the reason for mergeE and mergeV is to specify what > > element type the first Map argument should be applied to? what about > > mergeVP (i.e. vertex property as it too is an element) ? That's tricky > but > > I don't think we should miss that. Perhaps merge() could be a "complex > > modulator"?? that's a new concept of course, but you would do > g.V().merge() > > and the label and first Map would fold to VertexStartStep (i.e. V()) for > > the lookup and then a MergeStep would follow - thus a "complex modulator" > > as it does more than just change the behavior of the previous step - it > > also adds its own. I suppose it could also add has() steps followed by > the > > MergeStep and then the has() operations would fold in normally as they do > > today. In this way, we can simplify to just one single > > merge(String,Map,Map). ?? > > > > I agree that we should also think about how to include properties in this > > merge construct. The reason I was thinking about mergeV() and mergeE() > is > > that it follows the same pattern as the already well understood > > addV()/addE() steps. I am a bit wary of trying to generalize this down > to > > a single merge() step as these sorts of complex overloads make it hard to > > figure out which Gremlin step you should use for a particular pattern > (e.g. > > upsert vertex or upsert edge). One thing that I think customers find > > useful about the addV/addE step is that they are very discoverable, the > > name tells me what functionality to expect from that step. > > > > I'm starting to agree with the idea that we have to do something like > mergeV()/E() as it wouldn't quite work as a start step otherwise. We > couldn't do: > > g.merge() > > as we wouldnt know what sort of Element it applied to. If so, I wonder if > it would be better to preserve "merge" for my more general use-case and > prefer upsertV()/E()? > > Also, perhaps mergeVP() doesn't need to exist as perhaps it is an uncommon > case (compared to V/E()) and we don't really have addVP() as an analogous > step. Perhaps existing coalesce() patterns and/or my more general purpose > merge() step would satisfy those situations?? > I think that a mergeVP() type step may require a different pattern since it would really need to accept a map of values otherwise it would essentially perform the same function as property() > > > > 6. One thing neither my approach nor yours seems to do is tell the user > if > > they created something or updated something - that's another thing I've > > seen users want to have in get-or-create. Here again we go deeper into a > > less general step specification as alluded to in 4, but a merge() step as > > proposed in 5, might return [Element,boolean] so as to provide an > indicator > > of creation? > > > > Hmm, I'll have to think about that. How would returning multiple values > > work if we want to chain these together. e.g. Add a vertex between two > > edges but make sure the vertices exist? > > > Added a thought on how to accomplish this above that I'd like to get thoughts on. > > > > 7. You were just introducing your ideas here, so perhaps you haven't > gotten > > this far yet, but a shortcoming to doing merge(String,Map,Map) is that it > > leaves open no opportunity to stream a List of Maps to a merge() for a > form > > of batch loading which is mighty common and one of the variations of the > > coalesce() pattern that I alluded to at the start of all this. I think > that > > we would want to be sure that we left open the option to do that somehow. > > 8. If we had a general purpose merge() step I wonder if it makes > developing > > the API as you suggested easier to do? > > > > Hmm, let me think about that one. > > > > > I will add an item 8 to think about which I didn't mention before: > > 8. The signature you suggested for mergeE() is: > > mergeE(String, Map, Map) > String - The edge label > Map (first) - The properties to match existing edge on > Map (second) - Any additional properties to set if a new edge is > created (optional) > > but does that exactly answer the need? Typically the idea is to detect an > edge between two vertices not globally in the graph by way of some > properties. This signature doesn't seem to allow for that as it doesn't > allow specification of the vertices to test against. Perhaps the answer is > to use from() and to() modulators? > > g.mergeE('knows', {'weight':0.5}). > from(V(1)).to(V(2)) > g.V().has('person','name','marko'). > mergeE( 'knows', {'weight':0.5}). > to(V(2)) > g.V().has('person','name','marko'). > mergeE( 'self', {'weight':0.5}) > > That seems to work? > I think that is an elegant solution to that problem. I like it and it keeps in line with the way that addE() works. > > > > > On Thu, Sep 17, 2020 at 4:31 AM Stephen Mallette <[email protected]> > > wrote: > > > > > I like our coalesce() pattern but it is verbose and over time it has > gone > > > from a simple pattern to one with numerous variations for all manner of > > > different sorts of merge-like operations. As such, I do think we should > > > introduce something to cover this pattern. > > > > > > I like that you used the word "merge" in your description of this as it > > is > > > the word I've liked using. You might want to give a look at my proposed > > > merge() step from earlier in the year: > > > > > > > > > > > > https://lists.apache.org/thread.html/r34ff112e18f4e763390303501fc07c82559d71667444339bde61053f%40%3Cdev.tinkerpop.apache.org%3E > > > > > > I'm just going to dump thoughts as they come regarding what you wrote: > > > > > > 1. How would multi/meta-properties fit into the API you've proposed? > > > 2. How would users set the T.id on creation? would that T.id just be a > > key > > > in the first Map argument? > > > 3. I do like the general idea of a match on multiple properties for the > > > first argument as a convenience but wonder about the specificity of > this > > > API a bit as it focuses heavily on equality - I suppose that's most > cases > > > for get-or-create, so perhaps that's ok. > > > 4. I think your suggestion points to one of the troubles Gremlin has > > which > > > we see with "algorithms" - extending the language with new steps that > > > provides a form of "sugar" (e.g. in algorithms we end up with > > > shortestPath() step) pollutes the core language a bit, hence my > > > generalization of "merging" in my link above which fits into the core > > > Gremlin language style. There is a bigger picture where we are missing > > > something in Gremlin that lets us extend the language in ways that let > us > > > easily introduce new steps that aren't for general purpose. This issue > is > > > discussed in terms of "algorithms" here: > > > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I think we > > could > > > see how there might be some "mutation" extension steps that would cover > > > your suggested API, plus batch operations, etc. We need a way to add > > > "sugar" without it interfering with the consistency of the core. > > Obviously > > > this is a bigger issue but perhaps important to solve to implement > steps > > in > > > the fashion you describe. > > > 5. I suppose that the reason for mergeE and mergeV is to specify what > > > element type the first Map argument should be applied to? what about > > > mergeVP (i.e. vertex property as it too is an element) ? That's tricky > > but > > > I don't think we should miss that. Perhaps merge() could be a "complex > > > modulator"?? that's a new concept of course, but you would do > > g.V().merge() > > > and the label and first Map would fold to VertexStartStep (i.e. V()) > for > > > the lookup and then a MergeStep would follow - thus a "complex > modulator" > > > as it does more than just change the behavior of the previous step - it > > > also adds its own. I suppose it could also add has() steps followed by > > the > > > MergeStep and then the has() operations would fold in normally as they > do > > > today. In this way, we can simplify to just one single > > > merge(String,Map,Map). ?? > > > 6. One thing neither my approach nor yours seems to do is tell the user > > if > > > they created something or updated something - that's another thing I've > > > seen users want to have in get-or-create. Here again we go deeper into > a > > > less general step specification as alluded to in 4, but a merge() step > as > > > proposed in 5, might return [Element,boolean] so as to provide an > > indicator > > > of creation? > > > 7. You were just introducing your ideas here, so perhaps you haven't > > gotten > > > this far yet, but a shortcoming to doing merge(String,Map,Map) is that > it > > > leaves open no opportunity to stream a List of Maps to a merge() for a > > form > > > of batch loading which is mighty common and one of the variations of > the > > > coalesce() pattern that I alluded to at the start of all this. I think > > that > > > we would want to be sure that we left open the option to do that > somehow. > > > 8. If we had a general purpose merge() step I wonder if it makes > > developing > > > the API as you suggested easier to do? > > > > > > I think I'd like to solve the problems you describe in your post as > well > > as > > > the ones in mine. There is some relation there, but gaps as well. With > > more > > > discussion here we can figure something out. > > > > > > Thanks for starting this talk - good one! > > > > > > > > > > > > On Wed, Sep 16, 2020 at 9:26 PM David Bechberger <[email protected]> > > > wrote: > > > > > > > I've had a few on and off discussions with a few people here, so I > > wanted > > > > to send this out to everyone for feedback. > > > > > > > > What are people's thoughts on creating a new set of steps that codify > > > > common Gremlin best practices? > > > > > > > > I think there are several common Gremlin patterns where users would > > > benefit > > > > from the additional guidance that these codified steps represent. > The > > > > first one I would recommend though is codifying the element existence > > > > pattern into a single Gremlin step, something like: > > > > > > > > mergeV(String, Map, Map) > > > > String - The vertex label > > > > Map (first) - The properties to match existing vertices on > > > > Map (second) - Any additional properties to set if a new vertex > is > > > > created (optional) > > > > mergeE(String, Map, Map) > > > > String - The edge label > > > > Map (first) - The properties to match existing edge on > > > > Map (second) - Any additional properties to set if a new edge is > > > > created (optional) > > > > > > > > In each of these cases these steps would perform the same upsert > > > > functionality as the element existence pattern. > > > > > > > > Example: > > > > > > > > g.V().has('person','name','stephen'). > > > > fold(). > > > > coalesce(unfold(), > > > > addV('person'). > > > > property('name','stephen'). > > > > property('age',34)) > > > > > > > > would become: > > > > > > > > g.mergeV('person', {'name': 'stephen'}, {'age', 34}) > > > > > > > > I think that this change would be a good addition to the language for > > > > several reasons: > > > > > > > > * This codifies the best practice for a specific action/recipe, which > > > > reduces the chance that someone uses the pattern incorrectly > > > > * Most complex Gremlin traversals are verbose. Reducing the amount > of > > > code > > > > that needs to be written and maintained allows for a better developer > > > > experience. > > > > * It will lower the bar of entry for a developer by making these > > actions > > > > more discoverable. The more we can help bring these patterns to the > > > > forefront of the language via these pattern/meta steps the more we > > guide > > > > users towards writing better Gremlin faster > > > > * This allows DB vendors to optimize for this pattern > > > > > > > > I know that this would likely be the first step in Gremlin that > > codifies > > > a > > > > pattern, so I'd like to get other's thoughts on this? > > > > > > > > Dave > > > > > > > > > >
