Re: [DISCUSS] Creating pattern steps to codify best practices

David Bechberger Mon, 28 Sep 2020 15:49:48 -0700

So, I've been doing some additional thinking about the ways that this could
work based on the comments below and have put my comments inline.


Dave

On Tue, Sep 22, 2020 at 6:05 AM Stephen Mallette <[email protected]>
wrote:

> I added some thoughts inline below:
>
> On Fri, Sep 18, 2020 at 3:51 PM David Bechberger <[email protected]>
> wrote:
>
> > Thanks for the detailed comments Stephen.  I have addressed them inline
> > below.
> >
> > I did read the proposal from earlier and I think that we are in close
> > agreement with what we are trying to accomplish.  I also fully support
> > Josh's comment on providing a mechanism for submitting a map of
> properties
> > as manually unrolling this all right now leads to a lot of potential for
> > error and a long messy traversal.
> >
> > I'm looking forward to this discussion on how to merge these two
> proposals.
> >
> >
> > 1. How would multi/meta-properties fit into the API you've proposed?
> >
> > My first thought here is that multi-properties would be represented as
> > lists in the map, e.g.
> >
> > {names: ['Dave', 'David']}
> >
> > and meta-properties would be represented as maps in the maps, e.g.
> >
> > {name: {first: 'Dave', last: 'Bechberger'}}
> >
> > I can't say I've thought through all the implications of this though so
> it
> > is an area we would need to explore.
> >
>
> The first implication that comes to mind is that it makes the assumption
> the user wants multi/meta-properties as opposed to a single cardinality of
> List in the first case and a Map as a property value in the second case. I
> suppose that graphs with a schema could resolve those assumptions but
> graphs that are schemaless would have a problem. The issue could be
> resolved by specialized configuration of "g" or per merge() step using a
> with() modulator I suppose but that goes into a yet another level of
> implications to consider. I've often wondered if the start point for
> getting types/schema into TP3 without a full rewrite would be in this form
> where Gremlin would be given hints as to what to expect as to the types of
> data it might encounter while traversing. Anyway, I'd be hesitant to go
> down paths that don't account for multi/metaproperties well.  They are
> first class citizens in TP3 (with those hoping for extension of at least
> multiproperties to edges) and while I find them a constant annoyance for so
> many reasons, we're kinda stuck with them.
>

I agree we need to account for multi/meta properties as 1st class
citizens.  This is my current thinking on the syntax for the situations we
have laid out so far:

*Merge*
g.mergeV('name', {'name': 'marko'}, {'name': 'marko', 'age': 29})


*Merge with id* g.mergeV('name', {T.id: 1}, {T.id: 1, 'name': 'Marko'})


*Merge with Meta Properties* g.mergeV('name', {T.id: 1}, {T.id: 1, 'name':
{'first': Marko', 'last': 'R'})


*Merge with Multi Properties * g.mergeV('name', {T.id: 1}, {T.id: 1,
'lang': ['java', 'scala'])


*Merge with Single Cardinality* g.mergeV('name', {T.id: 1}, {T.id: 1,
'lang': 'java')


*Merge with List Cardinality* g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang':
['java', 'scala', 'java'])


*Merge with Set Cardinality * g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang':
['java', 'scala'])

Since in a mergeV() scenario we are only ever adding whatever values are
passed in there would be no need to specify the cardinality of the property
being added.  If they wanted to add a value to an existing property then
the current property() method would still be available on the output
traversal. e.g.
g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala',
'java']).property(Cardinality.list, 'lang, 'java')


*Merge with stream * g.inject([{id: 1, label: 'person', 'name': 'marko'},
{id: 2, label: 'person', 'name': 'josh'}]).
  mergeV(values('label'), valueMap('id'), valueMap())


*Merge with reporting added or updated side effect* g.mergeV('name',
{'name': 'marko'}, {'name': 'marko', 'age': 29}).
  with(WithOptions.sideEffectLabel, 'a').
  project('vertex', 'added').
    by().
    by(cap('a'))



>
>
> > 2. How would users set the T.id on creation? would that T.id just be a
> key
> > in the first Map argument?
> >
> > Yes, I was thinking that T.id would be the key name, e.g.:
> >
> > g.mergeV('name', {T.id: 1}, {'name': 'Marko'})
> >
>
> ok - I can't say I see a problem with that atm.
>
>
> > 3. I do like the general idea of a match on multiple properties for the
> > first argument as a convenience but wonder about the specificity of this
> > API a bit as it focuses heavily on equality - I suppose that's most cases
> > for get-or-create, so perhaps that's ok.
> >
> > In most cases I've seen use exact matches on the vertex or edge.  I think
> > it might be best to keep this straightforward as any complex edge cases
> > still can perform the same functionality using the coalesce() pattern.
> >
>
> I played around with the idea to use modulators to apply additional
> constraints:
>
> g.mergeV('person', {'state':'NY'}, {'active': true}).
>     by(has('age', gt(29))
>
> I suppose that's neat but maybe not...we could easily get the same thing
> from:
>
> g.V().has('person','age',gt(29)).
>   mergeV('person', {'state':'NY'}, {'active': true}).
>
> which is more readable though it does make it harder for providers to
> optimize upsert operations if they could make use of the has() as part of
> that. I think I like has() as part of the main query rather than in a by()
> - just thinking out loud.
>
>
Another potential option here would be to allow the second parameter of
mergeV() accept a traversal.  Not sure how much complexity that would add
though

g.V().mergeV('person', __.has('person','age',gt(29)).has('state', 'NY'}.,
{'active': true}).


>
> > 4. I think your suggestion points to one of the troubles Gremlin has
> which
> > we see with "algorithms" - extending the language with new steps that
> > provides a form of "sugar" (e.g. in algorithms we end up with
> > shortestPath() step) pollutes the core language a bit, hence my
> > generalization of "merging" in my link above which fits into the core
> > Gremlin language style. There is a bigger picture where we are missing
> > something in Gremlin that lets us extend the language in ways that let us
> > easily introduce new steps that aren't for general purpose. This issue is
> > discussed in terms of "algorithms" here:
> > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I think we
> could
> > see how there might be some "mutation" extension steps that would cover
> > your suggested API, plus batch operations, etc. We need a way to add
> > "sugar" without it interfering with the consistency of the core.
> Obviously
> > this is a bigger issue but perhaps important to solve to implement steps
> in
> > the fashion you describe.
> >
> > I agree that the "algorithm" steps do seem a bit odd in the core language
> > but I think the need is there.  I'd be interested in furthering this
> > discussion but I think these "pattern" steps may or may not be the same
> as
> > the algorithm steps.  I'll have to think on that.
>
>
>
> > 5. I suppose that the reason for mergeE and mergeV is to specify what
> > element type the first Map argument should be applied to? what about
> > mergeVP (i.e. vertex property as it too is an element) ? That's tricky
> but
> > I don't think we should miss that. Perhaps merge() could be a "complex
> > modulator"?? that's a new concept of course, but you would do
> g.V().merge()
> > and the label and first Map would fold to VertexStartStep (i.e. V()) for
> > the lookup and then a MergeStep would follow - thus a "complex modulator"
> > as it does more than just change the behavior of the previous step - it
> > also adds its own. I suppose it could also add has() steps followed by
> the
> > MergeStep and then the has() operations would fold in normally as they do
> > today. In this way, we can simplify to just one single
> > merge(String,Map,Map). ??
> >
> > I agree that we should also think about how to include properties in this
> > merge construct.  The reason I was thinking about mergeV() and mergeE()
> is
> > that it follows the same pattern as the already well understood
> > addV()/addE() steps.  I am a bit wary of trying to generalize this down
> to
> > a single merge() step as these sorts of complex overloads make it hard to
> > figure out which Gremlin step you should use for a particular pattern
> (e.g.
> > upsert vertex or upsert edge).  One thing that I think customers find
> > useful about the addV/addE step is that they are very discoverable, the
> > name tells me what functionality to expect from that step.
> >
>
> I'm starting to agree with the idea that we have to do something like
> mergeV()/E() as it wouldn't quite work as a start step otherwise. We
> couldn't do:
>
> g.merge()
>
> as we wouldnt know what sort of Element it applied to. If so, I wonder if
> it would be better to preserve "merge" for my more general use-case and
> prefer upsertV()/E()?
>
> Also, perhaps mergeVP() doesn't need to exist as perhaps it is an uncommon
> case (compared to V/E()) and we don't really have addVP() as an analogous
> step. Perhaps existing coalesce() patterns and/or my more general purpose
> merge() step would satisfy those situations??
>

I think that a mergeVP() type step may require a different pattern since it
would really need to accept a map of values otherwise it would essentially
perform the same function as property()


>
>
> > 6. One thing neither my approach nor yours seems to do is tell the user
> if
> > they created something or updated something - that's another thing I've
> > seen users want to have in get-or-create. Here again we go deeper into a
> > less general step specification as alluded to in 4, but a merge() step as
> > proposed in 5, might return [Element,boolean] so as to provide an
> indicator
> > of creation?
> >
> > Hmm, I'll have to think about that.  How would returning multiple values
> > work if we want to chain these together. e.g. Add a vertex between two
> > edges but make sure the vertices exist?
> >
>

Added a thought on how to accomplish this above that I'd like to get
thoughts on.


> >
> > 7. You were just introducing your ideas here, so perhaps you haven't
> gotten
> > this far yet, but a shortcoming to doing merge(String,Map,Map) is that it
> > leaves open no opportunity to stream a List of Maps to a merge() for a
> form
> > of batch loading which is mighty common and one of the variations of the
> > coalesce() pattern that I alluded to at the start of all this. I think
> that
> > we would want to be sure that we left open the option to do that somehow.
> > 8. If we had a general purpose merge() step I wonder if it makes
> developing
> > the API as you suggested easier to do?
> >
> > Hmm, let me think about that one.
> >
> >
> I will add an item 8 to think about which I didn't mention before:
>
> 8. The signature you suggested for mergeE() is:
>
> mergeE(String, Map, Map)
>      String - The edge label
>      Map (first) - The properties to match existing edge on
>      Map (second) - Any additional properties to set if a new edge is
> created (optional)
>
> but does that exactly answer the need? Typically the idea is to detect an
> edge between two vertices not globally in the graph by way of some
> properties. This signature doesn't seem to allow for that as it doesn't
> allow specification of the vertices to test against. Perhaps the answer is
> to use from() and to() modulators?
>
> g.mergeE('knows', {'weight':0.5}).
>   from(V(1)).to(V(2))
> g.V().has('person','name','marko').
>   mergeE( 'knows', {'weight':0.5}).
>     to(V(2))
> g.V().has('person','name','marko').
>   mergeE( 'self', {'weight':0.5})
>
> That seems to work?
>

I think that is an elegant solution to that problem.  I like it and it
keeps in line with the way that addE() works.


>
>
>
> > On Thu, Sep 17, 2020 at 4:31 AM Stephen Mallette <[email protected]>
> > wrote:
> >
> > > I like our coalesce() pattern but it is verbose and over time it has
> gone
> > > from a simple pattern to one with numerous variations for all manner of
> > > different sorts of merge-like operations. As such, I do think we should
> > > introduce something to cover this pattern.
> > >
> > > I like that you used the word "merge" in your description of this as it
> > is
> > > the word I've liked using. You might want to give a look at my proposed
> > > merge() step from earlier in the year:
> > >
> > >
> > >
> >
> https://lists.apache.org/thread.html/r34ff112e18f4e763390303501fc07c82559d71667444339bde61053f%40%3Cdev.tinkerpop.apache.org%3E
> > >
> > > I'm just going to dump thoughts as they come regarding what you wrote:
> > >
> > > 1. How would multi/meta-properties fit into the API you've proposed?
> > > 2. How would users set the T.id on creation? would that T.id just be a
> > key
> > > in the first Map argument?
> > > 3. I do like the general idea of a match on multiple properties for the
> > > first argument as a convenience but wonder about the specificity of
> this
> > > API a bit as it focuses heavily on equality - I suppose that's most
> cases
> > > for get-or-create, so perhaps that's ok.
> > > 4. I think your suggestion points to one of the troubles Gremlin has
> > which
> > > we see with "algorithms" - extending the language with new steps that
> > > provides a form of "sugar" (e.g. in algorithms we end up with
> > > shortestPath() step) pollutes the core language a bit, hence my
> > > generalization of "merging" in my link above which fits into the core
> > > Gremlin language style. There is a bigger picture where we are missing
> > > something in Gremlin that lets us extend the language in ways that let
> us
> > > easily introduce new steps that aren't for general purpose. This issue
> is
> > > discussed in terms of "algorithms" here:
> > > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I think we
> > could
> > > see how there might be some "mutation" extension steps that would cover
> > > your suggested API, plus batch operations, etc. We need a way to add
> > > "sugar" without it interfering with the consistency of the core.
> > Obviously
> > > this is a bigger issue but perhaps important to solve to implement
> steps
> > in
> > > the fashion you describe.
> > > 5. I suppose that the reason for mergeE and mergeV is to specify what
> > > element type the first Map argument should be applied to? what about
> > > mergeVP (i.e. vertex property as it too is an element) ? That's tricky
> > but
> > > I don't think we should miss that. Perhaps merge() could be a "complex
> > > modulator"?? that's a new concept of course, but you would do
> > g.V().merge()
> > > and the label and first Map would fold to VertexStartStep (i.e. V())
> for
> > > the lookup and then a MergeStep would follow - thus a "complex
> modulator"
> > > as it does more than just change the behavior of the previous step - it
> > > also adds its own. I suppose it could also add has() steps followed by
> > the
> > > MergeStep and then the has() operations would fold in normally as they
> do
> > > today. In this way, we can simplify to just one single
> > > merge(String,Map,Map). ??
> > > 6. One thing neither my approach nor yours seems to do is tell the user
> > if
> > > they created something or updated something - that's another thing I've
> > > seen users want to have in get-or-create. Here again we go deeper into
> a
> > > less general step specification as alluded to in 4, but a merge() step
> as
> > > proposed in 5, might return [Element,boolean] so as to provide an
> > indicator
> > > of creation?
> > > 7. You were just introducing your ideas here, so perhaps you haven't
> > gotten
> > > this far yet, but a shortcoming to doing merge(String,Map,Map) is that
> it
> > > leaves open no opportunity to stream a List of Maps to a merge() for a
> > form
> > > of batch loading which is mighty common and one of the variations of
> the
> > > coalesce() pattern that I alluded to at the start of all this. I think
> > that
> > > we would want to be sure that we left open the option to do that
> somehow.
> > > 8. If we had a general purpose merge() step I wonder if it makes
> > developing
> > > the API as you suggested easier to do?
> > >
> > > I think I'd like to solve the problems you describe in your post as
> well
> > as
> > > the ones in mine. There is some relation there, but gaps as well. With
> > more
> > > discussion here we can figure something out.
> > >
> > > Thanks for starting this talk - good one!
> > >
> > >
> > >
> > > On Wed, Sep 16, 2020 at 9:26 PM David Bechberger <[email protected]>
> > > wrote:
> > >
> > > > I've had a few on and off discussions with a few people here, so I
> > wanted
> > > > to send this out to everyone for feedback.
> > > >
> > > > What are people's thoughts on creating a new set of steps that codify
> > > > common Gremlin best practices?
> > > >
> > > > I think there are several common Gremlin patterns where users would
> > > benefit
> > > > from the additional guidance that these codified steps represent.
> The
> > > > first one I would recommend though is codifying the element existence
> > > > pattern into a single Gremlin step, something like:
> > > >
> > > > mergeV(String, Map, Map)
> > > >      String - The vertex label
> > > >      Map (first) - The properties to match existing vertices on
> > > >      Map (second) - Any additional properties to set if a new vertex
> is
> > > > created (optional)
> > > > mergeE(String, Map, Map)
> > > >      String - The edge label
> > > >      Map (first) - The properties to match existing edge on
> > > >      Map (second) - Any additional properties to set if a new edge is
> > > > created (optional)
> > > >
> > > > In each of these cases these steps would perform the same upsert
> > > > functionality as the element existence pattern.
> > > >
> > > > Example:
> > > >
> > > > g.V().has('person','name','stephen').
> > > >            fold().
> > > >            coalesce(unfold(),
> > > >                     addV('person').
> > > >                       property('name','stephen').
> > > >                       property('age',34))
> > > >
> > > > would become:
> > > >
> > > > g.mergeV('person', {'name': 'stephen'}, {'age', 34})
> > > >
> > > > I think that this change would be a good addition to the language for
> > > > several reasons:
> > > >
> > > > * This codifies the best practice for a specific action/recipe, which
> > > > reduces the chance that someone uses the pattern incorrectly
> > > > * Most complex Gremlin traversals are verbose.  Reducing the amount
> of
> > > code
> > > > that needs to be written and maintained allows for a better developer
> > > > experience.
> > > > * It will lower the bar of entry for a developer by making these
> > actions
> > > > more discoverable.  The more we can help bring these patterns to the
> > > > forefront of the language via these pattern/meta steps the more we
> > guide
> > > > users towards writing better Gremlin faster
> > > > * This allows DB vendors to optimize for this pattern
> > > >
> > > > I know that this would likely be the first step in Gremlin that
> > codifies
> > > a
> > > > pattern, so I'd like to get other's thoughts on this?
> > > >
> > > > Dave
> > > >
> > >
> >
>

Re: [DISCUSS] Creating pattern steps to codify best practices

Reply via email to