Re: [DISCUSS] Creating pattern steps to codify best practices

David Bechberger Fri, 11 Dec 2020 10:03:12 -0800

I agree with Josh's description of what the upsertV() functionality is
intended to be.


I also fully support the simplification provided by the by() modulation
that Stephen suggested, removing the second map.  I think that provides a
much cleaner and easier to comprehend syntax.

With that agreement, I think this does beg the question of if this should
be a new step (upsertV()) or just an additional signature on the addV()
step? I

Stephen alluded to this on the dev list and after thinking about this a bit
I think that I am favor of not adding a step and just adding a new
signature to the existing step (if possible).  Thoughts?


Dave

On Wed, Dec 9, 2020 at 4:33 AM Stephen Mallette <[email protected]>
wrote:

> Josh, thanks for your thoughts - some responses inline:
>
> On Tue, Dec 8, 2020 at 10:16 PM Josh Perryman <[email protected]>
> wrote:
>
> > I'll offer some thoughts. I'm seeing upsertV() as an idempotent
> getOrCreate
> > call which always returns a vertex with the label/property values
> specified
> > within the step. It's sort of a declarative pattern: "return this vertex
> to
> > me, find it if you can, create it if you must."
> >
>
> I like this description - I've added it to the gist, though it's a bit at
> odds with Dave's previous post, so we'll consider it a temporary addition
> until he responds.
>
>
> > On that account, I do like the simplification in 1. Repetition shouldn't
> be
> > necessary. In an ideal world, the engine should know the primary
> > identifiers (name or id) and find/create the vertex based on them. Any
> > other included values will be "trued up" as well. But this may be a
> bridge
> > too far for TinkerPop since knowing identifiers may require a specified
> > schema. I'd prefer to omit the third input, but it might be necessary to
> > keep it so that the second input can be for the matching use case.
> >
>
> In my most recent post on gremlin-users I think I came up with a nice way
> to get rid of the second Map. One Map that forms the full list of
> properties for upserting is easier than partitioning two Maps that
> essentially merge together. I imagine it's unlikely that application code
> will have that separation naturally so users will have the added step of
> trying to separate their data into searchable vs "just data". Getting us to
> one Map argument will simplify APIs for us and reduce complexity to users.
> Here is what I'd proposed for those not following over there:
>
> // match on name and age (or perhaps whatever the underlying graph system
> thinks is best?)
> g.upsertV('person', [name:'marko',age:29])
>
> // match on name only
> g.upsertV('person', [name:'marko',age:29]).by('name')
>
> // explicitly match on name and age
> g.upsertV('person', [name:'marko',age:29]).
>   by('name').by('age')
>
> // match on id only
> g.upsertV('person', [(T.id): 100, name:'marko',age:29]).by(T.id)
>
> // match on whatever the by(Traversal) predicate defines
> g.upsertV('person', [name:'marko',age:29]).
>   by(has('name', 'marko'))
>
> // match on id, then update age
> g.upsertV('person', [(T.id): 100, name:'marko']).by(T.id).
>   property('age',29)
>
> With this model, we get one Map argument that represents the complete
> property set to be added/updated to the graph and the user can hint on what
> key they wish to match on using by() where that sort of step modulation
> should be a well understood and familiar concept in Gremlin at this point.
>
> So that means I think 2 should always match or update the additional
> > values. Again, we're specifying the expected result and letting the
> engine
> > figure out best how to return that results and appropriately maintain
> > state.
> >
>
> I again like this description, but we'll see what Dave's thoughts are since
> he's a bit behind on the threads at this point I think.
>
>
> > I'm also presuming that anything not included as inputs to the upsertV()
> > step are then to be handled by following steps. I'm hoping that is a
> > sufficient approach for addressing the multi/meta property use cases
> > brought up in 3.
> >
>
> yeah................it needs more thought. I spent more time thinking on
> this issue yesterday than I have for all the previous posts combined and I
> think it yielded something good in that revised syntax. It's going to take
> more of that kind of elbow grease to dig into these lesser use cases to
> make sure we aren't coding ourselves into corners.
>
>
> > I do like the idea of using modulators (with(), by()) for more
> > sophisticated usage and advanced use cases. Also, the streaming examples
> > are quite elegant allowing for a helpful separation of data and logic.
> >
>
> cool - hope you like the revised syntax I posted then. :)
>
>
> > That's my humble take. This is a very welcome addition to the language
> and
> > I appreciate the thoughtful & collaborative approach to the design
> > considerations.
> >
>
> Thanks again and please keep the thoughts coming. Lots of other interesting
> design discussions seem to be brewing.
>
>
> >
> > Josh
> >
> > On Tue, Dec 8, 2020 at 8:57 AM Stephen Mallette <[email protected]>
> > wrote:
> >
> > > I started a expanded this discussion to gremlin-users for a wider
> > audience
> > > and the thread is starting to grow:
> > >
> > > https://groups.google.com/g/gremlin-users/c/QBmiOUkA0iI/m/pj5Ukiq6AAAJ
> > >
> > > I guess we'll need to summarize that discussion back here now....
> > >
> > > I did have some more thoughts to hang out there and figured that I
> > wouldn't
> > > convolute the discussion on gremlin-users with it so I will continue
> the
> > > discussion here.
> > >
> > > 1, The very first couple of examples seem wrong (or at least not best
> > > demonstrating the usage):
> > >
> > > g.upsertV('person', [name: 'marko'],
> > >                     [name: 'marko', age: 29])
> > > g.upsertV('person', [(T.id): 1],
> > >                     [(T.id): 1, name: 'Marko'])
> > >
> > > should instead be:
> > >
> > > g.upsertV('person', [name: 'marko'],
> > >                     [age: 29])
> > > g.upsertV('person', [(T.id): 1],
> > >                     [name: 'Marko'])
> > >
> > > 2. I can't recall if we made this clear anywhere but in situations
> where
> > we
> > > "get" rather than "create" do the additional properties act in an
> update
> > > fashion to the element that was found? I think I've been working on the
> > > assumption that it would, though perhaps that is not always desirable?
> > >
> > > 3. We really never settled up how to deal with multi/meta-properties.
> > That
> > > story should be clear so that when we document upsert() we include the
> > > approaches for the fallback patterns that don't meet the 90% of use
> cases
> > > we are targeting and I sense that we're saying that
> meta/multi-properties
> > > don't fit in that bucket. So with that in mind, I don't think that the
> > > following works for metaproperties:
> > >
> > > g.upsertV('person', [(T.id): 1],
> > >                     [name:[acl: 'public'])
> > >
> > > as it doesn't let us set the value for the "name" just the pairs for
> the
> > > meta-properties. I guess a user would have to fall back to:
> > >
> > > g.upsertV('person', [(T.id): 1]).
> > >   property('name','marko','acl','public')
> > >
> > > // or use the additional properties syntax
> > > g.upsertV('person', [(T.id): 1],[name:'marko']).
> > >   properties('name').property('acl','public')
> > >
> > > // or if there were multi-properties then maybe...
> > > g.upsertV('person', [(T.id): 1],[name:'marko']).
> > >   properties('name').hasValue('marko').property('acl','public')
> > >
> > > As for multi-properties, I dont think we should assume that a List
> object
> > > should be interpreted by Gremlin as a multi-property. Perhaps we just
> > rely
> > > on the underlying graph to properly deal with that given a schema or
> the
> > > user falls back to:
> > >
> > > // this ends up as however the graph deals with a List object
> > > g.upsertV('person', [(T.id): 1], [lang: ['java', 'scala', 'java'])
> > >
> > > // this is explicit
> > > g.upsertV('person', [(T.id): 1]).
> > >   property(list, 'lang', 'java').
> > >   property(list, 'lang, 'scala').
> > >   property(list, 'lang', 'java')
> > >
> > > If that makes sense to everyone, I will update the gist.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Oct 27, 2020 at 11:51 PM David Bechberger <[email protected]
> >
> > > wrote:
> > >
> > > > Hello Stephen,
> > > >
> > > > Thanks for making that gist, its much easier to follow along with the
> > > > proposed syntax there.  To your specific comments:
> > > >
> > > > #1 - My only worry with the term upsert is that it is not as widely a
> > > used
> > > > term for this sort of pattern as "Merge" (e.g. SQL, Cypher).
> However I
> > > > don't have a strong opinion on this, so I am fine with either.
> > > > #2 - My only real objective here is to make sure that we make the
> > 80-90%
> > > > case easy and straightforward.  I think that having the fallback
> option
> > > of
> > > > using the current syntax for any complicated edge cases should be
> > > > considered here as well. I'd appreciate your thoughts here as these
> are
> > > > good points you bring up that definitely fall into the 80-90% use
> case.
> > > > #3 - Those points make sense to me, not sure I have anything further
> to
> > > add
> > > > #4 - I don't think I would expect to have the values extracted from
> the
> > > > traversal filters but I'd be interested in other opinions on that.
> > > >
> > > > Thanks,
> > > > Dave
> > > >
> > > > On Thu, Oct 15, 2020 at 5:30 AM Stephen Mallette <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > It's been a couple weeks since we moved this thread so I went back
> > > > through
> > > > > it and refreshed my mind with what's there. Dave, nice job putting
> > all
> > > > > those examples together. I've recollected them all together and
> have
> > > just
> > > > > been reviewing them in a more formatted style with proper Groovy
> > syntax
> > > > > than what can be accomplished in this mailing list. Anyone who
> cares
> > to
> > > > > review in that form can do so here (i've pasted its markdown
> contents
> > > at
> > > > > the bottom of this post as well):
> > > > >
> > > > >
> https://gist.github.com/spmallette/5cd448f38d5dae832c67d890b576df31
> > > > >
> > > > > In this light, I have the following comments and thoughts:
> > > > >
> > > > > 1. I feel more inclined toward saving "merge" for a more
> generalized
> > > step
> > > > > of that name and have thus renamed the examples to use "upsert".
> > > > >
> > > > > 2. I still don't quite feel comfortable with meta/multi-properties
> > > > > examples. Perhaps you could explain further as it's possible I'm
> not
> > > > > thinking things through properly. For meta-properties the example
> is:
> > > > >
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, name:[first: Marko', last: 'R'])
> > > > >
> > > > > So I see that "name" takes a Map here, but how does Gremlin know if
> > > that
> > > > > means "meta-property" or "Map value" and if the former, then how do
> > you
> > > > set
> > > > > the value of "name"?  My question is similar to the multi-property
> > > > examples
> > > > > - using this one:
> > > > >
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1,lang: ['java', 'scala'])
> > > > >
> > > > > How does Gremlin know if the user means that "lang" should be a
> > > > > "multi-property" with Cardinality.list (or Cardinality.set for that
> > > > matter)
> > > > > or a value of Cardinality.single with a List/Array value? I can
> > perhaps
> > > > > suggest some solutions here but wanted to make sure I wasn't just
> > > > > misunderstanding something.
> > > > >
> > > > > 3. The "upsert with stream" example is:
> > > > >
> > > > > g.inject([(id): 1, (label): 'person', name: 'marko'],
> > > > >          [(id): 2, (label): 'person', name: 'josh']).
> > > > >   upsertV(values(label), valueMap(id), valueMap())
> > > > >
> > > > > That would establish an API of upsertV(Traversal, Traversal,
> > Traversal)
> > > > > which would enable your final thought of your last post with:
> > > > >
> > > > > g.V().upsertV('person', __.has('person','age',gt(29)).has('state',
> > > 'NY'),
> > > > >                         [active: true])
> > > > >
> > > > > The open-ended nature of that is neat but comes with some
> complexity
> > > with
> > > > > steps that traditionally take an "open" Traversal (and this step
> now
> > > has
> > > > up
> > > > > to three of them) - perhaps, that's our own fault in some ways. For
> > the
> > > > > "stream" concept, values()/valueMap() won't immediately work that
> way
> > > as
> > > > > they only apply to Element...given today's Gremlin semantics, I
> think
> > > > we'd
> > > > > re-write that as:
> > > > >
> > > > > g.inject([(id): 1, (label): 'person', name: 'marko'],
> > > > >          [(id): 2, (label): 'person', name: 'josh']).
> > > > >   upsertV(select(label), select(id), select(id,label,'name'))
> > > > >
> > > > > Though, even that doesn't quite work because you can't select(T)
> > right
> > > > now
> > > > > which means you'd need to go with:
> > > > >
> > > > > g.inject([id: 1, label: 'person', name: 'marko'],
> > > > >          [id: 2, label: 'person', name: 'josh']).
> > > > >   upsertV(select('label'), select('id'),
> select('id','label','name'))
> > > > >
> > > > > not terrible in a way and perhaps we could fix select(T) - can't
> > > remember
> > > > > why that isn't allowed except that select() operates over multiple
> > > scopes
> > > > > and perhaps trying T through those scopes causes trouble.
> > > > >
> > > > > 4. Continuing with the use of Traversal as arguments, let's go back
> > to:
> > > > >
> > > > > g.V().upsertV('person', __.has('person','age',gt(29)).has('state',
> > > 'NY'),
> > > > >                         [active: true])
> > > > >
> > > > > I originally had it in my mind to prefer this approach over a Map
> for
> > > the
> > > > > search as it provides a great deal of flexibility and fits the
> common
> > > > > filtering patterns that users follow really well. I suppose the
> > > question
> > > > is
> > > > > what to do in the situation of "create". Would we extract all the
> > > strict
> > > > > equality filter values from the initial has(), thus, in the
> example,
> > > > > "state" but not "age", and create a vertex that has "state=NY,
> > > > > active=true"? That is possible with how things are designed in
> > Gremlin
> > > > > today I think.
> > > > >
> > > > > This syntax does create some conflict with the subject of 3 and
> > streams
> > > > > because the traverser in the streams case is the incoming Map but
> for
> > > > this
> > > > > context it's meant as a filter of V(). Getting too whacky?
> > > > >
> > > > > 5. I like the idea of using a side-effect to capture whether the
> > > element
> > > > > was created or not. that makes sense. I think that can work.
> > > > >
> > > > > ==============================
> > > > > = gist contents
> > > > > ==============================
> > > > >
> > > > > # API
> > > > >
> > > > > ```java
> > > > > upsertV(String label, Map matchOrCreateProperties)
> > > > > upsertV(String label, Map matchOrCreateProperties, Map
> > > > > additionalProperties)
> > > > > upsertV(Traversal label, Traversal matchOrCreateProperties)
> > > > > upsertV(Traversal label, Traversal matchOrCreateProperties,
> Traversal
> > > > > additionalProperties)
> > > > > upsertV(...).
> > > > >   with(WithOptions.sideEffectLabel, 'a')
> > > > >
> > > > > upsertE(String label, Map matchOrCreateProperties)
> > > > > upsertE(Traversal label, Traversal matchOrCreateProperties)
> > > > > upsertE(String label, Map matchOrCreateProperties, Map
> > > > > additionalProperties)
> > > > > upsertE(Traversal label, Traversal matchOrCreateProperties,
> Traversal
> > > > > additionalProperties)
> > > > > upsertE(...).
> > > > >   from(Vertex incidentOut).to(Vertex incidentIn)
> > > > >   with(WithOptions.sideEffectLabel, 'a')
> > > > > ```
> > > > >
> > > > > # Examples
> > > > >
> > > > > ## upsert
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [name: 'marko'],
> > > > >                     [name: 'marko', age: 29])
> > > > > ```
> > > > >
> > > > > ## upsert with id
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, name: 'Marko'])
> > > > > ```
> > > > >
> > > > > ## upsert with Meta Properties
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, name:[first: Marko', last: 'R'])
> > > > > ```
> > > > >
> > > > > ## upsert with Multi Properties
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1,lang: ['java', 'scala'])
> > > > > ```
> > > > >
> > > > > ## upsert with Single Cardinality
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, lang: 'java')
> > > > > ```
> > > > >
> > > > > ## upsert with List Cardinality
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, lang: ['java', 'scala', 'java'])
> > > > > ```
> > > > >
> > > > > ## upsert with Set Cardinality
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, lang: ['java','scala'])
> > > > > ```
> > > > >
> > > > > ## upsert with List Cardinality - and add value to list
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [(T.id): 1],
> > > > >                     [(T.id): 1, lang: ['java','scala','java']).
> > > > >   property(Cardinality.list, 'lang', 'java')
> > > > > ```
> > > > >
> > > > > ## upsert with stream
> > > > >
> > > > > ```groovy
> > > > > // doesn't work with today's Gremlin semantics
> > > > > g.inject([(id): 1, (label): 'person', name: 'marko'],
> > > > >          [(id): 2, (label): 'person', name: 'josh']).
> > > > >   upsertV(values(label), valueMap(id), valueMap())
> > > > >
> > > > > // the following would be more in line with what's possible with
> > > existing
> > > > > Gremlin semantics:
> > > > > g.inject([id: 1, label: 'person', name: 'marko'],
> > > > >          [id: 2, label: 'person', name: 'josh']).
> > > > >   upsertV(select('label'), select('id'),
> select('id','label','name'))
> > > > > ```
> > > > >
> > > > > ## upsert with reporting added or updated side effect
> > > > >
> > > > > ```groovy
> > > > > g.upsertV('person', [name: 'marko'],
> > > > >                     [name: 'marko', age: 29]).
> > > > >     with(WithOptions.sideEffectLabel, 'a').
> > > > >   project('vertex', 'added').
> > > > >     by().
> > > > >     by(cap('a'))
> > > > > ```
> > > > >
> > > > > ## upsert Edge assuming a self-relation of "marko"
> > > > >
> > > > > ```groovy
> > > > > g.V().has('person','name','marko').
> > > > >   upsertE( 'self', [weight:0.5])
> > > > > ```
> > > > >
> > > > > ## upsert Edge with incident vertex checks
> > > > >
> > > > > ```groovy
> > > > > g.upsertE('knows', [weight:0.5]).
> > > > >    from(V(1)).to(V(2))
> > > > > g.V().has('person','name','marko').
> > > > >   upsertE( 'knows', [weight:0.5]).
> > > > >     to(V(2))
> > > > > ```
> > > > >
> > > > > ## upsert using a Traversal for the match
> > > > >
> > > > > ```groovy
> > > > > g.V().upsertV('person', __.has('person','age',gt(29)).has('state',
> > > 'NY'),
> > > > >                         [active: true])
> > > > > ```
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Sep 28, 2020 at 6:49 PM David Bechberger <
> > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > So, I've been doing some additional thinking about the ways that
> > this
> > > > > could
> > > > > > work based on the comments below and have put my comments inline.
> > > > > >
> > > > > > Dave
> > > > > >
> > > > > > On Tue, Sep 22, 2020 at 6:05 AM Stephen Mallette <
> > > [email protected]
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I added some thoughts inline below:
> > > > > > >
> > > > > > > On Fri, Sep 18, 2020 at 3:51 PM David Bechberger <
> > > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the detailed comments Stephen.  I have addressed
> > them
> > > > > inline
> > > > > > > > below.
> > > > > > > >
> > > > > > > > I did read the proposal from earlier and I think that we are
> in
> > > > close
> > > > > > > > agreement with what we are trying to accomplish.  I also
> fully
> > > > > support
> > > > > > > > Josh's comment on providing a mechanism for submitting a map
> of
> > > > > > > properties
> > > > > > > > as manually unrolling this all right now leads to a lot of
> > > > potential
> > > > > > for
> > > > > > > > error and a long messy traversal.
> > > > > > > >
> > > > > > > > I'm looking forward to this discussion on how to merge these
> > two
> > > > > > > proposals.
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. How would multi/meta-properties fit into the API you've
> > > > proposed?
> > > > > > > >
> > > > > > > > My first thought here is that multi-properties would be
> > > represented
> > > > > as
> > > > > > > > lists in the map, e.g.
> > > > > > > >
> > > > > > > > {names: ['Dave', 'David']}
> > > > > > > >
> > > > > > > > and meta-properties would be represented as maps in the maps,
> > > e.g.
> > > > > > > >
> > > > > > > > {name: {first: 'Dave', last: 'Bechberger'}}
> > > > > > > >
> > > > > > > > I can't say I've thought through all the implications of this
> > > > though
> > > > > so
> > > > > > > it
> > > > > > > > is an area we would need to explore.
> > > > > > > >
> > > > > > >
> > > > > > > The first implication that comes to mind is that it makes the
> > > > > assumption
> > > > > > > the user wants multi/meta-properties as opposed to a single
> > > > cardinality
> > > > > > of
> > > > > > > List in the first case and a Map as a property value in the
> > second
> > > > > case.
> > > > > > I
> > > > > > > suppose that graphs with a schema could resolve those
> assumptions
> > > but
> > > > > > > graphs that are schemaless would have a problem. The issue
> could
> > be
> > > > > > > resolved by specialized configuration of "g" or per merge()
> step
> > > > using
> > > > > a
> > > > > > > with() modulator I suppose but that goes into a yet another
> level
> > > of
> > > > > > > implications to consider. I've often wondered if the start
> point
> > > for
> > > > > > > getting types/schema into TP3 without a full rewrite would be
> in
> > > this
> > > > > > form
> > > > > > > where Gremlin would be given hints as to what to expect as to
> the
> > > > types
> > > > > > of
> > > > > > > data it might encounter while traversing. Anyway, I'd be
> hesitant
> > > to
> > > > go
> > > > > > > down paths that don't account for multi/metaproperties well.
> > They
> > > > are
> > > > > > > first class citizens in TP3 (with those hoping for extension of
> > at
> > > > > least
> > > > > > > multiproperties to edges) and while I find them a constant
> > > annoyance
> > > > > for
> > > > > > so
> > > > > > > many reasons, we're kinda stuck with them.
> > > > > > >
> > > > > >
> > > > > > I agree we need to account for multi/meta properties as 1st class
> > > > > > citizens.  This is my current thinking on the syntax for the
> > > situations
> > > > > we
> > > > > > have laid out so far:
> > > > > >
> > > > > > *Merge*
> > > > > > g.mergeV('name', {'name': 'marko'}, {'name': 'marko', 'age': 29})
> > > > > >
> > > > > >
> > > > > > *Merge with id* g.mergeV('name', {T.id: 1}, {T.id: 1, 'name':
> > > 'Marko'})
> > > > > >
> > > > > >
> > > > > > *Merge with Meta Properties* g.mergeV('name', {T.id: 1}, {T.id:
> 1,
> > > > > 'name':
> > > > > > {'first': Marko', 'last': 'R'})
> > > > > >
> > > > > >
> > > > > > *Merge with Multi Properties * g.mergeV('name', {T.id: 1}, {T.id:
> > 1,
> > > > > > 'lang': ['java', 'scala'])
> > > > > >
> > > > > >
> > > > > > *Merge with Single Cardinality* g.mergeV('name', {T.id: 1},
> {T.id:
> > 1,
> > > > > > 'lang': 'java')
> > > > > >
> > > > > >
> > > > > > *Merge with List Cardinality* g.mergeV('name', {T.id: 1}, {T.id:
> 1,
> > > > > 'lang':
> > > > > > ['java', 'scala', 'java'])
> > > > > >
> > > > > >
> > > > > > *Merge with Set Cardinality * g.mergeV('name', {T.id: 1}, {T.id:
> 1,
> > > > > 'lang':
> > > > > > ['java', 'scala'])
> > > > > >
> > > > > > Since in a mergeV() scenario we are only ever adding whatever
> > values
> > > > are
> > > > > > passed in there would be no need to specify the cardinality of
> the
> > > > > property
> > > > > > being added.  If they wanted to add a value to an existing
> property
> > > > then
> > > > > > the current property() method would still be available on the
> > output
> > > > > > traversal. e.g.
> > > > > > g.mergeV('name', {T.id: 1}, {T.id: 1, 'lang': ['java', 'scala',
> > > > > > 'java']).property(Cardinality.list, 'lang, 'java')
> > > > > >
> > > > > >
> > > > > > *Merge with stream * g.inject([{id: 1, label: 'person', 'name':
> > > > 'marko'},
> > > > > > {id: 2, label: 'person', 'name': 'josh'}]).
> > > > > >   mergeV(values('label'), valueMap('id'), valueMap())
> > > > > >
> > > > > >
> > > > > > *Merge with reporting added or updated side effect*
> > g.mergeV('name',
> > > > > > {'name': 'marko'}, {'name': 'marko', 'age': 29}).
> > > > > >   with(WithOptions.sideEffectLabel, 'a').
> > > > > >   project('vertex', 'added').
> > > > > >     by().
> > > > > >     by(cap('a'))
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > 2. How would users set the T.id on creation? would that T.id
> > just
> > > > be
> > > > > a
> > > > > > > key
> > > > > > > > in the first Map argument?
> > > > > > > >
> > > > > > > > Yes, I was thinking that T.id would be the key name, e.g.:
> > > > > > > >
> > > > > > > > g.mergeV('name', {T.id: 1}, {'name': 'Marko'})
> > > > > > > >
> > > > > > >
> > > > > > > ok - I can't say I see a problem with that atm.
> > > > > > >
> > > > > > >
> > > > > > > > 3. I do like the general idea of a match on multiple
> properties
> > > for
> > > > > the
> > > > > > > > first argument as a convenience but wonder about the
> > specificity
> > > of
> > > > > > this
> > > > > > > > API a bit as it focuses heavily on equality - I suppose
> that's
> > > most
> > > > > > cases
> > > > > > > > for get-or-create, so perhaps that's ok.
> > > > > > > >
> > > > > > > > In most cases I've seen use exact matches on the vertex or
> > > edge.  I
> > > > > > think
> > > > > > > > it might be best to keep this straightforward as any complex
> > edge
> > > > > cases
> > > > > > > > still can perform the same functionality using the coalesce()
> > > > > pattern.
> > > > > > > >
> > > > > > >
> > > > > > > I played around with the idea to use modulators to apply
> > additional
> > > > > > > constraints:
> > > > > > >
> > > > > > > g.mergeV('person', {'state':'NY'}, {'active': true}).
> > > > > > >     by(has('age', gt(29))
> > > > > > >
> > > > > > > I suppose that's neat but maybe not...we could easily get the
> > same
> > > > > thing
> > > > > > > from:
> > > > > > >
> > > > > > > g.V().has('person','age',gt(29)).
> > > > > > >   mergeV('person', {'state':'NY'}, {'active': true}).
> > > > > > >
> > > > > > > which is more readable though it does make it harder for
> > providers
> > > to
> > > > > > > optimize upsert operations if they could make use of the has()
> as
> > > > part
> > > > > of
> > > > > > > that. I think I like has() as part of the main query rather
> than
> > > in a
> > > > > > by()
> > > > > > > - just thinking out loud.
> > > > > > >
> > > > > > >
> > > > > > Another potential option here would be to allow the second
> > parameter
> > > of
> > > > > > mergeV() accept a traversal.  Not sure how much complexity that
> > would
> > > > add
> > > > > > though
> > > > > >
> > > > > > g.V().mergeV('person', __.has('person','age',gt(29)).has('state',
> > > > 'NY'}.,
> > > > > > {'active': true}).
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > > 4. I think your suggestion points to one of the troubles
> > Gremlin
> > > > has
> > > > > > > which
> > > > > > > > we see with "algorithms" - extending the language with new
> > steps
> > > > that
> > > > > > > > provides a form of "sugar" (e.g. in algorithms we end up with
> > > > > > > > shortestPath() step) pollutes the core language a bit, hence
> my
> > > > > > > > generalization of "merging" in my link above which fits into
> > the
> > > > core
> > > > > > > > Gremlin language style. There is a bigger picture where we
> are
> > > > > missing
> > > > > > > > something in Gremlin that lets us extend the language in ways
> > > that
> > > > > let
> > > > > > us
> > > > > > > > easily introduce new steps that aren't for general purpose.
> > This
> > > > > issue
> > > > > > is
> > > > > > > > discussed in terms of "algorithms" here:
> > > > > > > > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I
> > think
> > > > we
> > > > > > > could
> > > > > > > > see how there might be some "mutation" extension steps that
> > would
> > > > > cover
> > > > > > > > your suggested API, plus batch operations, etc. We need a way
> > to
> > > > add
> > > > > > > > "sugar" without it interfering with the consistency of the
> > core.
> > > > > > > Obviously
> > > > > > > > this is a bigger issue but perhaps important to solve to
> > > implement
> > > > > > steps
> > > > > > > in
> > > > > > > > the fashion you describe.
> > > > > > > >
> > > > > > > > I agree that the "algorithm" steps do seem a bit odd in the
> > core
> > > > > > language
> > > > > > > > but I think the need is there.  I'd be interested in
> furthering
> > > > this
> > > > > > > > discussion but I think these "pattern" steps may or may not
> be
> > > the
> > > > > same
> > > > > > > as
> > > > > > > > the algorithm steps.  I'll have to think on that.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > 5. I suppose that the reason for mergeE and mergeV is to
> > specify
> > > > what
> > > > > > > > element type the first Map argument should be applied to?
> what
> > > > about
> > > > > > > > mergeVP (i.e. vertex property as it too is an element) ?
> That's
> > > > > tricky
> > > > > > > but
> > > > > > > > I don't think we should miss that. Perhaps merge() could be a
> > > > > "complex
> > > > > > > > modulator"?? that's a new concept of course, but you would do
> > > > > > > g.V().merge()
> > > > > > > > and the label and first Map would fold to VertexStartStep
> (i.e.
> > > > V())
> > > > > > for
> > > > > > > > the lookup and then a MergeStep would follow - thus a
> "complex
> > > > > > modulator"
> > > > > > > > as it does more than just change the behavior of the previous
> > > step
> > > > -
> > > > > it
> > > > > > > > also adds its own. I suppose it could also add has() steps
> > > followed
> > > > > by
> > > > > > > the
> > > > > > > > MergeStep and then the has() operations would fold in
> normally
> > as
> > > > > they
> > > > > > do
> > > > > > > > today. In this way, we can simplify to just one single
> > > > > > > > merge(String,Map,Map). ??
> > > > > > > >
> > > > > > > > I agree that we should also think about how to include
> > properties
> > > > in
> > > > > > this
> > > > > > > > merge construct.  The reason I was thinking about mergeV()
> and
> > > > > mergeE()
> > > > > > > is
> > > > > > > > that it follows the same pattern as the already well
> understood
> > > > > > > > addV()/addE() steps.  I am a bit wary of trying to generalize
> > > this
> > > > > down
> > > > > > > to
> > > > > > > > a single merge() step as these sorts of complex overloads
> make
> > it
> > > > > hard
> > > > > > to
> > > > > > > > figure out which Gremlin step you should use for a particular
> > > > pattern
> > > > > > > (e.g.
> > > > > > > > upsert vertex or upsert edge).  One thing that I think
> > customers
> > > > find
> > > > > > > > useful about the addV/addE step is that they are very
> > > discoverable,
> > > > > the
> > > > > > > > name tells me what functionality to expect from that step.
> > > > > > > >
> > > > > > >
> > > > > > > I'm starting to agree with the idea that we have to do
> something
> > > like
> > > > > > > mergeV()/E() as it wouldn't quite work as a start step
> otherwise.
> > > We
> > > > > > > couldn't do:
> > > > > > >
> > > > > > > g.merge()
> > > > > > >
> > > > > > > as we wouldnt know what sort of Element it applied to. If so, I
> > > > wonder
> > > > > if
> > > > > > > it would be better to preserve "merge" for my more general
> > use-case
> > > > and
> > > > > > > prefer upsertV()/E()?
> > > > > > >
> > > > > > > Also, perhaps mergeVP() doesn't need to exist as perhaps it is
> an
> > > > > > uncommon
> > > > > > > case (compared to V/E()) and we don't really have addVP() as an
> > > > > analogous
> > > > > > > step. Perhaps existing coalesce() patterns and/or my more
> general
> > > > > purpose
> > > > > > > merge() step would satisfy those situations??
> > > > > > >
> > > > > >
> > > > > > I think that a mergeVP() type step may require a different
> pattern
> > > > since
> > > > > it
> > > > > > would really need to accept a map of values otherwise it would
> > > > > essentially
> > > > > > perform the same function as property()
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > 6. One thing neither my approach nor yours seems to do is
> tell
> > > the
> > > > > user
> > > > > > > if
> > > > > > > > they created something or updated something - that's another
> > > thing
> > > > > I've
> > > > > > > > seen users want to have in get-or-create. Here again we go
> > deeper
> > > > > into
> > > > > > a
> > > > > > > > less general step specification as alluded to in 4, but a
> > merge()
> > > > > step
> > > > > > as
> > > > > > > > proposed in 5, might return [Element,boolean] so as to
> provide
> > an
> > > > > > > indicator
> > > > > > > > of creation?
> > > > > > > >
> > > > > > > > Hmm, I'll have to think about that.  How would returning
> > multiple
> > > > > > values
> > > > > > > > work if we want to chain these together. e.g. Add a vertex
> > > between
> > > > > two
> > > > > > > > edges but make sure the vertices exist?
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > Added a thought on how to accomplish this above that I'd like to
> > get
> > > > > > thoughts on.
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > 7. You were just introducing your ideas here, so perhaps you
> > > > haven't
> > > > > > > gotten
> > > > > > > > this far yet, but a shortcoming to doing
> merge(String,Map,Map)
> > is
> > > > > that
> > > > > > it
> > > > > > > > leaves open no opportunity to stream a List of Maps to a
> > merge()
> > > > for
> > > > > a
> > > > > > > form
> > > > > > > > of batch loading which is mighty common and one of the
> > variations
> > > > of
> > > > > > the
> > > > > > > > coalesce() pattern that I alluded to at the start of all
> this.
> > I
> > > > > think
> > > > > > > that
> > > > > > > > we would want to be sure that we left open the option to do
> > that
> > > > > > somehow.
> > > > > > > > 8. If we had a general purpose merge() step I wonder if it
> > makes
> > > > > > > developing
> > > > > > > > the API as you suggested easier to do?
> > > > > > > >
> > > > > > > > Hmm, let me think about that one.
> > > > > > > >
> > > > > > > >
> > > > > > > I will add an item 8 to think about which I didn't mention
> > before:
> > > > > > >
> > > > > > > 8. The signature you suggested for mergeE() is:
> > > > > > >
> > > > > > > mergeE(String, Map, Map)
> > > > > > >      String - The edge label
> > > > > > >      Map (first) - The properties to match existing edge on
> > > > > > >      Map (second) - Any additional properties to set if a new
> > edge
> > > is
> > > > > > > created (optional)
> > > > > > >
> > > > > > > but does that exactly answer the need? Typically the idea is to
> > > > detect
> > > > > an
> > > > > > > edge between two vertices not globally in the graph by way of
> > some
> > > > > > > properties. This signature doesn't seem to allow for that as it
> > > > doesn't
> > > > > > > allow specification of the vertices to test against. Perhaps
> the
> > > > answer
> > > > > > is
> > > > > > > to use from() and to() modulators?
> > > > > > >
> > > > > > > g.mergeE('knows', {'weight':0.5}).
> > > > > > >   from(V(1)).to(V(2))
> > > > > > > g.V().has('person','name','marko').
> > > > > > >   mergeE( 'knows', {'weight':0.5}).
> > > > > > >     to(V(2))
> > > > > > > g.V().has('person','name','marko').
> > > > > > >   mergeE( 'self', {'weight':0.5})
> > > > > > >
> > > > > > > That seems to work?
> > > > > > >
> > > > > >
> > > > > > I think that is an elegant solution to that problem.  I like it
> and
> > > it
> > > > > > keeps in line with the way that addE() works.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, Sep 17, 2020 at 4:31 AM Stephen Mallette <
> > > > > [email protected]
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I like our coalesce() pattern but it is verbose and over
> time
> > > it
> > > > > has
> > > > > > > gone
> > > > > > > > > from a simple pattern to one with numerous variations for
> all
> > > > > manner
> > > > > > of
> > > > > > > > > different sorts of merge-like operations. As such, I do
> think
> > > we
> > > > > > should
> > > > > > > > > introduce something to cover this pattern.
> > > > > > > > >
> > > > > > > > > I like that you used the word "merge" in your description
> of
> > > this
> > > > > as
> > > > > > it
> > > > > > > > is
> > > > > > > > > the word I've liked using. You might want to give a look at
> > my
> > > > > > proposed
> > > > > > > > > merge() step from earlier in the year:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r34ff112e18f4e763390303501fc07c82559d71667444339bde61053f%40%3Cdev.tinkerpop.apache.org%3E
> > > > > > > > >
> > > > > > > > > I'm just going to dump thoughts as they come regarding what
> > you
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > 1. How would multi/meta-properties fit into the API you've
> > > > > proposed?
> > > > > > > > > 2. How would users set the T.id on creation? would that
> T.id
> > > just
> > > > > be
> > > > > > a
> > > > > > > > key
> > > > > > > > > in the first Map argument?
> > > > > > > > > 3. I do like the general idea of a match on multiple
> > properties
> > > > for
> > > > > > the
> > > > > > > > > first argument as a convenience but wonder about the
> > > specificity
> > > > of
> > > > > > > this
> > > > > > > > > API a bit as it focuses heavily on equality - I suppose
> > that's
> > > > most
> > > > > > > cases
> > > > > > > > > for get-or-create, so perhaps that's ok.
> > > > > > > > > 4. I think your suggestion points to one of the troubles
> > > Gremlin
> > > > > has
> > > > > > > > which
> > > > > > > > > we see with "algorithms" - extending the language with new
> > > steps
> > > > > that
> > > > > > > > > provides a form of "sugar" (e.g. in algorithms we end up
> with
> > > > > > > > > shortestPath() step) pollutes the core language a bit,
> hence
> > my
> > > > > > > > > generalization of "merging" in my link above which fits
> into
> > > the
> > > > > core
> > > > > > > > > Gremlin language style. There is a bigger picture where we
> > are
> > > > > > missing
> > > > > > > > > something in Gremlin that lets us extend the language in
> ways
> > > > that
> > > > > > let
> > > > > > > us
> > > > > > > > > easily introduce new steps that aren't for general purpose.
> > > This
> > > > > > issue
> > > > > > > is
> > > > > > > > > discussed in terms of "algorithms" here:
> > > > > > > > > https://issues.apache.org/jira/browse/TINKERPOP-1991 but I
> > > think
> > > > > we
> > > > > > > > could
> > > > > > > > > see how there might be some "mutation" extension steps that
> > > would
> > > > > > cover
> > > > > > > > > your suggested API, plus batch operations, etc. We need a
> way
> > > to
> > > > > add
> > > > > > > > > "sugar" without it interfering with the consistency of the
> > > core.
> > > > > > > > Obviously
> > > > > > > > > this is a bigger issue but perhaps important to solve to
> > > > implement
> > > > > > > steps
> > > > > > > > in
> > > > > > > > > the fashion you describe.
> > > > > > > > > 5. I suppose that the reason for mergeE and mergeV is to
> > > specify
> > > > > what
> > > > > > > > > element type the first Map argument should be applied to?
> > what
> > > > > about
> > > > > > > > > mergeVP (i.e. vertex property as it too is an element) ?
> > That's
> > > > > > tricky
> > > > > > > > but
> > > > > > > > > I don't think we should miss that. Perhaps merge() could
> be a
> > > > > > "complex
> > > > > > > > > modulator"?? that's a new concept of course, but you would
> do
> > > > > > > > g.V().merge()
> > > > > > > > > and the label and first Map would fold to VertexStartStep
> > (i.e.
> > > > > V())
> > > > > > > for
> > > > > > > > > the lookup and then a MergeStep would follow - thus a
> > "complex
> > > > > > > modulator"
> > > > > > > > > as it does more than just change the behavior of the
> previous
> > > > step
> > > > > -
> > > > > > it
> > > > > > > > > also adds its own. I suppose it could also add has() steps
> > > > followed
> > > > > > by
> > > > > > > > the
> > > > > > > > > MergeStep and then the has() operations would fold in
> > normally
> > > as
> > > > > > they
> > > > > > > do
> > > > > > > > > today. In this way, we can simplify to just one single
> > > > > > > > > merge(String,Map,Map). ??
> > > > > > > > > 6. One thing neither my approach nor yours seems to do is
> > tell
> > > > the
> > > > > > user
> > > > > > > > if
> > > > > > > > > they created something or updated something - that's
> another
> > > > thing
> > > > > > I've
> > > > > > > > > seen users want to have in get-or-create. Here again we go
> > > deeper
> > > > > > into
> > > > > > > a
> > > > > > > > > less general step specification as alluded to in 4, but a
> > > merge()
> > > > > > step
> > > > > > > as
> > > > > > > > > proposed in 5, might return [Element,boolean] so as to
> > provide
> > > an
> > > > > > > > indicator
> > > > > > > > > of creation?
> > > > > > > > > 7. You were just introducing your ideas here, so perhaps
> you
> > > > > haven't
> > > > > > > > gotten
> > > > > > > > > this far yet, but a shortcoming to doing
> > merge(String,Map,Map)
> > > is
> > > > > > that
> > > > > > > it
> > > > > > > > > leaves open no opportunity to stream a List of Maps to a
> > > merge()
> > > > > for
> > > > > > a
> > > > > > > > form
> > > > > > > > > of batch loading which is mighty common and one of the
> > > variations
> > > > > of
> > > > > > > the
> > > > > > > > > coalesce() pattern that I alluded to at the start of all
> > this.
> > > I
> > > > > > think
> > > > > > > > that
> > > > > > > > > we would want to be sure that we left open the option to do
> > > that
> > > > > > > somehow.
> > > > > > > > > 8. If we had a general purpose merge() step I wonder if it
> > > makes
> > > > > > > > developing
> > > > > > > > > the API as you suggested easier to do?
> > > > > > > > >
> > > > > > > > > I think I'd like to solve the problems you describe in your
> > > post
> > > > as
> > > > > > > well
> > > > > > > > as
> > > > > > > > > the ones in mine. There is some relation there, but gaps as
> > > well.
> > > > > > With
> > > > > > > > more
> > > > > > > > > discussion here we can figure something out.
> > > > > > > > >
> > > > > > > > > Thanks for starting this talk - good one!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Sep 16, 2020 at 9:26 PM David Bechberger <
> > > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I've had a few on and off discussions with a few people
> > here,
> > > > so
> > > > > I
> > > > > > > > wanted
> > > > > > > > > > to send this out to everyone for feedback.
> > > > > > > > > >
> > > > > > > > > > What are people's thoughts on creating a new set of steps
> > > that
> > > > > > codify
> > > > > > > > > > common Gremlin best practices?
> > > > > > > > > >
> > > > > > > > > > I think there are several common Gremlin patterns where
> > users
> > > > > would
> > > > > > > > > benefit
> > > > > > > > > > from the additional guidance that these codified steps
> > > > represent.
> > > > > > > The
> > > > > > > > > > first one I would recommend though is codifying the
> element
> > > > > > existence
> > > > > > > > > > pattern into a single Gremlin step, something like:
> > > > > > > > > >
> > > > > > > > > > mergeV(String, Map, Map)
> > > > > > > > > >      String - The vertex label
> > > > > > > > > >      Map (first) - The properties to match existing
> > vertices
> > > on
> > > > > > > > > >      Map (second) - Any additional properties to set if a
> > new
> > > > > > vertex
> > > > > > > is
> > > > > > > > > > created (optional)
> > > > > > > > > > mergeE(String, Map, Map)
> > > > > > > > > >      String - The edge label
> > > > > > > > > >      Map (first) - The properties to match existing edge
> on
> > > > > > > > > >      Map (second) - Any additional properties to set if a
> > new
> > > > > edge
> > > > > > is
> > > > > > > > > > created (optional)
> > > > > > > > > >
> > > > > > > > > > In each of these cases these steps would perform the same
> > > > upsert
> > > > > > > > > > functionality as the element existence pattern.
> > > > > > > > > >
> > > > > > > > > > Example:
> > > > > > > > > >
> > > > > > > > > > g.V().has('person','name','stephen').
> > > > > > > > > >            fold().
> > > > > > > > > >            coalesce(unfold(),
> > > > > > > > > >                     addV('person').
> > > > > > > > > >                       property('name','stephen').
> > > > > > > > > >                       property('age',34))
> > > > > > > > > >
> > > > > > > > > > would become:
> > > > > > > > > >
> > > > > > > > > > g.mergeV('person', {'name': 'stephen'}, {'age', 34})
> > > > > > > > > >
> > > > > > > > > > I think that this change would be a good addition to the
> > > > language
> > > > > > for
> > > > > > > > > > several reasons:
> > > > > > > > > >
> > > > > > > > > > * This codifies the best practice for a specific
> > > action/recipe,
> > > > > > which
> > > > > > > > > > reduces the chance that someone uses the pattern
> > incorrectly
> > > > > > > > > > * Most complex Gremlin traversals are verbose.  Reducing
> > the
> > > > > amount
> > > > > > > of
> > > > > > > > > code
> > > > > > > > > > that needs to be written and maintained allows for a
> better
> > > > > > developer
> > > > > > > > > > experience.
> > > > > > > > > > * It will lower the bar of entry for a developer by
> making
> > > > these
> > > > > > > > actions
> > > > > > > > > > more discoverable.  The more we can help bring these
> > patterns
> > > > to
> > > > > > the
> > > > > > > > > > forefront of the language via these pattern/meta steps
> the
> > > more
> > > > > we
> > > > > > > > guide
> > > > > > > > > > users towards writing better Gremlin faster
> > > > > > > > > > * This allows DB vendors to optimize for this pattern
> > > > > > > > > >
> > > > > > > > > > I know that this would likely be the first step in
> Gremlin
> > > that
> > > > > > > > codifies
> > > > > > > > > a
> > > > > > > > > > pattern, so I'd like to get other's thoughts on this?
> > > > > > > > > >
> > > > > > > > > > Dave
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Creating pattern steps to codify best practices

Reply via email to