Re: Blueprints - Gremlin in TP3

Matthias Broecheler Mon, 13 Apr 2015 16:35:34 -0700

Hi guys,

I disagree that this is desired behavior. The graph is the raw structure of
the data and a traversal gives you a "view" of that data. That
characterization makes sense. So, let's say we have a traversal source g with
PartitionGraphStrategy added to it, then
t = g.V().out()
gives us a view of the graph restricted to the data that a particular user
can see. However, I see a problem in that the raw graph elements can leak
out of that view. If I iterate this traversal out, I get normal vertex
objects and from there I can access anything I want. From a conceptual
perspective, we have the raw data leaking out of the view which I think
will be very confusing to users. In particular, the more a traversal
becomes a DSL in its own right (with filtering, triangle closure and what
have ya happening under the hood) the more confusing it will be to the user
that the vertices that fall out at the end are just "raw" vertices and any
subsequent access is no longer tied to the traversal. I am afraid that this
will be a major stumbling block for TP3 users.
It also makes it difficult to implement things like PartitionGraphStrategy
or any kind of authentication or access control graph strategy, since you
can easily get around it by taking the elements out of the traversal. I
don't necessarily mean that in an adversarial setting where somebody is
trying to hack the system, but also simply in a setup where health care
developers want to use strategies to avoid illegal access which they
control by way of a custom strategy. In the current setup, it is very easy
for a developer (and almost natural) to slip out of the traversal and go at
the elements directly.
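
To make the leak concrete, here is a minimal sketch in plain Java. It is a toy model, not TinkerPop code: RawVertex, PartitionView and sampleGraph are hypothetical names invented for this illustration, and PartitionView.out() only loosely mimics what g.V(v).out() would do under a partition strategy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: none of these types are TinkerPop classes.
class LeakDemo {
    // A "raw" vertex that exposes its adjacency list directly.
    static final class RawVertex {
        final String name;
        final String partition;
        final List<RawVertex> out = new ArrayList<>();

        RawVertex(String name, String partition) {
            this.name = name;
            this.partition = partition;
        }
    }

    // A restricted "view": only vertices in the allowed partition are visible.
    static final class PartitionView {
        final String allowed;

        PartitionView(String allowed) {
            this.allowed = allowed;
        }

        // Rough analogue of g.V(start).out() under a partition strategy.
        List<RawVertex> out(RawVertex start) {
            return start.out.stream()
                    .filter(v -> v.partition.equals(allowed))
                    .collect(Collectors.toList());
        }
    }

    // Builds alice -> bob, alice -> secret, bob -> secret and returns alice.
    static RawVertex sampleGraph() {
        RawVertex alice = new RawVertex("alice", "A");
        RawVertex bob = new RawVertex("bob", "A");
        RawVertex secret = new RawVertex("secret", "B");
        alice.out.add(bob);
        alice.out.add(secret);
        bob.out.add(secret);
        return alice;
    }

    static List<String> names(List<RawVertex> vertices) {
        return vertices.stream().map(v -> v.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PartitionView view = new PartitionView("A");
        List<RawVertex> visible = view.out(sampleGraph());
        System.out.println(names(visible));            // prints [bob]
        // The leak: the returned object is raw, so the filter no longer applies.
        System.out.println(names(visible.get(0).out)); // prints [secret]
    }
}
```

The view correctly hides "secret" from the first hop, but the vertex it hands back carries no memory of the view, so the very next access walks straight into the other partition.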


Now, for the argument that you shouldn't be doing that and if you want to
do something to the elements in a traversal you should do that in the
traversal itself: I agree with that sentiment and the idea of traversal
becoming the query language and that's all you ever use. However, if that
is the case, then we should consider not returning elements at all.
By analogy to SQL, SQL doesn't allow you to accidentally (or not) slip out
of the relational algebra and start manipulating database records directly
(some earlier systems actually allowed that for performance reasons though
(I assume) it was quickly realized what a horrible idea that is). That's
kind of what TinkerPop3 does right now. A graph traversal should be
self-contained and sealed to avoid such conceptual leakage.

Here are my 2 cents on a resolution: It seems we are in agreement that
people should write traversals to produce the result set they are
interested in and not do Blueprints style "coding" to get there. With the
introduced nested traversals, modifiers and all the other new features it
should indeed be possible to do that 90%+ of the time without using lambdas.
We could go the route of simply not returning elements in traversals at all
(but only some projection of them, like a valueMap). However, that leaves a
small percentage of use cases where you do want to get elements (for instance
if you need a lambda step inside your traversal or absolutely want to get a
vertex back). In those cases, TP3 should simply wrap the element into a
TraversalElement which holds a pointer to the source traversal so that it
remains within the "view".
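
A rough sketch of what such a wrapper could look like, again as a dependency-free toy model in Java. ViewVertex plays the role of the proposed TraversalElement; that name, PartitionView, canSee and wrap are all assumptions made for this illustration and are not existing TP3 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: ViewVertex stands in for the proposed TraversalElement.
class WrapperDemo {
    static final class RawVertex {
        final String name;
        final String partition;
        final List<RawVertex> out = new ArrayList<>();

        RawVertex(String name, String partition) {
            this.name = name;
            this.partition = partition;
        }
    }

    static final class PartitionView {
        final String allowed;

        PartitionView(String allowed) {
            this.allowed = allowed;
        }

        boolean canSee(RawVertex v) {
            return v.partition.equals(allowed);
        }

        // Wrapping happens here, at the boundary where results reach the user.
        ViewVertex wrap(RawVertex v) {
            return new ViewVertex(v, this);
        }
    }

    static final class ViewVertex {
        private final RawVertex raw;        // not exposed to the caller
        private final PartitionView source; // pointer back to the originating view

        ViewVertex(RawVertex raw, PartitionView source) {
            this.raw = raw;
            this.source = source;
        }

        String name() {
            return raw.name;
        }

        // Follow-up access is routed through the same view.
        List<ViewVertex> out() {
            return raw.out.stream()
                    .filter(source::canSee)
                    .map(source::wrap)
                    .collect(Collectors.toList());
        }
    }

    // Builds alice -> bob, alice -> secret, bob -> secret and returns alice.
    static RawVertex sampleGraph() {
        RawVertex alice = new RawVertex("alice", "A");
        RawVertex bob = new RawVertex("bob", "A");
        RawVertex secret = new RawVertex("secret", "B");
        alice.out.add(bob);
        alice.out.add(secret);
        bob.out.add(secret);
        return alice;
    }

    static List<String> names(List<ViewVertex> vertices) {
        return vertices.stream().map(ViewVertex::name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ViewVertex alice = new PartitionView("A").wrap(sampleGraph());
        List<ViewVertex> visible = alice.out();
        System.out.println(names(visible));              // prints [bob]
        System.out.println(names(visible.get(0).out())); // prints []
    }
}
```

Because every returned element keeps its pointer to the view, stepping from "bob" to his neighbours is filtered the same way as the original query, and "secret" never surfaces.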

The arguments against this are:
1) Wrappers screwed up the original GraphStrategies: That does not apply
here, since TP3 would do all the wrapping on top of the graph db
implementation so that the implementation isn't even aware of it.
Specifically, elements would only get wrapped when a) they are the result of
a traversal, b) they are passed into a lambda for evaluation, or c) they
are added to a user-accessible side-effect data structure. In other words,
during processing, vertices aren't wrapped - hence there is no problem for
vendor implementations and TP3 has full control over the wrapping.
2) The performance impact of wrapping: Since we only wrap selectively I
don't think this matters. We are only wrapping elements that get returned
(in some way or another) to the user as a result. Such results are
typically small and hence it doesn't cost much to wrap. As such, all
heavy-duty computation inside a traversal (in particular when you consider
OLAP) is done without any wrapping.
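
To illustrate the cost argument, here is a small sketch (toy Java, not TinkerPop) in which all intermediate hops run on raw vertices and only the final, de-duplicated results are wrapped. The Traversal class, its counters and fanGraph are hypothetical and exist only to make the two numbers visible; the fan-shaped graph is chosen deliberately so that the result set is small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: wrapping is deferred until results leave the traversal.
class BoundaryWrapDemo {
    static final class RawVertex {
        final String name;
        final List<RawVertex> out = new ArrayList<>();

        RawVertex(String name) {
            this.name = name;
        }
    }

    // Hypothetical result wrapper; only created for elements returned to the caller.
    static final class Wrapped {
        final RawVertex raw;

        Wrapped(RawVertex raw) {
            this.raw = raw;
        }
    }

    static final class Traversal {
        int rawVisits = 0;       // vertices touched during internal processing
        int wrappersCreated = 0; // wrapper objects allocated for the result

        // Walks `hops` out-steps on raw vertices, then wraps the distinct results.
        List<Wrapped> outTimes(RawVertex start, int hops) {
            List<RawVertex> frontier = Collections.singletonList(start);
            for (int i = 0; i < hops; i++) {
                List<RawVertex> next = new ArrayList<>();
                for (RawVertex v : frontier) {
                    rawVisits++;
                    next.addAll(v.out);
                }
                frontier = next;
            }
            Set<RawVertex> distinct = new LinkedHashSet<>(frontier);
            List<Wrapped> results = new ArrayList<>();
            for (RawVertex v : distinct) {
                results.add(new Wrapped(v));
                wrappersCreated++;
            }
            return results;
        }
    }

    // Builds root -> mid0..mid(fanOut-1), with every mid pointing at one shared leaf.
    static RawVertex fanGraph(int fanOut) {
        RawVertex root = new RawVertex("root");
        RawVertex leaf = new RawVertex("leaf");
        for (int i = 0; i < fanOut; i++) {
            RawVertex mid = new RawVertex("mid" + i);
            mid.out.add(leaf);
            root.out.add(mid);
        }
        return root;
    }

    public static void main(String[] args) {
        Traversal t = new Traversal();
        List<Wrapped> results = t.outTimes(fanGraph(1000), 2);
        System.out.println(t.rawVisits);             // prints 1001
        System.out.println(t.wrappersCreated);       // prints 1
        System.out.println(results.get(0).raw.name); // prints leaf
    }
}
```

In this contrived case the traversal touches 1001 raw vertices but allocates a single wrapper. Real workloads will differ, but the wrapping cost follows the size of the returned result rather than the amount of internal processing.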

WDYT?
Matthias






On Mon, Apr 13, 2015 at 2:52 PM Marko Rodriguez <[email protected]>
wrote:

> Hi,
>
> Yep. The GraphTraversal is your "query." Get the data you want from your
> query.
>
> In SQL, do you want a Row back or do you want a String, a List of Strings,
> a Map of counts, etc?
>
> Finally, if there is something you want to do that can't be done with the
> provided steps, then use a lambda.
>
> Marko.
>
> http://markorodriguez.com
>
> On Apr 13, 2015, at 3:44 PM, Matt Frantz <[email protected]>
> wrote:
>
> > It's true that doing things The Right Way takes a bit of discipline. When
> > I first started with TP3, I wanted to get the vertices and then do
> > post-processing in the application.  Matthias's point (if I understand it)
> > is that this "what can I do with a vertex" approach leads to suboptimal
> > implementations.  Expressing what you want in lambda-free Gremlin is the
> > goal.  So the rule of thumb is to return to your original traversal and
> > keep extending it until it does everything you want to do.
> >
> > On Mon, Apr 13, 2015 at 2:27 PM, Marko Rodriguez <[email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >> Yea, there could be a step that yields Traversals if you plan to traverse
> >> off the returns. But then why not have the logic in your original
> >> traversal?
> >>
> >> We have to think of Graph as a data structure of vertices and edges. Then
> >> there are TraversalSources. When you put the graph into these traversal
> >> sources, you get a "view of the graph" from the perspective of the DSL.
> >> If you are getting out a vertex, it's a vertex. That's that. However, what
> >> do you want with that vertex? Its id? Well, end with id(). Its label? Well,
> >> end with label(). So forth and so on… end the GraphTraversal with the
> >> ultimate result you want.
> >>
> >> Thanks,
> >> Marko.
> >>
> >> http://markorodriguez.com
> >>
> >> On Apr 13, 2015, at 3:09 PM, Matt Frantz <[email protected]>
> >> wrote:
> >>
> >>> I guess what you want to avoid is a new set of interfaces like
> >>> VertexForTraversal, EdgeForTraversal, etc.  That's a fair point.
> >>>
> >>> What a developer has to do now is something like this:
> >>>
> >>> t = g.traversal().V().out()
> >>> while (t.hasNext()) {
> >>> v = t.next();
> >>> vt = g.traversal().V(v);
> >>> vt.out()...;
> >>> }
> >>>
> >>> In effect, the proposed "forTraversal" (or perhaps "asTraversal" or
> just
> >>> "traversal") step would simply produce those "vt" traversals.
> >>>
> >>> If you wanted both the element and the springboard, you could use
> select:
> >>>
> >>> g.traversal().V().out().as('v').traversal().as('vt').select('v', 'vt');
> >>>
> >>>
> >>>
> >>> On Mon, Apr 13, 2015 at 1:19 PM, Marko Rodriguez <[email protected]
> >
> >>> wrote:
> >>>
> >>>> Technically, that is possible.
> >>>>
> >>>> Would I implement it, no. Wrappers just lead to problems as we have
> seen
> >>>> with Graph strategies.
> >>>>
> >>>> Marko.
> >>>>
> >>>> http://markorodriguez.com
> >>>>
> >>>> On Apr 13, 2015, at 2:14 PM, Matt Frantz <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> What about a step that would wrap the elements, so that the developer
> >>>> could
> >>>>> decide if she wanted them to be springboards for subsequent
> traversals?
> >>>>>
> >>>>> g.traversal().V().out().forTraversal()
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Apr 13, 2015 at 10:17 AM, Marko Rodriguez <
> >> [email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> No, as that reference does not exist and to add it to every Element
> >>>>>> produced would be exceedingly expensive --- not only from a 64-bit
> >>>>>> reference standpoint, but also from a threading standpoint. To make it
> >>>>>> work, you would have to wrap each Element produced and that would be an
> >>>>>> Object wrapper with a 64-bit reference. Eek. And then in OLAP, where
> >>>>>> Elements are created all over the cluster, what 64-bit reference to
> >>>>>> use?! -- which JVM?
> >>>>>> Marko.
> >>>>>>
> >>>>>> http://markorodriguez.com
> >>>>>>
> >>>>>> On Apr 13, 2015, at 10:55 AM, Matt Frantz <
> [email protected]
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Could Element.traversal() be a shortcut for returning to the
> >>>>>>> TraversalSource that produced the Element?
> >>>>>>>
> >>>>>>> On Mon, Apr 13, 2015 at 9:07 AM, Marko Rodriguez <
> >> [email protected]
> >>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> They are stateless. Create once -- use over and over and over.
> >>>>>>>>
> >>>>>>>> Marko.
> >>>>>>>>
> >>>>>>>> http://markorodriguez.com
> >>>>>>>>
> >>>>>>>> On Apr 13, 2015, at 10:01 AM, Bryn Cooke <[email protected]>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Marko,
> >>>>>>>>>
> >>>>>>>>> What is the recommended scope of a TraversalSource?
> >>>>>>>>>
> >>>>>>>>> Per graph?
> >>>>>>>>> Per thread?
> >>>>>>>>>
> >>>>>>>>> Should I be pooling them?
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>> Bryn
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 13/04/15 16:57, Marko Rodriguez wrote:
> >>>>>>>>>> Hi Matt,
> >>>>>>>>>>
> >>>>>>>>>> Yes, that is possible and easy to do, but I would not add it as
> we
> >>>>>> need
> >>>>>>>> to stress to people to always go through the same TraversalSource.
> >>>>>>>>>>
> >>>>>>>>>> The importance of TraversalSource can not be overstated. It is
> >>>>>>>> impossible to just have Vertex.out() for the following reasons:
> >>>>>>>>>>
> >>>>>>>>>>  1. GraphTraversal is just one type of DSL.
> >>>>>>>>>>  2. Ignoring 1, then what is the traversal engine that will
> >>>> execute
> >>>>>>>> Vertex.out()? Spark, Giraph, standard iterator, GremlinServer,
> etc.?
> >>>>>>>>>>  3. What are the strategies you are applying? You might have
> >>>>>>>> ReadOnlyStrategy on g.V(), but then you v.out().remove().
> Strategies
> >>>>>> gone…
> >>>>>>>>>>
> >>>>>>>>>> TraversalSource is your "traversal context." Users should always
> >> use
> >>>>>>>> this. If they want low level methods, they can, but they are not
> >>>>>> guaranteed:
> >>>>>>>>>>
> >>>>>>>>>>  1. An execution engine.
> >>>>>>>>>>  2. A set of strategies.
> >>>>>>>>>>  3. DSL method chaining.
> >>>>>>>>>>
> >>>>>>>>>> While we can do v.traversal().out(), you are then creating a new
> >>>>>>>>>> TraversalSource. This is expensive and diverts the user from using
> >>>>>>>>>> the original TraversalSource. For instance, let's say you are working
> >>>>>>>>>> with SparkGraphComputer, then you would have to do this:
> >>>>>>>>>>
> >>>>>>>>>>  v.traversal(computer(SparkComputerEngine)).out()
> >>>>>>>>>>
> >>>>>>>>>> This creates a new TraversalSource, traversal engine, graph
> >>>>>> references,
> >>>>>>>> etc… its just not "the way."
> >>>>>>>>>>
> >>>>>>>>>> Marko.
> >>>>>>>>>>
> >>>>>>>>>> http://markorodriguez.com
> >>>>>>>>>>
> >>>>>>>>>> On Apr 13, 2015, at 9:42 AM, Matt Frantz <
> >>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Could something similar to what was done in splitting Graph and
> >>>>>>>>>>> GraphTraversalSource happen with Vertex/Edge?  That is:
> >>>>>>>>>>>
> >>>>>>>>>>> v.traversal().out()...
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Apr 13, 2015 at 7:28 AM, Marko Rodriguez <
> >>>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> You can't start a traversal from any element because
> >>>> GraphTraversal
> >>>>>> is
> >>>>>>>>>>>> just one type of DSL. For instance,
> >>>>>>>>>>>>
> >>>>>>>>>>>>    vertex.friends().name()
> >>>>>>>>>>>>
> >>>>>>>>>>>> …would not exist as methods.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Finally, users can do vertex.edges() if they please, but its
> not
> >>>>>> from
> >>>>>>>> a
> >>>>>>>>>>>> TraversalSource so its not "DSL"'d. If you want optimizations,
> >>>>>> method
> >>>>>>>>>>>> chaining, etc., everything must go through a TraversalSource,
> if
> >>>>>> not,
> >>>>>>>> its
> >>>>>>>>>>>> "raw methods."
> >>>>>>>>>>>>
> >>>>>>>>>>>> Marko.
> >>>>>>>>>>>>
> >>>>>>>>>>>> http://markorodriguez.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Apr 13, 2015, at 3:39 AM, Bryn Cooke <[email protected]>
> >>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I have to agree,
> >>>>>>>>>>>>> The loss of being able to start a traversal at an element is
> a
> >>>> real
> >>>>>>>>>>>> blow, although I'm sure it was done for good reasons.
> >>>>>>>>>>>>> Here are some additional considerations:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Graph vendors and TP users have different requirements for
> an
> >>>> API
> >>>>>>>>>>>>> that may not be unifiable in a satisfactory way. So perhaps
> the
> >>>>>>>>>>>>> current interfaces are geared towards graph vendors and a
> >> wrapper
> >>>>>>>>>>>>> could be created for users. Without moving from interfaces to
> >>>>>>>>>>>>> abstract classes and therefore gaining the extra power of
> >>>> protected
> >>>>>>>>>>>>> scope any unified API will be difficult to achieve.
> >>>>>>>>>>>>> * Scala and Groovy have added functionality to make Gremlin easier
> >>>>>>>>>>>>> to deal with. The same can and perhaps should be done for Java.
> >>>>>>>>>>>>> Type safety and syntactic sugar are available to different degrees
> >>>>>>>>>>>>> in each language, so perhaps we should not try too hard in gremlin
> >>>>>>>>>>>>> core and leave that to language specific bindings. In short,
> >>>>>>>>>>>>> gremlin core could be targeted to the JVM and Java/Scala/Groovy
> >>>>>>>>>>>>> users have something else that happens to allow traversals from
> >>>>>>>>>>>>> elements.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bryn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 13/04/15 07:09, pieter-gmail wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I concur with Matthias.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>> Pieter
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 13/04/2015 01:59, Matthias Broecheler wrote:
> >>>>>>>>>>>>>>> Hi guys,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> after playing with the M8 release for a bit I wanted to
> >> discuss
> >>>>>> the
> >>>>>>>>>>>>>>> following: With M8, TP3 effectively brings back the
> >> distinction
> >>>>>>>> between
> >>>>>>>>>>>>>>> Blueprints and Gremlin, i.e. there are low level methods
> for
> >>>>>>>> accessing
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>> vertex' adjacency list and there are the traversals.
> >>>>>>>>>>>>>>> In TP2 that was an issue because developers would start
> >>>>>>>> implementing
> >>>>>>>>>>>>>>> against Blueprints directly and treat it like a graph
> >> library -
> >>>>>> not
> >>>>>>>>>>>> like a
> >>>>>>>>>>>>>>> query language. It can be reasonably assumed that the same
> >> will
> >>>>>>>> happen
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>>> TP3. This will be further aggravated by the fact that
> element
> >>>>>>>>>>>> traversals
> >>>>>>>>>>>>>>> are no longer supported in TP3. Meaning, you can no longer
> do
> >>>>>>>>>>>>>>> v.out('knows').in('knows') but have to put the vertex back
> >> into
> >>>>>> the
> >>>>>>>>>>>>>>> GraphTraversalSource. That will be very confusing and one
> can
> >>>>>>>> expect
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>>> users will prefer using the primitive adjacency list calls
> >>>>>>>> instead.
> >>>>>>>>>>>>>>> When you have a vertex and you try to traverse out of it,
> you
> >>>>>> will
> >>>>>>>>>>>> type in
> >>>>>>>>>>>>>>> "v." in your IDE. Lacking any other options, the user will
> >>>> select
> >>>>>>>>>>>>>>> "v.edges()", etc.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I wanted to bring this to your attention since I like the
> >>>> vision
> >>>>>> of
> >>>>>>>>>>>>>>> "everything is Gremlin". In naming this is true but I am
> >> afraid
> >>>>>>>> that
> >>>>>>>>>>>> actual
> >>>>>>>>>>>>>>> user behavior will be different.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Why not hide the access methods in the iterator method as
> >> was
> >>>>>>>> done
> >>>>>>>>>>>> in the
> >>>>>>>>>>>>>>> last milestone release?
> >>>>>>>>>>>>>>> - Should we enforce that the GraphTraversalSource is
> attached
> >>>> to
> >>>>>>>> each
> >>>>>>>>>>>>>>> element so that traversing out of it is possible?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
