Re: Blueprints - Gremlin in TP3

Marko Rodriguez Mon, 13 Apr 2015 17:37:40 -0700

Hi,

> I disagree that this is desired behavior. The graph is the raw structure of
> the data and a traversal gives you a "view" of that data. That
> characterization makes sense. So, let's say we have a traversal with
> PartitionGraphStrategy added to our traversal g, then
> t = g.V().out()
> gives us a view of the graph restricted to the data that a particular user
> can see. However, I see a problem in that the raw graph elements can leak
> out of that view. If I iterate this traversal out, I get normal vertex
> objects and from there I can access anything I want. From a conceptual
> perspective, we have the raw data leaking out of the view which I think
> will be very confusing to users. In particular, the more a traversal…


This can happen regardless of TraversalSource. This was possible with 
GraphStrategies.


> Now, for the argument that you shouldn't be doing that and if you want to
> do something to the elements in a traversal you should do that in the
> traversal itself: I agree with that sentiment and the idea of traversal
> becoming the query language and that's all you ever use. However, if that
> is the case, then we should consider not returning elements at all.
> By analogy to SQL, SQL doesn't allow you to accidentally (or not) slip out
> of the relational algebra and start manipulating database records directly
> (some earlier systems actually allowed that for performance reasons though
> (I assume) it was quickly realized what a horrible idea that is). That's
> kind of what TinkerPop3 does right now. A graph traversal should be
> self-contained and sealed to avoid such conceptual leakage.

I have also thought about not letting elements be returned by a Traversal -- 
only primitives. But I thought it was too restrictive, so I left it as it is. 
It's very easy for vendors/application developers to provide a strategy to 
restrict that (see my next comment).

> Here are my 2 cents on a resolution: It seems we are in agreement that
> people should write traversals to produce the result set they are
> interested in and not do Blueprints style "coding" to get there. With the
> introduced nested traversals, modifiers and all the other new features it
> should indeed be possible to do that 90%+ of the time without using lambdas.
> We could go the route of simply not returning elements in traversals at all
> (but only some projection of them, like a valueMap). However, that leaves a
> small percentage of use cases where you do want to get elements (for instance
> if you need a lambda step inside your traversal or absolutely want to get a
> vertex back). In those cases, TP3 should simply wrap the element into a
> TraversalElement which holds a pointer to the source traversal so that it
> remains within the "view".

I don't think wrappers are a good idea. I think if people want to have such 
restrictions, we simply provide "PrimitivesOnlyVerificationStrategy" which 
doesn't allow VertexStep, VertexEdgeStep, or PropertiesStep to be the last step 
in a traversal. Easy peasy -- < 10 lines of code. Moreover, it would be like 
LambdaTraversalStrategy, with the ability to turn it on and off as needed.
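
A minimal sketch of that check, written in plain Java rather than against the 
TinkerPop API. It assumes a traversal can be modeled as a list of step names; 
the class PrimitivesOnlyVerification and its methods are hypothetical, and only 
the three step names come from the paragraph above. "GraphStep" and "IdStep" in 
the demo are just illustrative strings.

```java
import java.util.List;
import java.util.Set;

// Toy model only: steps are plain strings, not TinkerPop Step objects.
final class PrimitivesOnlyVerification {

    // The three step names mentioned above; treating exactly these as
    // "element-producing" is an assumption of this sketch.
    private static final Set<String> ELEMENT_STEPS =
            Set.of("VertexStep", "VertexEdgeStep", "PropertiesStep");

    private PrimitivesOnlyVerification() {}

    /** Returns true when the final step is one of the element-producing steps. */
    static boolean endsWithElementStep(List<String> steps) {
        return !steps.isEmpty()
                && ELEMENT_STEPS.contains(steps.get(steps.size() - 1));
    }

    /** Throws if the traversal ends in an element-producing step. */
    static void verify(List<String> steps) {
        if (endsWithElementStep(steps)) {
            throw new IllegalStateException(
                    "Traversal must not end with " + steps.get(steps.size() - 1));
        }
    }

    public static void main(String[] args) {
        // Accepted: the last step is not in the restricted set.
        verify(List.of("GraphStep", "VertexStep", "IdStep"));
        try {
            // Rejected: the last step is VertexStep.
            verify(List.of("GraphStep", "VertexStep"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A real strategy would inspect the traversal's actual step objects instead of 
names, but the shape of the check (look at the last step, reject if it yields 
an element) would be the same.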

Take care,
Marko.


> 
> 
> 
> 
> 
> On Mon, Apr 13, 2015 at 2:52 PM Marko Rodriguez <[email protected]>
> wrote:
> 
>> Hi,
>> 
>> Yep. The GraphTraversal is your "query." Get the data you want from your
>> query.
>> 
>> In SQL, do you want a Row back or do you want a String, a List of Strings,
>> a Map of counts, etc?
>> 
>> Finally, if there is something you want to do that can't be done with the
>> provided steps, then use a lambda.
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Apr 13, 2015, at 3:44 PM, Matt Frantz <[email protected]>
>> wrote:
>> 
>>> It's true that doing things The Right Way takes a bit of discipline.
>> When
>>> I first started with TP3, I wanted to get the vertices and then do
>>> post-processing in the application.  Matthias's point (if I understand
>> it)
>>> is that this "what can I do with a vertex" approach leads to suboptimal
>>> implementations.  Expressing what you want in lambda-free Gremlin is the
>>> goal.  So the rule of thumb is to return to your original traversal and
>>> keep extending it until it does everything you want to do.
>>> 
>>> On Mon, Apr 13, 2015 at 2:27 PM, Marko Rodriguez <[email protected]>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Yea, there could be a step that yields Traversals if you plan to
>> traverse
>>>> off the returns. But then why not have the logic in your original
>> traversal?
>>>> 
>>>> We have to think of Graph as a data structure of vertices and edges.
>> Then
>>>> there are TraversalSources. When you put the graph into these traversal
>>>> sources and you get a "view of the graph" from the perspective of the
>> DSL.
>>>> If you are getting out a vertex, it's a vertex. That's that. However,
>> what do
>>>> you want with that vertex? Its id? Well, end with id(). Its label, well
>> end
>>>> with label(). So forth and so on… end the GraphTraversal with the
>> ultimate
>>>> result you want.
>>>> 
>>>> Thanks,
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> On Apr 13, 2015, at 3:09 PM, Matt Frantz <[email protected]>
>>>> wrote:
>>>> 
>>>>> I guess what you want to avoid is a new set of interfaces like
>>>>> VertexForTraversal, EdgeForTraversal, etc.  That's a fair point.
>>>>> 
>>>>> What a developer has to do now is something like this:
>>>>> 
>>>>> t = g.traversal().V().out()
>>>>> while (t.hasNext()) {
>>>>> v = t.next();
>>>>> vt = g.traversal().V(v);
>>>>> vt.out()...;
>>>>> }
>>>>> 
>>>>> In effect, the proposed "forTraversal" (or perhaps "asTraversal" or
>> just
>>>>> "traversal") step would simply produce those "vt" traversals.
>>>>> 
>>>>> If you wanted both the element and the springboard, you could use
>> select:
>>>>> 
>>>>> g.traversal().V().out().as('v').traversal().as('vt').select('v', 'vt');
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Apr 13, 2015 at 1:19 PM, Marko Rodriguez <[email protected]
>>> 
>>>>> wrote:
>>>>> 
>>>>>> Technically, that is possible.
>>>>>> 
>>>>>> Would I implement it, no. Wrappers just lead to problems as we have
>> seen
>>>>>> with Graph strategies.
>>>>>> 
>>>>>> Marko.
>>>>>> 
>>>>>> http://markorodriguez.com
>>>>>> 
>>>>>> On Apr 13, 2015, at 2:14 PM, Matt Frantz <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> What about a step that would wrap the elements, so that the developer
>>>>>> could
>>>>>>> decide if she wanted them to be springboards for subsequent
>> traversals?
>>>>>>> 
>>>>>>> g.traversal().V().out().forTraversal()
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Apr 13, 2015 at 10:17 AM, Marko Rodriguez <
>>>> [email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> No, as that reference does not exist and to add it to every Element
>>>>>>>> produced would be exceedingly expensive --- not only from a 64-bit
>>>>>> reference
>>>>>>>> standpoint, but also from a threading standpoint. To make it work,
>> you
>>>>>>>> would have to wrap each Element produced and that would be an Object
>>>>>>>> wrapper with a 64-bit reference. Eek. And then in OLAP, where
>> Elements
>>>>>> are
>>>>>>>> created all over the cluster, what 64-bit reference to use?! --
>> which
>>>>>> JVM?
>>>>>>>> 
>>>>>>>> Marko.
>>>>>>>> 
>>>>>>>> http://markorodriguez.com
>>>>>>>> 
>>>>>>>> On Apr 13, 2015, at 10:55 AM, Matt Frantz <
>> [email protected]
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Could Element.traversal() be a shortcut for returning to the
>>>>>>>>> TraversalSource that produced the Element?
>>>>>>>>> 
>>>>>>>>> On Mon, Apr 13, 2015 at 9:07 AM, Marko Rodriguez <
>>>> [email protected]
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> They are stateless. Create once -- use over and over and over.
>>>>>>>>>> 
>>>>>>>>>> Marko.
>>>>>>>>>> 
>>>>>>>>>> http://markorodriguez.com
>>>>>>>>>> 
>>>>>>>>>> On Apr 13, 2015, at 10:01 AM, Bryn Cooke <[email protected]>
>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Marko,
>>>>>>>>>>> 
>>>>>>>>>>> What is the recommended scope of a TraversalSource?
>>>>>>>>>>> 
>>>>>>>>>>> Per graph?
>>>>>>>>>>> Per thread?
>>>>>>>>>>> 
>>>>>>>>>>> Should I be pooling them?
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> 
>>>>>>>>>>> Bryn
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 13/04/15 16:57, Marko Rodriguez wrote:
>>>>>>>>>>>> Hi Matt,
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, that is possible and easy to do, but I would not add it as
>> we
>>>>>>>> need
>>>>>>>>>> to stress to people to always go through the same TraversalSource.
>>>>>>>>>>>> 
>>>>>>>>>>>> The importance of TraversalSource can not be overstated. It is
>>>>>>>>>> impossible to just have Vertex.out() for the following reasons:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. GraphTraversal is just one type of DSL.
>>>>>>>>>>>> 2. Ignoring 1, then what is the traversal engine that will
>>>>>> execute
>>>>>>>>>> Vertex.out()? Spark, Giraph, standard iterator, GremlinServer,
>> etc.?
>>>>>>>>>>>> 3. What are the strategies you are applying? You might have
>>>>>>>>>> ReadOnlyStrategy on g.V(), but then you do v.out().remove().
>> Strategies
>>>>>>>> gone…
>>>>>>>>>>>> 
>>>>>>>>>>>> TraversalSource is your "traversal context." Users should always
>>>> use
>>>>>>>>>> this. If they want low level methods, they can, but they are not
>>>>>>>> guaranteed:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. An execution engine.
>>>>>>>>>>>> 2. A set of strategies.
>>>>>>>>>>>> 3. DSL method chaining.
>>>>>>>>>>>> 
>>>>>>>>>>>> While we can do v.traversal().out(), you are then creating a new
>>>>>>>>>> TraversalSource. This is expensive and diverts the user from using
>>>> the
>>>>>>>>>> original TraversalSource. For instance, let's say you are working
>>>> with
>>>>>>>>>> SparkGraphComputer, then you would have to do this:
>>>>>>>>>>>> 
>>>>>>>>>>>> v.traversal(computer(SparkComputerEngine)).out()
>>>>>>>>>>>> 
>>>>>>>>>>>> This creates a new TraversalSource, traversal engine, graph
>>>>>>>> references,
>>>>>>>>>> etc… it's just not "the way."
>>>>>>>>>>>> 
>>>>>>>>>>>> Marko.
>>>>>>>>>>>> 
>>>>>>>>>>>> http://markorodriguez.com
>>>>>>>>>>>> 
>>>>>>>>>>>> On Apr 13, 2015, at 9:42 AM, Matt Frantz <
>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Could something similar to what was done in splitting Graph and
>>>>>>>>>>>>> GraphTraversalSource happen with Vertex/Edge?  That is:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> v.traversal().out()...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Apr 13, 2015 at 7:28 AM, Marko Rodriguez <
>>>>>>>> [email protected]
>>>>>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You can't start a traversal from any element because
>>>>>> GraphTraversal
>>>>>>>> is
>>>>>>>>>>>>>> just one type of DSL. For instance,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   vertex.friends().name()
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> …would not exist as methods.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Finally, users can do vertex.edges() if they please, but it's
>> not
>>>>>>>> from
>>>>>>>>>> a
>>>>>>>>>>>>>> TraversalSource so its not "DSL"'d. If you want optimizations,
>>>>>>>> method
>>>>>>>>>>>>>> chaining, etc., everything must go through a TraversalSource,
>> if
>>>>>>>> not,
>>>>>>>>>> it's
>>>>>>>>>>>>>> "raw methods."
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Marko.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> http://markorodriguez.com
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Apr 13, 2015, at 3:39 AM, Bryn Cooke <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have to agree,
>>>>>>>>>>>>>>> The loss of being able to start a traversal at an element is
>> a
>>>>>> real
>>>>>>>>>>>>>> blow, although I'm sure it was done for good reasons.
>>>>>>>>>>>>>>> Here are some additional considerations:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> * Graph vendors and TP users have different requirements for
>> an
>>>>>> API
>>>>>>>>>>>>>>> that may not be unifiable in a satisfactory way. So perhaps
>> the
>>>>>>>>>>>>>>> current interfaces are geared towards graph vendors and a
>>>> wrapper
>>>>>>>>>>>>>>> could be created for users. Without moving from interfaces to
>>>>>>>>>>>>>>> abstract classes and therefore gaining the extra power of
>>>>>> protected
>>>>>>>>>>>>>>> scope any unified API will be difficult to achieve.
>>>>>>>>>>>>>>> * Scala and Groovy have added functionality to make Gremlin
>>>> easier
>>>>>>>> to
>>>>>>>>>>>>>>> deal with. The same can and perhaps should be done for Java.
>>>> Type
>>>>>>>>>>>>>>> safety and syntactic sugar is available to different degrees
>> in
>>>>>>>> each
>>>>>>>>>>>>>>> language, so perhaps we should not try too hard in gremlin
>> core
>>>>>> and
>>>>>>>>>>>>>>> leave that to language specific bindings. In short, gremlin
>>>> core
>>>>>>>>>>>>>>> could be targeted to the JVM and Java/Scala/Groovy users have
>>>>>>>>>>>>>>> something else that happens to allow traversals from
>> elements.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Bryn
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 13/04/15 07:09, pieter-gmail wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I concur with Matthias.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Pieter
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 13/04/2015 01:59, Matthias Broecheler wrote:
>>>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> after playing with the M8 release for a bit I wanted to
>>>> discuss
>>>>>>>> the
>>>>>>>>>>>>>>>>> following: With M8, TP3 effectively brings back the
>>>> distinction
>>>>>>>>>> between
>>>>>>>>>>>>>>>>> Blueprints and Gremlin, i.e. there are low level methods
>> for
>>>>>>>>>> accessing
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> vertex' adjacency list and there are the traversals.
>>>>>>>>>>>>>>>>> In TP2 that was an issue because developers would start
>>>>>>>>>> implementing
>>>>>>>>>>>>>>>>> against Blueprints directly and treat it like a graph
>>>> library -
>>>>>>>> not
>>>>>>>>>>>>>> like a
>>>>>>>>>>>>>>>>> query language. It can be reasonably assumed that the same
>>>> will
>>>>>>>>>> happen
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> TP3. This will be further aggravated by the fact that
>> element
>>>>>>>>>>>>>> traversals
>>>>>>>>>>>>>>>>> are no longer supported in TP3. Meaning, you can no longer
>> do
>>>>>>>>>>>>>>>>> v.out('knows').in('knows') but have to put the vertex back
>>>> into
>>>>>>>> the
>>>>>>>>>>>>>>>>> GraphTraversalSource. That will be very confusing and one
>> can
>>>>>>>>>> expect
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> users will prefer using the primitive adjacency list calls
>>>>>>>>>> instead.
>>>>>>>>>>>>>>>>> When you have a vertex and you try to traverse out of it,
>> you
>>>>>>>> will
>>>>>>>>>>>>>> type in
>>>>>>>>>>>>>>>>> "v." in your IDE. Lacking any other options, the user will
>>>>>> select
>>>>>>>>>>>>>>>>> "v.edges()", etc.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I wanted to bring this to your attention since I like the
>>>>>> vision
>>>>>>>> of
>>>>>>>>>>>>>>>>> "everything is Gremlin". In naming this is true but I am
>>>> afraid
>>>>>>>>>> that
>>>>>>>>>>>>>> actual
>>>>>>>>>>>>>>>>> user behavior will be different.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Why not hide the access methods in the iterator method as
>>>> was
>>>>>>>>>> done
>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>> last milestone release?
>>>>>>>>>>>>>>>>> - Should we enforce that the GraphTraversalSource is
>> attached
>>>>>> to
>>>>>>>>>> each
>>>>>>>>>>>>>>>>> element so that traversing out of it is possible?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Matthias
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
