Hi,
So is g.V. With g.V.has('age',lt(60)) you get some filtering before going on to
x.out(follows).
?,
Marko.
http://markorodriguez.com
On Jun 24, 2015, at 5:52 PM, Matthias Broecheler <[email protected]> wrote:
> I think you misunderstood my example. Take another look -
> as('x').has('age',lt(60))
> has the same start label as the match label. The name constraint applies to
> "y". This means, only the age constraint would be pulled out. And that
> would be bad because finding all vertices that have an age less than 60 can
> take forever.
>
> On Wed, Jun 24, 2015 at 4:49 PM Marko Rodriguez <[email protected]>
> wrote:
>
>> Hello,
>>
>> Again, it only pulls it out if the match()-step if the has()-part start
>> label is the same as the match() start label. Moreover, if there is no
>> start label to match()-step, then then nothing is pulled out. In your case,
>> the as('y').has('name','marko') can't "go first" as you need to bind "y"
>> first.
>>
>> Now, if you had as("x").has("name","marko") as well as
>> as("x").has("age",lt(60)), both would be pulled out and thus, available to
>> the vendor for index lookups as they please.
>>
>> Marko.
>>
>> http://markorodriguez.com
>>
>> On Jun 24, 2015, at 5:32 PM, Matthias Broecheler <[email protected]> wrote:
>>
>>> Consider this example:
>>>
>>> g.V.match("x", as('x').has('age',lt(60)), as('x').out('knows').as('y'),
>>> as('y').has('name','marko'))
>>>
>>> In this case the age constraint would be pulled out if I understand
>>> correctly. But this constraint has very poor selectivity in particular
>>> compared to the has('name','marko') constraint on 'y'. So, the better way
>>> to execute this match would be to start by retrieving all markos, finding
>>> the people who know them and then filter those by age.
>>> However, that is not possible if you pull out the age constraint.
>>>
>>>
>>> On Wed, Jun 24, 2015 at 4:11 PM Marko Rodriguez <[email protected]>
>>> wrote:
>>>
>>>> Hi Matthias,
>>>>
>>>> So the has()-container "pulling" only happens if a startLabel is
>> provided
>>>> (i.e. match("x", as("x").has("name","matthias")). And in that case, I
>> can't
>>>> imagine it ever not being desired as if you leave it in MatchStep, then
>> you
>>>> have one more pattern to order, keep runtime statistics on, cycle
>> through
>>>> for determine if a match has occurred, deduping on, and one more pattern
>>>> label to add to each match, etc. By pulling out the has()-container, you
>>>> can reduce the overhead in MatchStep. Finally, while I said it was "for
>>>> vendor indexing," its really not just about that because if the vendor
>>>> can't use it for indexing, its still good to have it outside the match()
>>>> for the stated reasons.
>>>>
>>>> Hope that is clear,
>>>> Marko.
>>>>
>>>> http://markorodriguez.com
>>>>
>>>> On Jun 19, 2015, at 12:07 PM, Matthias Broecheler <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Marko,
>>>>>
>>>>> is it possible to disable pulling out the has-containers? For many
>>>> graphdb
>>>>> vendors it would make sense to leave the has containers in the match
>> step
>>>>> and then select those has containers that promise the highest
>> selectivity
>>>>> for index calls based on the index statistics. Since TP3 isn't aware of
>>>>> indexes it could make such a call.
>>>>>
>>>>> Thanks,
>>>>> Matthias
>>>>>
>>>>> On Fri, Jun 19, 2015 at 10:42 AM Marko Rodriguez <[email protected]
>>>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> So, this morning I realized something neat about MatchStep<->WhereStep
>>>>>> interplay.
>>>>>>
>>>>>> First, MatchWhereStrategy is now called MatchPredicateStrategy as it
>> is
>>>>>> about moving predicates in and out of match().
>>>>>> - where()s go in.
>>>>>>
>>>>>>
>>>>
>> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L69
>>>>>> - has() containers go out.
>>>>>>
>>>>>>
>>>>
>> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L80
>>>>>>
>>>>>> Next, the question about "predicate traversals" in MatchStep is solved
>>>> by
>>>>>> simply saying:
>>>>>> "If you want a predicate traversal, use a where()-clause in your
>>>>>> pattern."
>>>>>>
>>>>>> Thats it! Lets look at what I mean by that (Josh and Daniel will
>>>>>> understand the ramifications best).
>>>>>>
>>>>>> gremlin> g.V().match('a',
>>>>>> __.as('a').out('created').as('b'),
>>>>>> __.as('a').repeat(out()).times(2))
>>>>>> ==>[a:v[1], b:v[3]]
>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>
>>>>>> The above match() returns duplicates. Why? Because the second pattern
>>>>>> isn't binding, its just "checking" -- that is, it passes the traverser
>>>>>> through and if that traverser splits, well, there are more traversers
>>>>>> returned. In the original MatchStep, these were called "predicate
>>>>>> traversals" because they did not bind variables (i.e. no as() at the
>>>> end).
>>>>>> As such, their output didn't matter. However, in the new MatchStep, I
>>>> can't
>>>>>> do that so easily given the OLAP constraint. However, if you want
>>>>>> "predicate traversal" behavior, use WhereStep!
>>>>>>
>>>>>> g.V().match('a',
>>>>>> __.as('a').out('created').as('b'),
>>>>>> __.where(__.as('a').repeat(out()).times(2))
>>>>>> )
>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>
>>>>>> So, if you don't care about the result of a pattern, only if it
>>>>>> "hasNext()" (which is much faster than "iterate()"), then wrap it in a
>>>>>> where() and there you go. Not only is this way more efficient as you
>> are
>>>>>> not generating traversers (i.e. results), you are also not creating
>>>>>> duplicate results (i.e. traversers with similar path histories).
>>>>>>
>>>>>> Finally, note you can also do this for a nice look and feel:
>>>>>>
>>>>>> g.V().match('a',
>>>>>> __.as('a').out('created').as('b'),
>>>>>> __.as('a').where(repeat(out()).times(2))
>>>>>> )
>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>
>>>>>> So whats the catch? Why not just wrap all match patterns without an
>>>>>> end-label step in where()? Two reasons:
>>>>>> 1. Semantics. MatchStep is set of traversals where the traverser
>>>>>> is pushed into the traversals and when there are no more traversals to
>>>>>> take, it goes to the next step. Its not a filter-step, its a map-step.
>>>>>> 2. OLAP. WhereStep's internal traversal is a "local child" and
>>>>>> thus, can only compute as far as the local star graph in OLAP.
>>>> Typically,
>>>>>> any step that needs to know what happened at the end of an internal
>>>>>> traversal (filter or not) has to be locally bound. … this is the
>>>>>> fundamental difference between Gremlin OLAP and Gremlin OLTP.
>>>>>>
>>>>>> Finally finally….the last big issue I was having was "not()" inside
>>>> Match.
>>>>>> Again, because MatchStep uses "global children", it can't know what
>>>>>> happened to the traverser once it enters a pattern. And steps like NOT
>>>> need
>>>>>> to know if the traverser was filtered. Well, not() in where() works
>>>> great:
>>>>>>
>>>>>> g.V().as('a').out('created').
>>>>>> where(__.in('created').count().is(gt(1))).values('name')
>>>>>> ==>lop
>>>>>> ==>lop
>>>>>> ==>lop
>>>>>> g.V().as('a').out('created').
>>>>>> where(__.not(__.in('created').count().is(gt(1)))).values('name') //
>> it
>>>>>> sucks that groovy requires not and in to have __.
>>>>>> ==>ripple
>>>>>>
>>>>>> And guess what, if you want to NOT a pattern in match(), do it via
>>>> where()!
>>>>>>
>>>>>> g.V().match('a',
>>>>>> __.as('a').out('created').as('b'),
>>>>>> __.as('b').where(__.in('created').count().is(gt(1)))).
>>>>>> select().by('name')
>>>>>> ==>[a:marko, b:lop]
>>>>>> ==>[a:josh, b:lop]
>>>>>> ==>[a:peter, b:lop]
>>>>>> g.V().match('a',
>>>>>> __.as('a').out('created').as('b'),
>>>>>> __.as('b').where(__.not(__.in('created').count().is(gt(1))))).
>>>>>> select().by('name')
>>>>>> ==>[a:josh, b:ripple]
>>>>>>
>>>>>> And there we go. MatchPredicateStrategy can just throw where()-steps
>>>> into
>>>>>> MatchStep as is and the issue of "predicate traversals" is no longer
>> an
>>>>>> issue.
>>>>>>
>>>>>> Thanks for reading,
>>>>>> Marko.
>>>>>>
>>>>>> http://markorodriguez.com
>>>>>>
>>>>>> On Jun 17, 2015, at 4:25 PM, Marko Rodriguez <[email protected]>
>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> To extend on Kuppitz' comment -- Yes, MatchWhereStrategy folds in
>>>>>> where()-clauses. Note that with the recent work on XMatchStep (if we
>> go
>>>>>> with that for GA), where() clauses work natively in XMatchStep and we
>>>> will
>>>>>> also just fold any "right handed" where()-clauses into match() as
>> well.
>>>>>>>
>>>>>>> Marko.
>>>>>>>
>>>>>>> http://markorodriguez.com
>>>>>>>
>>>>>>> On Jun 17, 2015, at 3:04 PM, Daniel Kuppitz <[email protected]> wrote:
>>>>>>>
>>>>>>>> After actually looking into the docs, I decided to keep the example,
>>>>>> since
>>>>>>>> the description explicitely states, that in such a case the where()
>>>>>> clause
>>>>>>>> will automatically be folded into match():
>>>>>>>>
>>>>>>>> The where()-step can take either a BiPredicate (first example below)
>>>> or
>>>>>> a
>>>>>>>>> Traversal (second example below). Using MatchWhereStrategy,
>>>>>> where()-clauses
>>>>>>>>> can be automatically folded into match() and thus, subject to
>>>>>> match()-steps
>>>>>>>>> budget-match algorithm.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The sample then shows, that
>>>>>>>>
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>> where(__.as('a').out('knows').as('c')).
>>>>>>>> select('a','c').by('name')
>>>>>>>>
>>>>>>>>
>>>>>>>> is - after the MatchWhereStrategy was applied (this is done
>>>>>> automatically)
>>>>>>>> - in fact the same thing as:
>>>>>>>>
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('a').out('knows').as('c'),
>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>> select('a','c').by('name')
>>>>>>>>
>>>>>>>> ....
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 17, 2015 at 10:47 PM, Daniel Kuppitz <[email protected]>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> You're right. It's actually a pretty good example for where(), but
>>>> not
>>>>>>>>> for match()/where(). I will remove it and make sure that we have
>>>>>>>>> something similar in the where() sample section. Something like:
>>>>>>>>>
>>>>>>>>> g.V().as("a").out("created").as("b").in("created").as("c").
>>>>>>>>> where(__as("a").out("knows").as("c")).select().by("name")
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 17, 2015 at 8:43 PM, Matthias Broecheler <
>>>> [email protected]
>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> looking at the second example in the following section of the
>> docs I
>>>>>>>>>> noticed a semantic overlap between match and where:
>>>>>>>>>>
>>>> http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#using-where-with-match
>>>>>>>>>>
>>>>>>>>>> traversal = g.V().match('a', __.as('a').out('created').as('b'),
>>>>>> __.as('b'
>>>>>>>>>> ).in('created').as('c')). where(__.as('a').out('knows').as('c')).
>>>>>>>>>> select('a'
>>>>>>>>>> ,'c').by('name');
>>>>>>>>>>
>>>>>>>>>> The provided where clause could also have been folded into the
>>>> actual
>>>>>>>>>> traversal to yield the same result.
>>>>>>>>>> I wonder:
>>>>>>>>>> 1) Is there a way to avoid this ambiguity?
>>>>>>>>>> 2) or should we simply not promote it in the docs. As the docs are
>>>>>>>>>> currently written I am worried that users might get confused as to
>>>> how
>>>>>>>>>> match steps are supposed to be written.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Matthias
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>