Hi,

So, this morning I realized something neat about the MatchStep<->WhereStep 
interplay.

First, MatchWhereStrategy is now called MatchPredicateStrategy as it is about 
moving predicates in and out of match().
        - where()s go in.
          https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L69
        - has() containers go out.
          https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L80

Next, the question about "predicate traversals" in MatchStep is solved by 
simply saying:
        "If you want a predicate traversal, use a where()-clause in your 
pattern."

That's it! Let's look at what I mean by that (Josh and Daniel will understand the 
ramifications best).

gremlin> g.V().match('a',
   __.as('a').out('created').as('b'),
   __.as('a').repeat(out()).times(2))
==>[a:v[1], b:v[3]]
==>[a:v[1], b:v[3]]

The above match() returns duplicates. Why? Because the second pattern isn't 
binding, it's just "checking" -- that is, it passes the traverser through and, if 
that traverser splits, more traversers are returned. In the 
original MatchStep, these were called "predicate traversals" because they did 
not bind variables (i.e. no as() at the end). As such, their output didn't 
matter. In the new MatchStep, I can't do that so easily given the OLAP 
constraint. However, if you want "predicate traversal" behavior, use WhereStep!

g.V().match('a',
  __.as('a').out('created').as('b'),
  __.where(__.as('a').repeat(out()).times(2))
)
==>[a:v[1], b:v[3]]

So, if you don't care about the result of a pattern, only whether it "hasNext()" 
(which is much faster than "iterate()"), then wrap it in a where() and there 
you go. Not only is this way more efficient, as you are not generating 
traversers (i.e. results), you are also not creating duplicate results (i.e. 
traversers with similar path histories).
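
To make the difference concrete, here is a rough sketch in plain Groovy (no 
TinkerPop involved) that models the two behaviors on a hand-written copy of the 
toy graph's edges. The names out, created, twoHops, asPattern and asWhere are 
made up for illustration; this is not how MatchStep or WhereStep are 
implemented, just the map-vs-filter idea.

```groovy
// Hand-copied edges of the toy graph (vertex ids only); an assumption for this sketch.
def out = [1: [3, 2, 4], 4: [5, 3], 6: [3]]      // all out-edges
def created = [1: [3], 4: [5, 3], 6: [3]]        // 'created' edges only

// Stand-in for repeat(out()).times(2): every vertex reachable in exactly two hops.
def twoHops = { v -> (out[v] ?: []).collectMany { out[it] ?: [] } }

// Pattern used as a map-step: every traverser that survives the second
// pattern yields its own result, so two two-hop paths give two results.
def asPattern = created.collectMany { a, bs ->
    bs.collectMany { b -> twoHops(a).collect { [a: a, b: b] } }
}

// Pattern wrapped in where(): only ask "is there at least one result?"
// and let the incoming traverser pass through once.
def asWhere = created.collectMany { a, bs ->
    twoHops(a) ? bs.collect { b -> [a: a, b: b] } : []
}

assert asPattern == [[a: 1, b: 3], [a: 1, b: 3]]
assert asWhere == [[a: 1, b: 3]]
```

The map-style version emits one result per two-hop path, while the filter-style 
version emits the incoming binding at most once.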

Finally, note you can also do this for a nice look and feel:

g.V().match('a',
  __.as('a').out('created').as('b'),
  __.as('a').where(repeat(out()).times(2))
)
==>[a:v[1], b:v[3]]

So what's the catch? Why not just wrap all match patterns without an end-label 
step in where()? Two reasons:
        1. Semantics. MatchStep is a set of traversals where the traverser is 
pushed into the traversals and, when there are no more traversals to take, it 
goes to the next step. It's not a filter-step, it's a map-step.
        2. OLAP. WhereStep's internal traversal is a "local child" and thus 
can only compute as far as the local star graph in OLAP. Typically, any step 
that needs to know what happened at the end of an internal traversal (filter or 
not) has to be locally bound. This is the fundamental difference between 
Gremlin OLAP and Gremlin OLTP.

Finally, finally... the last big issue I was having was "not()" inside match(). 
Again, because MatchStep uses "global children", it can't know what happened to 
the traverser once it enters a pattern. And steps like not() need to know if the 
traverser was filtered. Well, not() in where() works great:

g.V().as('a').out('created').
   where(__.in('created').count().is(gt(1))).values('name')
==>lop
==>lop
==>lop
// it sucks that Groovy requires not and in to have __.
g.V().as('a').out('created').
   where(__.not(__.in('created').count().is(gt(1)))).values('name')
==>ripple

And guess what, if you want to NOT a pattern in match(), do it via where()!

g.V().match('a',
  __.as('a').out('created').as('b'),
  __.as('b').where(__.in('created').count().is(gt(1)))).
    select().by('name')
==>[a:marko, b:lop]
==>[a:josh, b:lop]
==>[a:peter, b:lop]
g.V().match('a',
  __.as('a').out('created').as('b'),
  __.as('b').where(__.not(__.in('created').count().is(gt(1))))).
    select().by('name')
==>[a:josh, b:ripple]
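
As a sanity check on the two result sets above, here is a small plain-Groovy 
sketch (again no TinkerPop, just collections) of why not() is easy once the 
pattern is a filter: all it needs is the pass/fail answer of the inner 
traversal. The names created, creatorCount, pairs, popular and notPopular are 
hypothetical, and the edge data is hand-copied from the toy graph.

```groovy
// Hand-copied 'created' edges (creator -> software); an assumption for this sketch.
def created = [marko: ['lop'], josh: ['ripple', 'lop'], peter: ['lop']]

// Stand-in for in('created').count(): how many creators point at b.
def creatorCount = { b -> created.count { a, bs -> b in bs } }

// Stand-in for the first pattern: all (a, b) bindings with a -created-> b.
def pairs = created.collectMany { a, bs -> bs.collect { b -> [a: a, b: b] } }

// where(...) keeps a binding when the inner check passes,
// where(not(...)) keeps it when the same check fails.
def popular = pairs.findAll { creatorCount(it.b) > 1 }
def notPopular = pairs.findAll { !(creatorCount(it.b) > 1) }

assert popular == [[a: 'marko', b: 'lop'], [a: 'josh', b: 'lop'], [a: 'peter', b: 'lop']]
assert notPopular == [[a: 'josh', b: 'ripple']]
```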

And there we go. MatchPredicateStrategy can just throw where()-steps into 
MatchStep as is and the issue of "predicate traversals" is no longer an issue.

Thanks for reading,
Marko.

http://markorodriguez.com

On Jun 17, 2015, at 4:25 PM, Marko Rodriguez <[email protected]> wrote:

> Hello,
> 
> To extend on Kuppitz' comment -- Yes, MatchWhereStrategy folds in 
> where()-clauses. Note that with the recent work on XMatchStep (if we go with 
> that for GA), where() clauses work natively in XMatchStep and we will also 
> just fold any "right handed" where()-clauses into match() as well.
> 
> Marko.
> 
> http://markorodriguez.com
> 
> On Jun 17, 2015, at 3:04 PM, Daniel Kuppitz <[email protected]> wrote:
> 
>> After actually looking into the docs, I decided to keep the example, since
>> the description explicitly states that in such a case the where() clause
>> will automatically be folded into match():
>> 
>> The where()-step can take either a BiPredicate (first example below) or a
>>> Traversal (second example below). Using MatchWhereStrategy, where()-clauses
>>> can be automatically folded into match() and thus, subject to match()-steps
>>> budget-match algorithm.
>>> 
>> 
>> The sample then shows that
>> 
>> g.V().match('a',
>>    __.as('a').out('created').as('b'),
>>    __.as('b').in('created').as('c')).
>>  where(__.as('a').out('knows').as('c')).
>>  select('a','c').by('name')
>> 
>> 
>> is - after the MatchWhereStrategy was applied (this is done automatically)
>> - in fact the same thing as:
>> 
>> g.V().match('a',
>>    __.as('a').out('created').as('b'),
>>    __.as('a').out('knows').as('c'),
>>    __.as('b').in('created').as('c')).
>>  select('a','c').by('name')
>> 
>> ....
>> 
>> Cheers,
>> Daniel
>> 
>> 
>> On Wed, Jun 17, 2015 at 10:47 PM, Daniel Kuppitz <[email protected]> wrote:
>> 
>>> You're right. It's actually a pretty good example for where(), but not
>>> for match()/where(). I will remove it and make sure that we have
>>> something similar in the where() sample section. Something like:
>>> 
>>> g.V().as("a").out("created").as("b").in("created").as("c").
>>>    where(__.as("a").out("knows").as("c")).select().by("name")
>>> 
>>> 
>>> Cheers,
>>> Daniel
>>> 
>>> 
>>> On Wed, Jun 17, 2015 at 8:43 PM, Matthias Broecheler <[email protected]>
>>> wrote:
>>> 
>>>> Hi guys,
>>>> 
>>>> looking at the second example in the following section of the docs I
>>>> noticed a semantic overlap between match and where:
>>>> http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#using-where-with-match
>>>> 
>>>> traversal = g.V().match('a',
>>>>     __.as('a').out('created').as('b'),
>>>>     __.as('b').in('created').as('c')).
>>>>   where(__.as('a').out('knows').as('c')).
>>>>   select('a','c').by('name');
>>>> 
>>>> The provided where clause could also have been folded into the actual
>>>> traversal to yield the same result.
>>>> I wonder:
>>>> 1) Is there a way to avoid this ambiguity?
>>>> 2) or should we simply not promote it in the docs. As the docs are
>>>> currently written I am worried that users might get confused as to how
>>>> match steps are supposed to be written.
>>>> 
>>>> Thanks,
>>>> Matthias
>>>> 
>>> 
>>> 
> 
