Hi,

I made it so that the match() API is simply match(Traversal…). There is no more 
"start label."

There were two reason to have start label:
        1. If the steps prior to the match() were some long complicated 
traversal, what did does the traverser go in as?
        2. I didn't have a dependency tree algorithm that determined what the 
legal start labels were for the full pattern.

For the first situation, this is solved by forcing the user to label the 
incoming traverser if they want it to go to a predestined variable location.
        g.V.out…out.as('a').match(as('a')…)

For the second, I wrote the following method:
        
https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/MatchStep.java#L541-L566
…which is used if the incoming traverser does NOT have a path history that has 
labels that match any of the start labels of the match() patterns.

Given that 1 and 2 are solved, we no longer need the 
match(startLabel,traversal…) method, all we have now is match(traversal…).

Docs have been updated accordingly:
        http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#match-step

BTW: Did you know that Daniel and I also added DISTINCT capabilities to 
MatchStep? We extended the behavior of DedupGlobalStep to allow for scoping 
variables. In other words, dedup() will dedup the current object, but 
dedup('a','b','c') will dedup the history value collection. Check out the last 
code block here:
        http://tinkerpop.incubator.apache.org/docs/3.0.0-SNAPSHOT/#dedup-step
The MatchPredicateStrategy is smart about folding DeduptGlobalStep into 
MatchStep so that while in MatchStep we can dedup and thus, reduce the search 
space of the match patterns.

Take care,
Marko.

http://markorodriguez.com

On Jun 24, 2015, at 6:06 PM, Marko Rodriguez <[email protected]> wrote:

> Hi,
> 
> When MatchStep advances and we can reverse traversal patterns as well as the 
> MatchAlgorithm API advances to allows such things, then yes, we will leave 
> has()-fragments in there.
> 
> Marko.
> 
> http://markorodriguez.com
> 
> On Jun 24, 2015, at 6:01 PM, Matthias Broecheler <[email protected]> wrote:
> 
>> But you wouldn't want to process the query like that. You would want to
>> start with "y" because filtering by name is much more efficient. If you
>> left the age constraint inside match step then a smart query optimizer can
>> figure that out for you and do the right thing. If you pull it out then it
>> cannot (unless you add another strategy to pull it back in ;-) because the
>> filtering by age happens before the match step is executed.
>> 
>> What I mean is this: The pulling out is based on what "x" is. If you user
>> chooses "x" in a suboptimal way then the query optimizer is screwed. Hence,
>> leave everything inside match step so that the optimizer can make smart
>> choices for the user.
>> 
>> 
>> On Wed, Jun 24, 2015 at 4:58 PM Marko Rodriguez <[email protected]>
>> wrote:
>> 
>>> Hi,
>>> 
>>> So is g.V. With g.V.has('age',lt(60)) you get some filtering before going
>>> on to x.out(follows).
>>> 
>>> ?,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> On Jun 24, 2015, at 5:52 PM, Matthias Broecheler <[email protected]> wrote:
>>> 
>>>> I think you misunderstood my example. Take another look -
>>>> as('x').has('age',lt(60))
>>>> has the same start label as the match label. The name constraint applies
>>> to
>>>> "y". This means, only the age constraint would be pulled out. And that
>>>> would be bad because finding all vertices that have an age less than 60
>>> can
>>>> take forever.
>>>> 
>>>> On Wed, Jun 24, 2015 at 4:49 PM Marko Rodriguez <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> Again, it only pulls it out if the match()-step if the has()-part start
>>>>> label is the same as the match() start label. Moreover, if there is no
>>>>> start label to match()-step, then then nothing is pulled out. In your
>>> case,
>>>>> the as('y').has('name','marko') can't "go first" as you need to bind "y"
>>>>> first.
>>>>> 
>>>>> Now, if you had as("x").has("name","marko") as well as
>>>>> as("x").has("age",lt(60)), both would be pulled out and thus, available
>>> to
>>>>> the vendor for index lookups as they please.
>>>>> 
>>>>> Marko.
>>>>> 
>>>>> http://markorodriguez.com
>>>>> 
>>>>> On Jun 24, 2015, at 5:32 PM, Matthias Broecheler <[email protected]>
>>> wrote:
>>>>> 
>>>>>> Consider this example:
>>>>>> 
>>>>>> g.V.match("x", as('x').has('age',lt(60)), as('x').out('knows').as('y'),
>>>>>> as('y').has('name','marko'))
>>>>>> 
>>>>>> In this case the age constraint would be pulled out if I understand
>>>>>> correctly. But this constraint has very poor selectivity in particular
>>>>>> compared to the has('name','marko') constraint on 'y'. So, the better
>>> way
>>>>>> to execute this match would be to start by retrieving all markos,
>>> finding
>>>>>> the people who know them and then filter those by age.
>>>>>> However, that is not possible if you pull out the age constraint.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 24, 2015 at 4:11 PM Marko Rodriguez <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Matthias,
>>>>>>> 
>>>>>>> So the has()-container "pulling" only happens if a startLabel is
>>>>> provided
>>>>>>> (i.e. match("x", as("x").has("name","matthias")). And in that case, I
>>>>> can't
>>>>>>> imagine it ever not being desired as if you leave it in MatchStep,
>>> then
>>>>> you
>>>>>>> have one more pattern to order, keep runtime statistics on, cycle
>>>>> through
>>>>>>> for determine if a match has occurred, deduping on, and one more
>>> pattern
>>>>>>> label to add to each match, etc. By pulling out the has()-container,
>>> you
>>>>>>> can reduce the overhead in MatchStep. Finally, while I said it was
>>> "for
>>>>>>> vendor indexing," its really not just about that because if the vendor
>>>>>>> can't use it for indexing, its still good to have it outside the
>>> match()
>>>>>>> for the stated reasons.
>>>>>>> 
>>>>>>> Hope that is clear,
>>>>>>> Marko.
>>>>>>> 
>>>>>>> http://markorodriguez.com
>>>>>>> 
>>>>>>> On Jun 19, 2015, at 12:07 PM, Matthias Broecheler <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Marko,
>>>>>>>> 
>>>>>>>> is it possible to disable pulling out the has-containers? For many
>>>>>>> graphdb
>>>>>>>> vendors it would make sense to leave the has containers in the match
>>>>> step
>>>>>>>> and then select those has containers that promise the highest
>>>>> selectivity
>>>>>>>> for index calls based on the index statistics. Since TP3 isn't aware
>>> of
>>>>>>>> indexes it could make such a call.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Matthias
>>>>>>>> 
>>>>>>>> On Fri, Jun 19, 2015 at 10:42 AM Marko Rodriguez <
>>> [email protected]
>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> So, this morning I realized something neat about
>>> MatchStep<->WhereStep
>>>>>>>>> interplay.
>>>>>>>>> 
>>>>>>>>> First, MatchWhereStrategy is now called MatchPredicateStrategy as it
>>>>> is
>>>>>>>>> about moving predicates in and out of match().
>>>>>>>>>     - where()s go in.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L69
>>>>>>>>>     - has() containers go out.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L80
>>>>>>>>> 
>>>>>>>>> Next, the question about "predicate traversals" in MatchStep is
>>> solved
>>>>>>> by
>>>>>>>>> simply saying:
>>>>>>>>>     "If you want a predicate traversal, use a where()-clause in
>>> your
>>>>>>>>> pattern."
>>>>>>>>> 
>>>>>>>>> Thats it! Lets look at what I mean by that (Josh and Daniel will
>>>>>>>>> understand the ramifications best).
>>>>>>>>> 
>>>>>>>>> gremlin> g.V().match('a',
>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>> __.as('a').repeat(out()).times(2))
>>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>>> 
>>>>>>>>> The above match() returns duplicates. Why? Because the second
>>> pattern
>>>>>>>>> isn't binding, its just "checking" -- that is, it passes the
>>> traverser
>>>>>>>>> through and if that traverser splits, well, there are more
>>> traversers
>>>>>>>>> returned. In the original MatchStep, these were called "predicate
>>>>>>>>> traversals" because they did not bind variables (i.e. no as() at the
>>>>>>> end).
>>>>>>>>> As such, their output didn't matter. However, in the new MatchStep,
>>> I
>>>>>>> can't
>>>>>>>>> do that so easily given the OLAP constraint. However, if you want
>>>>>>>>> "predicate traversal" behavior, use WhereStep!
>>>>>>>>> 
>>>>>>>>> g.V().match('a',
>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>> __.where(__.as('a').repeat(out()).times(2))
>>>>>>>>> )
>>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>>> 
>>>>>>>>> So, if you don't care about the result of a pattern, only if it
>>>>>>>>> "hasNext()" (which is much faster than "iterate()"), then wrap it
>>> in a
>>>>>>>>> where() and there you go. Not only is this way more efficient as you
>>>>> are
>>>>>>>>> not generating traversers (i.e. results), you are also not creating
>>>>>>>>> duplicate results (i.e. traversers with similar path histories).
>>>>>>>>> 
>>>>>>>>> Finally, note you can also do this for a nice look and feel:
>>>>>>>>> 
>>>>>>>>> g.V().match('a',
>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>> __.as('a').where(repeat(out()).times(2))
>>>>>>>>> )
>>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>>> 
>>>>>>>>> So whats the catch? Why not just wrap all match patterns without an
>>>>>>>>> end-label step in where()? Two reasons:
>>>>>>>>>     1. Semantics. MatchStep is set of traversals where the
>>> traverser
>>>>>>>>> is pushed into the traversals and when there are no more traversals
>>> to
>>>>>>>>> take, it goes to the next step. Its not a filter-step, its a
>>> map-step.
>>>>>>>>>     2. OLAP. WhereStep's internal traversal is a "local child" and
>>>>>>>>> thus, can only compute as far as the local star graph in OLAP.
>>>>>>> Typically,
>>>>>>>>> any step that needs to know what happened at the end of an internal
>>>>>>>>> traversal (filter or not) has to be locally bound. … this is the
>>>>>>>>> fundamental difference between Gremlin OLAP and Gremlin OLTP.
>>>>>>>>> 
>>>>>>>>> Finally finally….the last big issue I was having was "not()" inside
>>>>>>> Match.
>>>>>>>>> Again, because MatchStep uses "global children", it can't know what
>>>>>>>>> happened to the traverser once it enters a pattern. And steps like
>>> NOT
>>>>>>> need
>>>>>>>>> to know if the traverser was filtered. Well, not() in where() works
>>>>>>> great:
>>>>>>>>> 
>>>>>>>>> g.V().as('a').out('created').
>>>>>>>>> where(__.in('created').count().is(gt(1))).values('name')
>>>>>>>>> ==>lop
>>>>>>>>> ==>lop
>>>>>>>>> ==>lop
>>>>>>>>> g.V().as('a').out('created').
>>>>>>>>> where(__.not(__.in('created').count().is(gt(1)))).values('name') //
>>>>> it
>>>>>>>>> sucks that groovy requires not and in to have __.
>>>>>>>>> ==>ripple
>>>>>>>>> 
>>>>>>>>> And guess what, if you want to NOT a pattern in match(), do it via
>>>>>>> where()!
>>>>>>>>> 
>>>>>>>>> g.V().match('a',
>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>> __.as('b').where(__.in('created').count().is(gt(1)))).
>>>>>>>>> select().by('name')
>>>>>>>>> ==>[a:marko, b:lop]
>>>>>>>>> ==>[a:josh, b:lop]
>>>>>>>>> ==>[a:peter, b:lop]
>>>>>>>>> g.V().match('a',
>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>> __.as('b').where(__.not(__.in('created').count().is(gt(1))))).
>>>>>>>>> select().by('name')
>>>>>>>>> ==>[a:josh, b:ripple]
>>>>>>>>> 
>>>>>>>>> And there we go. MatchPredicateStrategy can just throw where()-steps
>>>>>>> into
>>>>>>>>> MatchStep as is and the issue of "predicate traversals" is no longer
>>>>> an
>>>>>>>>> issue.
>>>>>>>>> 
>>>>>>>>> Thanks for reading,
>>>>>>>>> Marko.
>>>>>>>>> 
>>>>>>>>> http://markorodriguez.com
>>>>>>>>> 
>>>>>>>>> On Jun 17, 2015, at 4:25 PM, Marko Rodriguez <[email protected]>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> To extend on Kuppitz' comment -- Yes, MatchWhereStrategy folds in
>>>>>>>>> where()-clauses. Note that with the recent work on XMatchStep (if we
>>>>> go
>>>>>>>>> with that for GA), where() clauses work natively in XMatchStep and
>>> we
>>>>>>> will
>>>>>>>>> also just fold any "right handed" where()-clauses into match() as
>>>>> well.
>>>>>>>>>> 
>>>>>>>>>> Marko.
>>>>>>>>>> 
>>>>>>>>>> http://markorodriguez.com
>>>>>>>>>> 
>>>>>>>>>> On Jun 17, 2015, at 3:04 PM, Daniel Kuppitz <[email protected]>
>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> After actually looking into the docs, I decided to keep the
>>> example,
>>>>>>>>> since
>>>>>>>>>>> the description explicitely states, that in such a case the
>>> where()
>>>>>>>>> clause
>>>>>>>>>>> will automatically be folded into match():
>>>>>>>>>>> 
>>>>>>>>>>> The where()-step can take either a BiPredicate (first example
>>> below)
>>>>>>> or
>>>>>>>>> a
>>>>>>>>>>>> Traversal (second example below). Using MatchWhereStrategy,
>>>>>>>>> where()-clauses
>>>>>>>>>>>> can be automatically folded into match() and thus, subject to
>>>>>>>>> match()-steps
>>>>>>>>>>>> budget-match algorithm.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> The sample then shows, that
>>>>>>>>>>> 
>>>>>>>>>>> g.V().match('a',
>>>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>>>>> where(__.as('a').out('knows').as('c')).
>>>>>>>>>>> select('a','c').by('name')
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> is - after the MatchWhereStrategy was applied (this is done
>>>>>>>>> automatically)
>>>>>>>>>>> - in fact the same thing as:
>>>>>>>>>>> 
>>>>>>>>>>> g.V().match('a',
>>>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>>>> __.as('a').out('knows').as('c'),
>>>>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>>>>> select('a','c').by('name')
>>>>>>>>>>> 
>>>>>>>>>>> ....
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Daniel
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 17, 2015 at 10:47 PM, Daniel Kuppitz <[email protected]
>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> You're right. It's actually a pretty good example for where(),
>>> but
>>>>>>> not
>>>>>>>>>>>> for match()/where(). I will remove it and make sure that we have
>>>>>>>>>>>> something similar in the where() sample section. Something like:
>>>>>>>>>>>> 
>>>>>>>>>>>> g.V().as("a").out("created").as("b").in("created").as("c").
>>>>>>>>>>>> where(__as("a").out("knows").as("c")).select().by("name")
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Daniel
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Jun 17, 2015 at 8:43 PM, Matthias Broecheler <
>>>>>>> [email protected]
>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> looking at the second example in the following section of the
>>>>> docs I
>>>>>>>>>>>>> noticed a semantic overlap between match and where:
>>>>>>>>>>>>> 
>>>>>>> http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#using-where-with-match
>>>>>>>>>>>>> 
>>>>>>>>>>>>> traversal = g.V().match('a', __.as('a').out('created').as('b'),
>>>>>>>>> __.as('b'
>>>>>>>>>>>>> ).in('created').as('c')).
>>> where(__.as('a').out('knows').as('c')).
>>>>>>>>>>>>> select('a'
>>>>>>>>>>>>> ,'c').by('name');
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The provided where clause could also have been folded into the
>>>>>>> actual
>>>>>>>>>>>>> traversal to yield the same result.
>>>>>>>>>>>>> I wonder:
>>>>>>>>>>>>> 1) Is there a way to avoid this ambiguity?
>>>>>>>>>>>>> 2) or should we simply not promote it in the docs. As the docs
>>> are
>>>>>>>>>>>>> currently written I am worried that users might get confused as
>>> to
>>>>>>> how
>>>>>>>>>>>>> match steps are supposed to be written.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Matthias
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 

Reply via email to