Hi,

Once MatchStep advances to the point where we can reverse traversal patterns,
and the MatchAlgorithm API advances to allow such things, then yes, we will
leave has()-fragments in there.

Marko.

http://markorodriguez.com
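
As a rough illustration of the trade-off argued in the quoted thread below,
here is a toy Python model of the age/name example. It is a sketch under
stated assumptions, not TinkerPop code: HasPattern, pulled_out, best_start and
the hit counts are all invented for the illustration. It only shows that
pulling out by start label "x" selects the age filter, whereas an optimizer
with index statistics would rather start from "y".

```python
# Toy model only: none of these names are TinkerPop APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class HasPattern:
    label: str           # match label the has()-constraint applies to, e.g. "x"
    key: str             # property key being filtered on
    estimated_hits: int  # hypothetical index statistic (made-up numbers below)


def pulled_out(patterns, start_label):
    """has()-patterns whose label equals the match() start label get pulled out.

    With no start label (None), nothing is pulled out.
    """
    return [p for p in patterns if p.label == start_label]


def best_start(patterns):
    """What a statistics-aware optimizer could pick if everything stays in match()."""
    return min(patterns, key=lambda p: p.estimated_hits)


patterns = [
    HasPattern("x", "age", 500_000),  # age < 60: poor selectivity (assumed)
    HasPattern("y", "name", 3),       # name == 'marko': high selectivity (assumed)
]

print([p.key for p in pulled_out(patterns, "x")])  # ['age']
print(best_start(patterns).label)                  # y
```

With these assumed numbers, the start-label rule filters on age first, while
the selectivity-based choice begins at "y".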

On Jun 24, 2015, at 6:01 PM, Matthias Broecheler <[email protected]> wrote:

> But you wouldn't want to process the query like that. You would want to
> start with "y" because filtering by name is much more efficient. If you
> left the age constraint inside match step then a smart query optimizer can
> figure that out for you and do the right thing. If you pull it out then it
> cannot (unless you add another strategy to pull it back in ;-) because the
> filtering by age happens before the match step is executed.
> 
> What I mean is this: The pulling out is based on what "x" is. If the user
> chooses "x" in a suboptimal way then the query optimizer is screwed. Hence,
> leave everything inside match step so that the optimizer can make smart
> choices for the user.
> 
> 
> On Wed, Jun 24, 2015 at 4:58 PM Marko Rodriguez <[email protected]>
> wrote:
> 
>> Hi,
>> 
>> So is g.V. With g.V.has('age',lt(60)) you get some filtering before going
>> on to x.out(follows).
>> 
>> ?,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Jun 24, 2015, at 5:52 PM, Matthias Broecheler <[email protected]> wrote:
>> 
>>> I think you misunderstood my example. Take another look -
>>> as('x').has('age',lt(60))
>>> has the same start label as the match label. The name constraint applies to
>>> "y". This means, only the age constraint would be pulled out. And that
>>> would be bad because finding all vertices that have an age less than 60 can
>>> take forever.
>>> 
>>> On Wed, Jun 24, 2015 at 4:49 PM Marko Rodriguez <[email protected]>
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Again, it only pulls it out of the match()-step if the has()-part start
>>>> label is the same as the match() start label. Moreover, if there is no
>>>> start label to the match()-step, then nothing is pulled out. In your case,
>>>> the as('y').has('name','marko') can't "go first" as you need to bind "y"
>>>> first.
>>>> 
>>>> Now, if you had as("x").has("name","marko") as well as
>>>> as("x").has("age",lt(60)), both would be pulled out and thus, available to
>>>> the vendor for index lookups as they please.
>>>> 
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> On Jun 24, 2015, at 5:32 PM, Matthias Broecheler <[email protected]> wrote:
>>>> 
>>>>> Consider this example:
>>>>> 
>>>>> g.V.match("x", as('x').has('age',lt(60)), as('x').out('knows').as('y'),
>>>>> as('y').has('name','marko'))
>>>>> 
>>>>> In this case the age constraint would be pulled out if I understand
>>>>> correctly. But this constraint has very poor selectivity, in particular
>>>>> compared to the has('name','marko') constraint on 'y'. So, the better way
>>>>> to execute this match would be to start by retrieving all markos, finding
>>>>> the people who know them and then filtering those by age.
>>>>> However, that is not possible if you pull out the age constraint.
>>>>> 
>>>>> 
>>>>> On Wed, Jun 24, 2015 at 4:11 PM Marko Rodriguez <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi Matthias,
>>>>>> 
>>>>>> So the has()-container "pulling" only happens if a startLabel is provided
>>>>>> (i.e. match("x", as("x").has("name","matthias"))). And in that case, I
>>>>>> can't imagine it ever not being desired: if you leave it in MatchStep,
>>>>>> then you have one more pattern to order, keep runtime statistics on, cycle
>>>>>> through to determine if a match has occurred, and dedup on, plus one more
>>>>>> pattern label to add to each match, etc. By pulling out the
>>>>>> has()-container, you can reduce the overhead in MatchStep. Finally, while
>>>>>> I said it was "for vendor indexing," it's really not just about that,
>>>>>> because if the vendor can't use it for indexing, it's still good to have
>>>>>> it outside the match() for the stated reasons.
>>>>>> 
>>>>>> Hope that is clear,
>>>>>> Marko.
>>>>>> 
>>>>>> http://markorodriguez.com
>>>>>> 
>>>>>> On Jun 19, 2015, at 12:07 PM, Matthias Broecheler <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi Marko,
>>>>>>> 
>>>>>>> is it possible to disable pulling out the has-containers? For many
>>>>>>> graphdb vendors it would make sense to leave the has containers in the
>>>>>>> match step and then select those has containers that promise the highest
>>>>>>> selectivity for index calls based on the index statistics. Since TP3
>>>>>>> isn't aware of indexes it could not make such a call.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Matthias
>>>>>>> 
>>>>>>> On Fri, Jun 19, 2015 at 10:42 AM Marko Rodriguez <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> So, this morning I realized something neat about MatchStep<->WhereStep
>>>>>>>> interplay.
>>>>>>>> 
>>>>>>>> First, MatchWhereStrategy is now called MatchPredicateStrategy as it is
>>>>>>>> about moving predicates in and out of match().
>>>>>>>>     - where()s go in.
>>>>>>>>       https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L69
>>>>>>>>     - has() containers go out.
>>>>>>>>       https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L80
>>>>>>>> 
>>>>>>>> Next, the question about "predicate traversals" in MatchStep is solved by
>>>>>>>> simply saying:
>>>>>>>>     "If you want a predicate traversal, use a where()-clause in your
>>>>>>>> pattern."
>>>>>>>> 
>>>>>>>> That's it! Let's look at what I mean by that (Josh and Daniel will
>>>>>>>> understand the ramifications best).
>>>>>>>> 
>>>>>>>> gremlin> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('a').repeat(out()).times(2))
>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>> 
>>>>>>>> The above match() returns duplicates. Why? Because the second pattern
>>>>>>>> isn't binding, it's just "checking" -- that is, it passes the traverser
>>>>>>>> through and if that traverser splits, well, there are more traversers
>>>>>>>> returned. In the original MatchStep, these were called "predicate
>>>>>>>> traversals" because they did not bind variables (i.e. no as() at the
>>>>>>>> end). As such, their output didn't matter. However, in the new MatchStep,
>>>>>>>> I can't do that so easily given the OLAP constraint. However, if you want
>>>>>>>> "predicate traversal" behavior, use WhereStep!
>>>>>>>> 
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.where(__.as('a').repeat(out()).times(2))
>>>>>>>> )
>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>> 
>>>>>>>> So, if you don't care about the result of a pattern, only if it
>>>>>>>> "hasNext()" (which is much faster than "iterate()"), then wrap it in a
>>>>>>>> where() and there you go. Not only is this way more efficient as you are
>>>>>>>> not generating traversers (i.e. results), you are also not creating
>>>>>>>> duplicate results (i.e. traversers with similar path histories).
>>>>>>>> 
>>>>>>>> Finally, note you can also do this for a nice look and feel:
>>>>>>>> 
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('a').where(repeat(out()).times(2))
>>>>>>>> )
>>>>>>>> ==>[a:v[1], b:v[3]]
>>>>>>>> 
>>>>>>>> So what's the catch? Why not just wrap all match patterns without an
>>>>>>>> end-label step in where()? Two reasons:
>>>>>>>>     1. Semantics. MatchStep is a set of traversals where the traverser
>>>>>>>> is pushed into the traversals and when there are no more traversals to
>>>>>>>> take, it goes to the next step. It's not a filter-step, it's a map-step.
>>>>>>>>     2. OLAP. WhereStep's internal traversal is a "local child" and
>>>>>>>> thus, can only compute as far as the local star graph in OLAP. Typically,
>>>>>>>> any step that needs to know what happened at the end of an internal
>>>>>>>> traversal (filter or not) has to be locally bound. … this is the
>>>>>>>> fundamental difference between Gremlin OLAP and Gremlin OLTP.
>>>>>>>> 
>>>>>>>> Finally finally….the last big issue I was having was "not()" inside
>>>>>>>> Match. Again, because MatchStep uses "global children", it can't know
>>>>>>>> what happened to the traverser once it enters a pattern. And steps like
>>>>>>>> NOT need to know if the traverser was filtered. Well, not() in where()
>>>>>>>> works great:
>>>>>>>> 
>>>>>>>> g.V().as('a').out('created').
>>>>>>>> where(__.in('created').count().is(gt(1))).values('name')
>>>>>>>> ==>lop
>>>>>>>> ==>lop
>>>>>>>> ==>lop
>>>>>>>> g.V().as('a').out('created').
>>>>>>>> where(__.not(__.in('created').count().is(gt(1)))).values('name')
>>>>>>>> // it sucks that groovy requires not and in to have __.
>>>>>>>> ==>ripple
>>>>>>>> 
>>>>>>>> And guess what, if you want to NOT a pattern in match(), do it via
>>>>>>>> where()!
>>>>>>>> 
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('b').where(__.in('created').count().is(gt(1)))).
>>>>>>>> select().by('name')
>>>>>>>> ==>[a:marko, b:lop]
>>>>>>>> ==>[a:josh, b:lop]
>>>>>>>> ==>[a:peter, b:lop]
>>>>>>>> g.V().match('a',
>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>> __.as('b').where(__.not(__.in('created').count().is(gt(1))))).
>>>>>>>> select().by('name')
>>>>>>>> ==>[a:josh, b:ripple]
>>>>>>>> 
>>>>>>>> And there we go. MatchPredicateStrategy can just throw where()-steps into
>>>>>>>> MatchStep as is and the issue of "predicate traversals" is no longer an
>>>>>>>> issue.
>>>>>>>> 
>>>>>>>> Thanks for reading,
>>>>>>>> Marko.
>>>>>>>> 
>>>>>>>> http://markorodriguez.com
>>>>>>>> 
>>>>>>>> On Jun 17, 2015, at 4:25 PM, Marko Rodriguez <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> To extend on Kuppitz' comment -- Yes, MatchWhereStrategy folds in
>>>>>>>>> where()-clauses. Note that with the recent work on XMatchStep (if we go
>>>>>>>>> with that for GA), where() clauses work natively in XMatchStep and we
>>>>>>>>> will also just fold any "right handed" where()-clauses into match() as
>>>>>>>>> well.
>>>>>>>>> 
>>>>>>>>> Marko.
>>>>>>>>> 
>>>>>>>>> http://markorodriguez.com
>>>>>>>>> 
>>>>>>>>> On Jun 17, 2015, at 3:04 PM, Daniel Kuppitz <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> After actually looking into the docs, I decided to keep the example,
>>>>>>>>>> since the description explicitly states that in such a case the where()
>>>>>>>>>> clause will automatically be folded into match():
>>>>>>>>>> 
>>>>>>>>>>> The where()-step can take either a BiPredicate (first example below) or
>>>>>>>>>>> a Traversal (second example below). Using MatchWhereStrategy,
>>>>>>>>>>> where()-clauses can be automatically folded into match() and thus,
>>>>>>>>>>> subject to match()-steps budget-match algorithm.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The sample then shows that
>>>>>>>>>> 
>>>>>>>>>> g.V().match('a',
>>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>>>> where(__.as('a').out('knows').as('c')).
>>>>>>>>>> select('a','c').by('name')
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> is - after the MatchWhereStrategy was applied (this is done
>>>>>>>>>> automatically) - in fact the same thing as:
>>>>>>>>>> 
>>>>>>>>>> g.V().match('a',
>>>>>>>>>> __.as('a').out('created').as('b'),
>>>>>>>>>> __.as('a').out('knows').as('c'),
>>>>>>>>>> __.as('b').in('created').as('c')).
>>>>>>>>>> select('a','c').by('name')
>>>>>>>>>> 
>>>>>>>>>> ....
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Daniel
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jun 17, 2015 at 10:47 PM, Daniel Kuppitz <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> You're right. It's actually a pretty good example for where(), but not
>>>>>>>>>>> for match()/where(). I will remove it and make sure that we have
>>>>>>>>>>> something similar in the where() sample section. Something like:
>>>>>>>>>>> 
>>>>>>>>>>> g.V().as("a").out("created").as("b").in("created").as("c").
>>>>>>>>>>> where(__.as("a").out("knows").as("c")).select().by("name")
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Daniel
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 17, 2015 at 8:43 PM, Matthias Broecheler <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>> 
>>>>>>>>>>>> looking at the second example in the following section of the docs I
>>>>>>>>>>>> noticed a semantic overlap between match and where:
>>>>>>>>>>>> 
>>>>>>>>>>>> http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#using-where-with-match
>>>>>>>>>>>> 
>>>>>>>>>>>> traversal = g.V().match('a',
>>>>>>>>>>>>     __.as('a').out('created').as('b'),
>>>>>>>>>>>>     __.as('b').in('created').as('c')).
>>>>>>>>>>>>   where(__.as('a').out('knows').as('c')).
>>>>>>>>>>>>   select('a','c').by('name');
>>>>>>>>>>>> 
>>>>>>>>>>>> The provided where clause could also have been folded into the actual
>>>>>>>>>>>> traversal to yield the same result.
>>>>>>>>>>>> I wonder:
>>>>>>>>>>>> 1) Is there a way to avoid this ambiguity?
>>>>>>>>>>>> 2) Or should we simply not promote it in the docs? As the docs are
>>>>>>>>>>>> currently written I am worried that users might get confused as to how
>>>>>>>>>>>> match steps are supposed to be written.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Matthias
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
