Hi, When MatchStep advances and we can reverse traversal patterns as well as the MatchAlgorithm API advances to allows such things, then yes, we will leave has()-fragments in there.
Marko. http://markorodriguez.com On Jun 24, 2015, at 6:01 PM, Matthias Broecheler <[email protected]> wrote: > But you wouldn't want to process the query like that. You would want to > start with "y" because filtering by name is much more efficient. If you > left the age constraint inside match step then a smart query optimizer can > figure that out for you and do the right thing. If you pull it out then it > cannot (unless you add another strategy to pull it back in ;-) because the > filtering by age happens before the match step is executed. > > What I mean is this: The pulling out is based on what "x" is. If you user > chooses "x" in a suboptimal way then the query optimizer is screwed. Hence, > leave everything inside match step so that the optimizer can make smart > choices for the user. > > > On Wed, Jun 24, 2015 at 4:58 PM Marko Rodriguez <[email protected]> > wrote: > >> Hi, >> >> So is g.V. With g.V.has('age',lt(60)) you get some filtering before going >> on to x.out(follows). >> >> ?, >> Marko. >> >> http://markorodriguez.com >> >> On Jun 24, 2015, at 5:52 PM, Matthias Broecheler <[email protected]> wrote: >> >>> I think you misunderstood my example. Take another look - >>> as('x').has('age',lt(60)) >>> has the same start label as the match label. The name constraint applies >> to >>> "y". This means, only the age constraint would be pulled out. And that >>> would be bad because finding all vertices that have an age less than 60 >> can >>> take forever. >>> >>> On Wed, Jun 24, 2015 at 4:49 PM Marko Rodriguez <[email protected]> >>> wrote: >>> >>>> Hello, >>>> >>>> Again, it only pulls it out if the match()-step if the has()-part start >>>> label is the same as the match() start label. Moreover, if there is no >>>> start label to match()-step, then then nothing is pulled out. In your >> case, >>>> the as('y').has('name','marko') can't "go first" as you need to bind "y" >>>> first. >>>> >>>> Now, if you had as("x").has("name","marko") as well as >>>> as("x").has("age",lt(60)), both would be pulled out and thus, available >> to >>>> the vendor for index lookups as they please. >>>> >>>> Marko. >>>> >>>> http://markorodriguez.com >>>> >>>> On Jun 24, 2015, at 5:32 PM, Matthias Broecheler <[email protected]> >> wrote: >>>> >>>>> Consider this example: >>>>> >>>>> g.V.match("x", as('x').has('age',lt(60)), as('x').out('knows').as('y'), >>>>> as('y').has('name','marko')) >>>>> >>>>> In this case the age constraint would be pulled out if I understand >>>>> correctly. But this constraint has very poor selectivity in particular >>>>> compared to the has('name','marko') constraint on 'y'. So, the better >> way >>>>> to execute this match would be to start by retrieving all markos, >> finding >>>>> the people who know them and then filter those by age. >>>>> However, that is not possible if you pull out the age constraint. >>>>> >>>>> >>>>> On Wed, Jun 24, 2015 at 4:11 PM Marko Rodriguez <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Matthias, >>>>>> >>>>>> So the has()-container "pulling" only happens if a startLabel is >>>> provided >>>>>> (i.e. match("x", as("x").has("name","matthias")). And in that case, I >>>> can't >>>>>> imagine it ever not being desired as if you leave it in MatchStep, >> then >>>> you >>>>>> have one more pattern to order, keep runtime statistics on, cycle >>>> through >>>>>> for determine if a match has occurred, deduping on, and one more >> pattern >>>>>> label to add to each match, etc. By pulling out the has()-container, >> you >>>>>> can reduce the overhead in MatchStep. Finally, while I said it was >> "for >>>>>> vendor indexing," its really not just about that because if the vendor >>>>>> can't use it for indexing, its still good to have it outside the >> match() >>>>>> for the stated reasons. >>>>>> >>>>>> Hope that is clear, >>>>>> Marko. >>>>>> >>>>>> http://markorodriguez.com >>>>>> >>>>>> On Jun 19, 2015, at 12:07 PM, Matthias Broecheler <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Hi Marko, >>>>>>> >>>>>>> is it possible to disable pulling out the has-containers? For many >>>>>> graphdb >>>>>>> vendors it would make sense to leave the has containers in the match >>>> step >>>>>>> and then select those has containers that promise the highest >>>> selectivity >>>>>>> for index calls based on the index statistics. Since TP3 isn't aware >> of >>>>>>> indexes it could make such a call. >>>>>>> >>>>>>> Thanks, >>>>>>> Matthias >>>>>>> >>>>>>> On Fri, Jun 19, 2015 at 10:42 AM Marko Rodriguez < >> [email protected] >>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> So, this morning I realized something neat about >> MatchStep<->WhereStep >>>>>>>> interplay. >>>>>>>> >>>>>>>> First, MatchWhereStrategy is now called MatchPredicateStrategy as it >>>> is >>>>>>>> about moving predicates in and out of match(). >>>>>>>> - where()s go in. >>>>>>>> >>>>>>>> >>>>>> >>>> >> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L69 >>>>>>>> - has() containers go out. >>>>>>>> >>>>>>>> >>>>>> >>>> >> https://github.com/apache/incubator-tinkerpop/blob/2e3a25c318136b7f6c1aec5fae2c0c1b950fb3f9/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/MatchPredicateStrategy.java#L80 >>>>>>>> >>>>>>>> Next, the question about "predicate traversals" in MatchStep is >> solved >>>>>> by >>>>>>>> simply saying: >>>>>>>> "If you want a predicate traversal, use a where()-clause in >> your >>>>>>>> pattern." >>>>>>>> >>>>>>>> Thats it! Lets look at what I mean by that (Josh and Daniel will >>>>>>>> understand the ramifications best). >>>>>>>> >>>>>>>> gremlin> g.V().match('a', >>>>>>>> __.as('a').out('created').as('b'), >>>>>>>> __.as('a').repeat(out()).times(2)) >>>>>>>> ==>[a:v[1], b:v[3]] >>>>>>>> ==>[a:v[1], b:v[3]] >>>>>>>> >>>>>>>> The above match() returns duplicates. Why? Because the second >> pattern >>>>>>>> isn't binding, its just "checking" -- that is, it passes the >> traverser >>>>>>>> through and if that traverser splits, well, there are more >> traversers >>>>>>>> returned. In the original MatchStep, these were called "predicate >>>>>>>> traversals" because they did not bind variables (i.e. no as() at the >>>>>> end). >>>>>>>> As such, their output didn't matter. However, in the new MatchStep, >> I >>>>>> can't >>>>>>>> do that so easily given the OLAP constraint. However, if you want >>>>>>>> "predicate traversal" behavior, use WhereStep! >>>>>>>> >>>>>>>> g.V().match('a', >>>>>>>> __.as('a').out('created').as('b'), >>>>>>>> __.where(__.as('a').repeat(out()).times(2)) >>>>>>>> ) >>>>>>>> ==>[a:v[1], b:v[3]] >>>>>>>> >>>>>>>> So, if you don't care about the result of a pattern, only if it >>>>>>>> "hasNext()" (which is much faster than "iterate()"), then wrap it >> in a >>>>>>>> where() and there you go. Not only is this way more efficient as you >>>> are >>>>>>>> not generating traversers (i.e. results), you are also not creating >>>>>>>> duplicate results (i.e. traversers with similar path histories). >>>>>>>> >>>>>>>> Finally, note you can also do this for a nice look and feel: >>>>>>>> >>>>>>>> g.V().match('a', >>>>>>>> __.as('a').out('created').as('b'), >>>>>>>> __.as('a').where(repeat(out()).times(2)) >>>>>>>> ) >>>>>>>> ==>[a:v[1], b:v[3]] >>>>>>>> >>>>>>>> So whats the catch? Why not just wrap all match patterns without an >>>>>>>> end-label step in where()? Two reasons: >>>>>>>> 1. Semantics. MatchStep is set of traversals where the >> traverser >>>>>>>> is pushed into the traversals and when there are no more traversals >> to >>>>>>>> take, it goes to the next step. Its not a filter-step, its a >> map-step. >>>>>>>> 2. OLAP. WhereStep's internal traversal is a "local child" and >>>>>>>> thus, can only compute as far as the local star graph in OLAP. >>>>>> Typically, >>>>>>>> any step that needs to know what happened at the end of an internal >>>>>>>> traversal (filter or not) has to be locally bound. … this is the >>>>>>>> fundamental difference between Gremlin OLAP and Gremlin OLTP. >>>>>>>> >>>>>>>> Finally finally….the last big issue I was having was "not()" inside >>>>>> Match. >>>>>>>> Again, because MatchStep uses "global children", it can't know what >>>>>>>> happened to the traverser once it enters a pattern. And steps like >> NOT >>>>>> need >>>>>>>> to know if the traverser was filtered. Well, not() in where() works >>>>>> great: >>>>>>>> >>>>>>>> g.V().as('a').out('created'). >>>>>>>> where(__.in('created').count().is(gt(1))).values('name') >>>>>>>> ==>lop >>>>>>>> ==>lop >>>>>>>> ==>lop >>>>>>>> g.V().as('a').out('created'). >>>>>>>> where(__.not(__.in('created').count().is(gt(1)))).values('name') // >>>> it >>>>>>>> sucks that groovy requires not and in to have __. >>>>>>>> ==>ripple >>>>>>>> >>>>>>>> And guess what, if you want to NOT a pattern in match(), do it via >>>>>> where()! >>>>>>>> >>>>>>>> g.V().match('a', >>>>>>>> __.as('a').out('created').as('b'), >>>>>>>> __.as('b').where(__.in('created').count().is(gt(1)))). >>>>>>>> select().by('name') >>>>>>>> ==>[a:marko, b:lop] >>>>>>>> ==>[a:josh, b:lop] >>>>>>>> ==>[a:peter, b:lop] >>>>>>>> g.V().match('a', >>>>>>>> __.as('a').out('created').as('b'), >>>>>>>> __.as('b').where(__.not(__.in('created').count().is(gt(1))))). >>>>>>>> select().by('name') >>>>>>>> ==>[a:josh, b:ripple] >>>>>>>> >>>>>>>> And there we go. MatchPredicateStrategy can just throw where()-steps >>>>>> into >>>>>>>> MatchStep as is and the issue of "predicate traversals" is no longer >>>> an >>>>>>>> issue. >>>>>>>> >>>>>>>> Thanks for reading, >>>>>>>> Marko. >>>>>>>> >>>>>>>> http://markorodriguez.com >>>>>>>> >>>>>>>> On Jun 17, 2015, at 4:25 PM, Marko Rodriguez <[email protected]> >>>>>> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> To extend on Kuppitz' comment -- Yes, MatchWhereStrategy folds in >>>>>>>> where()-clauses. Note that with the recent work on XMatchStep (if we >>>> go >>>>>>>> with that for GA), where() clauses work natively in XMatchStep and >> we >>>>>> will >>>>>>>> also just fold any "right handed" where()-clauses into match() as >>>> well. >>>>>>>>> >>>>>>>>> Marko. >>>>>>>>> >>>>>>>>> http://markorodriguez.com >>>>>>>>> >>>>>>>>> On Jun 17, 2015, at 3:04 PM, Daniel Kuppitz <[email protected]> >> wrote: >>>>>>>>> >>>>>>>>>> After actually looking into the docs, I decided to keep the >> example, >>>>>>>> since >>>>>>>>>> the description explicitely states, that in such a case the >> where() >>>>>>>> clause >>>>>>>>>> will automatically be folded into match(): >>>>>>>>>> >>>>>>>>>> The where()-step can take either a BiPredicate (first example >> below) >>>>>> or >>>>>>>> a >>>>>>>>>>> Traversal (second example below). Using MatchWhereStrategy, >>>>>>>> where()-clauses >>>>>>>>>>> can be automatically folded into match() and thus, subject to >>>>>>>> match()-steps >>>>>>>>>>> budget-match algorithm. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> The sample then shows, that >>>>>>>>>> >>>>>>>>>> g.V().match('a', >>>>>>>>>> __.as('a').out('created').as('b'), >>>>>>>>>> __.as('b').in('created').as('c')). >>>>>>>>>> where(__.as('a').out('knows').as('c')). >>>>>>>>>> select('a','c').by('name') >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> is - after the MatchWhereStrategy was applied (this is done >>>>>>>> automatically) >>>>>>>>>> - in fact the same thing as: >>>>>>>>>> >>>>>>>>>> g.V().match('a', >>>>>>>>>> __.as('a').out('created').as('b'), >>>>>>>>>> __.as('a').out('knows').as('c'), >>>>>>>>>> __.as('b').in('created').as('c')). >>>>>>>>>> select('a','c').by('name') >>>>>>>>>> >>>>>>>>>> .... >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Daniel >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jun 17, 2015 at 10:47 PM, Daniel Kuppitz <[email protected] >>> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> You're right. It's actually a pretty good example for where(), >> but >>>>>> not >>>>>>>>>>> for match()/where(). I will remove it and make sure that we have >>>>>>>>>>> something similar in the where() sample section. Something like: >>>>>>>>>>> >>>>>>>>>>> g.V().as("a").out("created").as("b").in("created").as("c"). >>>>>>>>>>> where(__as("a").out("knows").as("c")).select().by("name") >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Daniel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 17, 2015 at 8:43 PM, Matthias Broecheler < >>>>>> [email protected] >>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi guys, >>>>>>>>>>>> >>>>>>>>>>>> looking at the second example in the following section of the >>>> docs I >>>>>>>>>>>> noticed a semantic overlap between match and where: >>>>>>>>>>>> >>>>>> http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT/#using-where-with-match >>>>>>>>>>>> >>>>>>>>>>>> traversal = g.V().match('a', __.as('a').out('created').as('b'), >>>>>>>> __.as('b' >>>>>>>>>>>> ).in('created').as('c')). >> where(__.as('a').out('knows').as('c')). >>>>>>>>>>>> select('a' >>>>>>>>>>>> ,'c').by('name'); >>>>>>>>>>>> >>>>>>>>>>>> The provided where clause could also have been folded into the >>>>>> actual >>>>>>>>>>>> traversal to yield the same result. >>>>>>>>>>>> I wonder: >>>>>>>>>>>> 1) Is there a way to avoid this ambiguity? >>>>>>>>>>>> 2) or should we simply not promote it in the docs. As the docs >> are >>>>>>>>>>>> currently written I am worried that users might get confused as >> to >>>>>> how >>>>>>>>>>>> match steps are supposed to be written. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Matthias >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>> >>>> >> >>
