Whoa, that's great. The traversers not being at the same location was what wasn't clicking for me and that is trippy. This'll do it.
Thanks,
Ted

On Thu, Nov 5, 2015 at 9:18 AM, Marko Rodriguez <[email protected]> wrote:

> Hi Ted,
>
> So, your email is long and I have the attention span of a pebble. However,
> let me just say some things about sack merging and hopefully that solves
> your problem.
>
> Sacks only merge if the traversers are in the same equivalence class.
> That is: same graph location, same traversal location, same loop counter,
> and same path history (if path calculations are on).
>
> Thus, from your query, I believe your traversers are NOT at the same graph
> location (the ones you want merged).
>
> This is where I usually try and squirm out of things by just asking
> questions and putting the problem back onto you. However, for once, I
> actually ran someone's code and WALA! I was right: your traversers violate
> the equivalence-class location relation, and thus your sacks don't merge.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier()
> ==>v[4]
> ==>v[8]
> ==>v[10]
> ==>v[6]
>
> *** Sack merging is like bulk merging: it only happens at the same
> location.
>
> However, watch this insanity.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack().barrier()
> ==>[scan]
> ==>[[scan, filter], [scan, filter]]
> ==>[[scan, filter, join], [scan, filter, join]]
>
> With the added end barrier(), your sack becomes a location! TRIPPY! And
> thus, your sacks merge.
>
> Godspeed,
> Marko.
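Marko's equivalence-class rule can be modeled outside of TinkerPop. Below is a minimal Python sketch of the semantics (a toy model, not TinkerPop's actual traverser implementation; the names `barrier`, `merge_sacks`, and the location keys are invented for illustration): traversers merge only when their location keys are equal, and turning the sack itself into the location, as Marko's trailing `.sack().barrier()` does, is what finally makes the two join-side traversers equivalent.

```python
# Toy model of traverser merging (illustration only, not the TinkerPop API).
# A traverser is a (location_key, sack) pair; merging happens only between
# traversers whose location keys compare equal.

def merge_sacks(a, b):
    # Ted's merge operator: fold two sacks into a list of lists.
    return [a, b]

def barrier(traversers):
    # Group traversers by their location key; merge sacks within a group.
    merged = {}
    for key, sack in traversers:
        if key in merged:
            merged[key] = merge_sacks(merged[key], sack)
        else:
            merged[key] = sack
    return list(merged.items())

# Two traversers that reached the join via different graph locations:
# their keys differ, so nothing merges and both sacks survive separately.
at_vertices = [("v[10]", ["scan", "filter", "join"]),
               ("v[6]",  ["scan", "filter", "join"])]
print(barrier(at_vertices))

# After a sack() step the sack value itself plays the role of the location.
# The keys are now equal, so the merge operator fires and combines the sacks.
at_sacks = [(("sack", ("scan", "filter", "join")), ["scan", "filter", "join"]),
            (("sack", ("scan", "filter", "join")), ["scan", "filter", "join"])]
print(barrier(at_sacks))
```

Under this toy model the second barrier yields a single entry whose value is the list of lists Ted was after, which mirrors why the extra `barrier()` in Marko's query makes the real sacks merge.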
>
> http://markorodriguez.com
>
> On Nov 3, 2015, at 11:08 AM, Ted Wilmes <[email protected]> wrote:
>
> > Hello,
> > I've been working on a SQL-to-Gremlin compiler and have a question about
> > the latest 3.1.0-SNAPSHOT updates to sack, specifically the merge bits.
> > As I was writing a bunch of essentially recursive tree-walking code to
> > translate a Calcite logical plan into Gremlin, I thought to myself that
> > maybe this would be cleaner if I loaded the logical plan into a
> > TinkerGraph and then wrote some Gremlin to traverse the plan and, in
> > turn, produce the compiled Gremlin. (Consequently, this got me thinking
> > how cool it would be if I could directly interact with the current
> > object graph without having to perform the intermediate step of loading
> > it into a separate graph implementation, like a TP3-enabled JVM
> > interface.) In addition to readability, this has the added bonus of
> > being kind of meta: Gremlin begets Gremlin. I started down this path,
> > and I think it would have been nasty without the 3.1 sack updates, since
> > there wasn't a clean way to define what would occur on a merge. Here's a
> > basic example of what I'm thinking; maybe I'm off base, though.
> >
> > Given the following SQL query against the Northwind schema (imagine the
> > graph version vs. the relational version sitting behind it, e.g.
> > sql2gremlin):
> >
> > select * from customer c
> > inner join country co on c.country_id = co.country_id
> > where c.name = 'United States'
> >
> > Calcite produces the following logical plan, which is basically the
> > parsed query plus the transformations applied by Calcite's optimizer. I
> > don't have many rules turned on, so you'll see here that the filters
> > aren't pushed down below the join, but for example purposes this should
> > do it.
> >
> > EnumerableProject(CUSTOMER_ID=[$0], ORDER_ID=[$1], COUNTRY_ID=[$2], REGION_ID=[$3], NAME=[$4], COUNTRY_ID0=[$5], NAME0=[$6])
> >   EnumerableFilter(condition=[=(CAST($4):VARCHAR(3) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'United States')])
> >     EnumerableFilter(condition=[=($2, $2)])
> >       EnumerableJoin(condition=[true], joinType=[inner])
> >         GremlinToEnumerableConverter
> >           GremlinTableScan(table=[[gremlin, CUSTOMER]])
> >         GremlinToEnumerableConverter
> >           GremlinTableScan(table=[[gremlin, COUNTRY]])
> >
> > That has a lot of cruft not pertinent to the core of my question, but
> > the salient bit is that the plan is internally represented as an object
> > graph of "relation nodes". I've loaded these relation nodes (rel nodes)
> > into a TinkerGraph, and my basic idea is to start down at what is called
> > the GremlinTableScan (you could think of this as the most basic
> > retrieval of all vertices filtered by a given label) and then work my
> > way upwards, using the sack to hold the traversals as I build them. For
> > example, you start with the equivalent of retrieving all vertices with
> > label 'x', but then you hit a filter, so you add a 'has'. When a "join"
> > node is hit, the sacks would be merged by taking the incoming sack
> > traversals and generating a match statement. An initial cut of this may
> > nest matches within matches, but I think it could fairly easily be made
> > to create just one uber-match statement. Here is some example code to
> > load a similar test graph into a TinkerGraph.
> >
> > graph = TinkerGraph.open()
> > scan1 = graph.addVertex(label, "relNode", "relType", "scan")
> > scan2 = graph.addVertex(label, "relNode", "relType", "scan")
> > filter1 = graph.addVertex(label, "relNode", "relType", "filter")
> > filter2 = graph.addVertex(label, "relNode", "relType", "filter")
> > join = graph.addVertex(label, "relNode", "relType", "join")
> > project = graph.addVertex(label, "relNode", "relType", "project")
> >
> > project.addEdge("hasInput", join)
> > join.addEdge("hasInput", filter1)
> > join.addEdge("hasInput", filter2)
> > filter1.addEdge("hasInput", scan1)
> > filter2.addEdge("hasInput", scan2)
> >
> > My current issue is that I can't figure out how to get the sack merge at
> > the join to work. I admittedly could also be way off base with this
> > approach due to some fundamental misunderstanding, but the sack
> > descriptions regarding splitting and merging of energies made a lot of
> > sense to me and seemed applicable to what I'm trying to do.
> >
> > Here's an example of what I've tried. In this case, I'm simulating
> > building the traversals up by seeding the sack with an array and then
> > simply adding the relType (scan, filter, project) as I walk backwards up
> > from the scans. In this case there aren't any splits, but there should
> > be the one merge at the "join" relNode. I've defined my merge operator
> > as adding the two incoming lists to a single list, thereby producing a
> > list of lists.
> > My TP3 skills are in their infancy, but here it goes:
> >
> > gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack()
> > ==>[scan]
> > ==>[[scan, filter], [scan, filter]]
> > ==>[[scan, filter, join], [scan, filter, join]]
> > ==>[scan]
> >
> > I expected to just end up with the single result [[scan, filter, join],
> > [scan, filter, join]], but I think I am misunderstanding how the
> > individual traversers are working and, in turn, merging. Is there a way
> > I could tweak this query (or outright rewrite it) so that I end up with
> > a single sack result at the end containing that list of lists (or, in
> > the real application, the two separate traversals that I'd then add to
> > a match step)?
> >
> > Thanks,
> > Ted
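The bottom-up walk Ted describes (scans start a traversal fragment, filters append to it, and the join folds its incoming fragments into one match) can be sketched in plain Python. Everything below is a hypothetical illustration of the compile strategy, with an invented plan table and placeholder `has(...)`/`select(...)` fragments, not sql2gremlin's actual code:

```python
# Hypothetical sketch of compiling a rel-node graph into a Gremlin-ish string.
# Mirrors the test graph from the thread: each node is (relType, inputs), and
# compile_node recurses down to the scans and builds the traversal on the way up.

PLAN = {
    "project": ("project", ["join"]),
    "join":    ("join",    ["filter1", "filter2"]),
    "filter1": ("filter",  ["scan1"]),
    "filter2": ("filter",  ["scan2"]),
    "scan1":   ("scan",    []),
    "scan2":   ("scan",    []),
}

def compile_node(name):
    rel_type, inputs = PLAN[name]
    if rel_type == "scan":
        # The most basic retrieval: all vertices with a given label.
        return "V().hasLabel('x')"
    if rel_type == "filter":
        # A filter appends a has() step to its single input's fragment.
        return compile_node(inputs[0]) + ".has(...)"
    if rel_type == "join":
        # The merge point: fold the incoming fragments into one match().
        parts = [compile_node(i) for i in inputs]
        return "match(" + ", ".join(parts) + ")"
    if rel_type == "project":
        return compile_node(inputs[0]) + ".select(...)"
    raise ValueError("unknown relType: " + rel_type)

print(compile_node("project"))
```

This is the same shape the sack-based version aims for: the two scan-plus-filter branches end up side by side inside a single match, with the projection applied on top.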
