Whoa, that's great.  The traversers not being at the same location was what
wasn't clicking for me and that is trippy.  This'll do it.

Thanks,
Ted

On Thu, Nov 5, 2015 at 9:18 AM, Marko Rodriguez <[email protected]>
wrote:

> Hi Ted,
>
> So, your email is long and I have the attention span of a pebble. However,
> let me just say some things about sack merging and hopefully that solves
> your problem.
>
>         Sacks only merge if the traversers are in the same equivalence class.
>                 That is: same graph location, same traversal location, same
>                 loop counter, same path history (if path calculations are on).
>
> Thus, from your query, I believe perhaps your traversers are NOT at the
> same graph location (the ones you want merged).
>
> This is where I usually try and squirm out of things by just asking
> questions and putting the problem back onto you. However, for once, I
> actually ran someone's code and voilà! I was right: your traversers are
> violating the equivalence-class location relation and thus, your sacks
> don't merge.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.
>            V().has('relType','scan').until(has('type','join')).
>            repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier()
> ==>v[4]
> ==>v[8]
> ==>v[10]
> ==>v[6]
> gremlin>
>
> *** Sack merging is like bulk merging. Only on the same location.
>
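> For illustration, here is a minimal sketch of that rule on a four-vertex
> diamond (the graph, the names, and the expected output below are
> illustrative, not a captured run): two traversers split at a, re-converge
> on d with the same loop counter, and a barrier() merges them, applying the
> sack merge operator (here, a sum).
>
> graph = TinkerGraph.open()
> a = graph.addVertex('name','a'); b = graph.addVertex('name','b')
> c = graph.addVertex('name','c'); d = graph.addVertex('name','d')
> a.addEdge('to', b); a.addEdge('to', c)
> b.addEdge('to', d); c.addEdge('to', d)
> g = graph.traversal()
>
> gremlin> g.withBulk(false).withSack{1}{it}{x, y -> x + y}.
>            V().has('name','a').out('to').out('to').barrier().sack()
> ==>2
>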
> However, watch this insanity.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.
>            V().has('relType','scan').until(has('type','join')).
>            repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack().barrier()
> ==>[scan]
> ==>[[scan, filter], [scan, filter]]
> ==>[[scan, filter, join], [scan, filter, join]]
>
> With the added end barrier(), your sack becomes a location! TRIPPY! And
> thus, your sacks merge.
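>
> If you want a single merged result rather than one per emitted depth,
> something like the following should do it (a sketch, not a captured run):
> make the until() match on 'relType' (the test vertices don't have a 'type'
> property) so the loop actually stops at the join, and drop the emit() so
> only the traversers that reach the join come out. They are then at the same
> location and their sacks merge at the barrier(). Note the sacks stop at
> [scan, filter] because the repeat body never runs at the join itself.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.
>            V().has('relType','scan').until(has('relType','join')).
>            repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).barrier().sack()
> ==>[[scan, filter], [scan, filter]]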
>
> Godspeed,
> Marko.
>
> http://markorodriguez.com
>
> On Nov 3, 2015, at 11:08 AM, Ted Wilmes <[email protected]> wrote:
>
> > Hello,
> > I've been working on a sql-to-gremlin compiler and have a question about
> > the latest 3.1.0-SNAPSHOT updates to sack, specifically the merge bits.
> > As I was writing a bunch of essentially recursive tree-walking code to
> > translate a Calcite logical plan into Gremlin, I thought to myself, maybe
> > this would be cleaner if I loaded the logical plan into a TinkerGraph and
> > then wrote some Gremlin to traverse the plan and, in turn, produce the
> > compiled Gremlin (consequently, this got me thinking how cool it would be
> > if I could directly interact with the current object graph without having
> > to perform the intermediate step of loading it into a separate graph
> > implementation, like a TP3-enabled JVM interface).  In addition to
> > readability, this has the added bonus of being kind of meta: Gremlin
> > begets Gremlin.  I started down this path, and I think it would have been
> > nasty without the 3.1 sack updates since there wasn't a clean way to
> > define what would occur on a merge.  Here's a basic example of what I'm
> > thinking; maybe I'm off base though.
> >
> > Given the following SQL query against the Northwind schema (imagine the
> > graph version vs. the relational version sitting behind it, e.g.
> > sql2gremlin):
> >
> > select * from customer c
> >    inner join country co on c.country_id = co.country_id
> >        where c.name = 'United States'
> >
> > Calcite produces the following logical plan, which is basically the query
> > parsed + transformations applied by Calcite's optimizer.  I don't have
> > many rules turned on, so you'll see here that the filters aren't pushed
> > down below the join, but for example purposes, this should do it.
> >
> > EnumerableProject(CUSTOMER_ID=[$0], ORDER_ID=[$1], COUNTRY_ID=[$2],
> > REGION_ID=[$3], NAME=[$4], COUNTRY_ID0=[$5], NAME0=[$6])
> >    EnumerableFilter(condition=[=(CAST($4):VARCHAR(3) CHARACTER SET
> > "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'United States')])
> >        EnumerableFilter(condition=[=($2, $2)])
> >            EnumerableJoin(condition=[true], joinType=[inner])
> >                GremlinToEnumerableConverter
> >                    GremlinTableScan(table=[[gremlin, CUSTOMER]])
> >                GremlinToEnumerableConverter
> >                    GremlinTableScan(table=[[gremlin, COUNTRY]])
> >
> > That has a lot of cruft not pertinent to the core of my question, but the
> > salient bit is that it is internally represented as an object graph of
> > "relation nodes".  I've loaded these relation nodes (rel nodes) into a
> > TinkerGraph and my basic idea is to start down at what is called the
> > GremlinTableScan (you could think of this as the most basic retrieval of
> > all vertices filtered by a given label), and then work my way upwards
> > using the sack to hold the traversals as I build them.  For example, you
> > start with the equivalent of retrieving all vertices with label 'x', but
> > then you hit a filter, so add a 'has'.  When a "join" node is hit, the
> > sacks would be merged by taking the incoming sack traversals and
> > generating a match statement.  An initial cut of this may nest matches
> > within matches, but I think it could fairly easily be made to just create
> > one uber-match statement.  Here is some example code to load a similar
> > test graph into a TinkerGraph.
> >
> > graph = TinkerGraph.open()
> > scan1 = graph.addVertex(label, "relNode", "relType","scan")
> > scan2 = graph.addVertex(label, "relNode", "relType","scan")
> > filter1 = graph.addVertex(label, "relNode", "relType", "filter")
> > filter2 = graph.addVertex(label, "relNode", "relType", "filter")
> > join = graph.addVertex(label, "relNode", "relType", "join")
> > project = graph.addVertex(label, "relNode", "relType", "project")
> >
> > project.addEdge("hasInput", join)
> > join.addEdge("hasInput", filter1)
> > join.addEdge("hasInput", filter2)
> > filter1.addEdge("hasInput", scan1)
> > filter2.addEdge("hasInput", scan2)
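> >
> > As a quick sanity check of the shape of this graph (a sketch, not from an
> > actual run; it assumes g = graph.traversal()), walking from the scans up
> > to the join without any sack shows the two branches that should
> > eventually be merged:
> >
> > g = graph.traversal()
> > g.V().has('relType','scan').
> >   repeat(__.in('hasInput')).until(has('relType','join')).
> >   path().by('relType')
> > ==>[scan, filter, join]
> > ==>[scan, filter, join]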
> >
> > My current issue is that I can't figure out how to get the sack merge on
> > join to work.  I admittedly could also be way off base with this approach
> > due to some fundamental misunderstanding, but the sack descriptions
> > regarding splitting and merging of energies made a lot of sense to me and
> > seemed applicable to what I'm trying to do.
> >
> > Here's an example of what I've tried.  In this case, I'm simulating
> > building the traversals up by seeding the sack with an array, and then
> > simply adding the relType (scan, filter, project) as I walk backwards up
> > from the scans.  In this case, there aren't any splits, but there should
> > be the one merge at the "join" relNode.  I've defined my merge operator
> > as adding the two incoming lists to a single list, thereby producing a
> > list of lists.  My TP3 skills are in their infancy but here it goes:
> >
> > gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.
> >            V().has('relType','scan').until(has('type','join')).
> >            repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack()
> > ==>[scan]
> > ==>[[scan, filter], [scan, filter]]
> > ==>[[scan, filter, join], [scan, filter, join]]
> > ==>[scan]
> >
> > I expected to just end up with a single result of [[scan, filter, join],
> > [scan, filter, join]], but I think I am misunderstanding how the
> > individual traversers are working and, in turn, merging.  Is there a way
> > I could tweak this query (or outright rewrite it) so that I end up with a
> > single sack result at the end containing that list of lists (or, in the
> > real application, the two separate traversals that I'd then add to a
> > match step)?
> > Thanks,
> > Ted
>
>
