Whoa, that's great. The traversers not being at the same location was what wasn't clicking for me and that is trippy. This'll do it.
Thanks,
Ted

On Thu, Nov 5, 2015 at 9:18 AM, Marko Rodriguez <[email protected]> wrote:

> Hi Ted,
>
> So, your email is long and I have the attention span of a pebble. However,
> let me just say some things about sack merging and hopefully that solves
> your problem.
>
> Sacks only merge if the traversers are in the same equivalence class.
> That is: same graph location, same traversal location, same loop counter,
> and same path history (if path calculations are on).
>
> Thus, from your query, I believe your traversers are NOT at the same graph
> location (the ones you want merged).
>
> This is where I usually try and squirm out of things by just asking
> questions and putting the problem back onto you. However, for once, I
> actually ran someone's code and WALA! I was right: your traversers violate
> the equivalence-class location relation, and thus your sacks don't merge.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier()
> ==>v[4]
> ==>v[8]
> ==>v[10]
> ==>v[6]
>
> *** Sack merging is like bulk merging: it only happens at the same
> location.
>
> However, watch this insanity.
>
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack().barrier()
> ==>[scan]
> ==>[[scan, filter], [scan, filter]]
> ==>[[scan, filter, join], [scan, filter, join]]
>
> With the added end barrier(), your sack becomes a location! TRIPPY! And
> thus, your sacks merge.
>
> Godspeed,
> Marko.
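Marko's equivalence-class rule can be modeled outside of TinkerPop. Below is a minimal Python sketch of the semantics (a toy model, not TinkerPop's actual traverser implementation; the names `barrier`, `merge_sacks`, and the location keys are invented for illustration): traversers merge only when their location keys are equal, and turning the sack itself into the location, as Marko's trailing `.sack().barrier()` does, is what finally makes the two join-side traversers equivalent.

```python
# Toy model of traverser merging (illustration only, not the TinkerPop API).
# A traverser is a (location_key, sack) pair; merging happens only between
# traversers whose location keys compare equal.

def merge_sacks(a, b):
    # Ted's merge operator: fold two sacks into a list of lists.
    return [a, b]

def barrier(traversers):
    # Group traversers by their location key; merge sacks within a group.
    merged = {}
    for key, sack in traversers:
        if key in merged:
            merged[key] = merge_sacks(merged[key], sack)
        else:
            merged[key] = sack
    return list(merged.items())

# Two traversers that reached the join via different graph locations:
# their keys differ, so nothing merges and both sacks survive separately.
at_vertices = [("v[10]", ["scan", "filter", "join"]),
               ("v[6]",  ["scan", "filter", "join"])]
print(barrier(at_vertices))

# After a sack() step the sack value itself plays the role of the location.
# The keys are now equal, so the merge operator fires and combines the sacks.
at_sacks = [(("sack", ("scan", "filter", "join")), ["scan", "filter", "join"]),
            (("sack", ("scan", "filter", "join")), ["scan", "filter", "join"])]
print(barrier(at_sacks))
```

Under this toy model the second barrier yields a single entry whose value is the list of lists Ted was after, which mirrors why the extra `barrier()` in Marko's query makes the real sacks merge.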
>
> http://markorodriguez.com
>
> On Nov 3, 2015, at 11:08 AM, Ted Wilmes <[email protected]> wrote:
>
> > Hello,
> > I've been working on a SQL-to-Gremlin compiler and have a question about
> > the latest 3.1.0-SNAPSHOT updates to sack, specifically the merge bits.
> > As I was writing a bunch of essentially recursive tree-walking code to
> > translate a Calcite logical plan into Gremlin, I thought to myself that
> > maybe this would be cleaner if I loaded the logical plan into a
> > TinkerGraph and then wrote some Gremlin to traverse the plan and, in
> > turn, produce the compiled Gremlin. (Consequently, this got me thinking
> > how cool it would be if I could directly interact with the current
> > object graph without having to perform the intermediate step of loading
> > it into a separate graph implementation, like a TP3-enabled JVM
> > interface.) In addition to readability, this has the added bonus of
> > being kind of meta: Gremlin begets Gremlin. I started down this path,
> > and I think it would have been nasty without the 3.1 sack updates, since
> > there wasn't a clean way to define what would occur on a merge. Here's a
> > basic example of what I'm thinking; maybe I'm off base, though.
> >
> > Given the following SQL query against the Northwind schema (imagine the
> > graph version vs. the relational version sitting behind it, e.g.
> > sql2gremlin):
> >
> > select * from customer c
> > inner join country co on c.country_id = co.country_id
> > where c.name = 'United States'
> >
> > Calcite produces the following logical plan, which is basically the
> > parsed query plus the transformations applied by Calcite's optimizer. I
> > don't have many rules turned on, so you'll see here that the filters
> > aren't pushed down below the join, but for example purposes this should
> > do it.
> >
> > EnumerableProject(CUSTOMER_ID=[$0], ORDER_ID=[$1], COUNTRY_ID=[$2], REGION_ID=[$3], NAME=[$4], COUNTRY_ID0=[$5], NAME0=[$6])
> >   EnumerableFilter(condition=[=(CAST($4):VARCHAR(3) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'United States')])
> >     EnumerableFilter(condition=[=($2, $2)])
> >       EnumerableJoin(condition=[true], joinType=[inner])
> >         GremlinToEnumerableConverter
> >           GremlinTableScan(table=[[gremlin, CUSTOMER]])
> >         GremlinToEnumerableConverter
> >           GremlinTableScan(table=[[gremlin, COUNTRY]])
> >
> > That has a lot of cruft not pertinent to the core of my question, but
> > the salient bit is that the plan is internally represented as an object
> > graph of "relation nodes". I've loaded these relation nodes (rel nodes)
> > into a TinkerGraph, and my basic idea is to start down at what is called
> > the GremlinTableScan (you could think of this as the most basic
> > retrieval of all vertices filtered by a given label) and then work my
> > way upwards, using the sack to hold the traversals as I build them. For
> > example, you start with the equivalent of retrieving all vertices with
> > label 'x', but then you hit a filter, so you add a 'has'. When a "join"
> > node is hit, the sacks would be merged by taking the incoming sack
> > traversals and generating a match statement. An initial cut of this may
> > nest matches within matches, but I think it could fairly easily be made
> > to create just one uber-match statement. Here is some example code to
> > load a similar test graph into a TinkerGraph.
> >
> > graph = TinkerGraph.open()
> > scan1 = graph.addVertex(label, "relNode", "relType", "scan")
> > scan2 = graph.addVertex(label, "relNode", "relType", "scan")
> > filter1 = graph.addVertex(label, "relNode", "relType", "filter")
> > filter2 = graph.addVertex(label, "relNode", "relType", "filter")
> > join = graph.addVertex(label, "relNode", "relType", "join")
> > project = graph.addVertex(label, "relNode", "relType", "project")
> >
> > project.addEdge("hasInput", join)
> > join.addEdge("hasInput", filter1)
> > join.addEdge("hasInput", filter2)
> > filter1.addEdge("hasInput", scan1)
> > filter2.addEdge("hasInput", scan2)
> >
> > My current issue is that I can't figure out how to get the sack merge at
> > the join to work. I admittedly could also be way off base with this
> > approach due to some fundamental misunderstanding, but the sack
> > descriptions regarding splitting and merging of energies made a lot of
> > sense to me and seemed applicable to what I'm trying to do.
> >
> > Here's an example of what I've tried. In this case, I'm simulating
> > building the traversals up by seeding the sack with an array and then
> > simply adding the relType (scan, filter, project) as I walk backwards up
> > from the scans. In this case there aren't any splits, but there should
> > be the one merge at the "join" relNode. I've defined my merge operator
> > as adding the two incoming lists to a single list, thereby producing a
> > list of lists.
> > My TP3 skills are in their infancy, but here it goes:
> >
> > gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack()
> > ==>[scan]
> > ==>[[scan, filter], [scan, filter]]
> > ==>[[scan, filter, join], [scan, filter, join]]
> > ==>[scan]
> >
> > I expected to just end up with the single result [[scan, filter, join],
> > [scan, filter, join]], but I think I am misunderstanding how the
> > individual traversers are working and, in turn, merging. Is there a way
> > I could tweak this query (or outright rewrite it) so that I end up with
> > a single sack result at the end containing that list of lists (or, in
> > the real application, the two separate traversals that I'd then add to
> > a match step)?
> >
> > Thanks,
> > Ted
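The bottom-up walk Ted describes (scans start a traversal fragment, filters append to it, and the join folds its incoming fragments into one match) can be sketched in plain Python. Everything below is a hypothetical illustration of the compile strategy, with an invented plan table and placeholder `has(...)`/`select(...)` fragments, not sql2gremlin's actual code:

```python
# Hypothetical sketch of compiling a rel-node graph into a Gremlin-ish string.
# Mirrors the test graph from the thread: each node is (relType, inputs), and
# compile_node recurses down to the scans and builds the traversal on the way up.

PLAN = {
    "project": ("project", ["join"]),
    "join":    ("join",    ["filter1", "filter2"]),
    "filter1": ("filter",  ["scan1"]),
    "filter2": ("filter",  ["scan2"]),
    "scan1":   ("scan",    []),
    "scan2":   ("scan",    []),
}

def compile_node(name):
    rel_type, inputs = PLAN[name]
    if rel_type == "scan":
        # The most basic retrieval: all vertices with a given label.
        return "V().hasLabel('x')"
    if rel_type == "filter":
        # A filter appends a has() step to its single input's fragment.
        return compile_node(inputs[0]) + ".has(...)"
    if rel_type == "join":
        # The merge point: fold the incoming fragments into one match().
        parts = [compile_node(i) for i in inputs]
        return "match(" + ", ".join(parts) + ")"
    if rel_type == "project":
        return compile_node(inputs[0]) + ".select(...)"
    raise ValueError("unknown relType: " + rel_type)

print(compile_node("project"))
```

This is the same shape the sack-based version aims for: the two scan-plus-filter branches end up side by side inside a single match, with the projection applied on top.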
