Hi Ted,

So, your email is long and I have the attention span of a pebble. However, let me 
just say some things about sack merging and hopefully that solves your problem.

    Sacks only merge if the traversers are in the same equivalence class.
    That is: same graph location, same traversal location, same loop
    counter, same path history (if path calculations are on).

Thus, from your query, I believe the traversers you want merged are NOT at the 
same graph location.
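
To see the rule in isolation, here is a minimal sketch against the TinkerFactory 
"modern" toy graph (the graph, the sum merge operator, and the expected value of 
2 are purely my illustration, not your data). Two traversers walk 
v[1] -> {v[2], v[4]} -> v[1]; at the barrier() they sit at the same location, so 
their sacks merge via the merge operator:

gremlin> graph = TinkerFactory.createModern(); g = graph.traversal()
gremlin> // both traversers land on v[1] with a sack of 1; the barrier() merges them,
gremlin> // and the merge operator {a, b -> a + b} should yield a single sack of 2
gremlin> g.withBulk(false).withSack{1}{it}{a, b -> a + b}.V(1).out('knows').in('knows').barrier().sack()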

This is where I usually try and squirm out of things by just asking questions 
and putting the problem back onto you. However, for once, I actually ran 
someone's code and VOILA! I was right: your traversers are violating the 
equivalence-class location relation and thus, your sacks don't merge.

gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier()
==>v[4]
==>v[8]
==>v[10]
==>v[6]

*** Sack merging is like bulk merging: it only happens for traversers at the same location.

However, watch this insanity.

gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack().barrier()
==>[scan]
==>[[scan, filter], [scan, filter]]
==>[[scan, filter, join], [scan, filter, join]]

With the added barrier() at the end, your sack becomes a location! TRIPPY! The 
sack() step makes the sack the traverser's current object, so at that last 
barrier() traversers carrying equal sacks are at the same location and thus, 
your sacks merge.
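
As a side note, the list-of-lists shape is just your merge operator at work. 
Here is that same bi-function applied by hand in plain Groovy in the console 
(purely an illustration, no traversal involved):

gremlin> merge = { a, b -> l = []; l << a; l << b; l }
gremlin> // merging two equal sacks by hand yields [[scan, filter, join], [scan, filter, join]]
gremlin> merge(['scan', 'filter', 'join'], ['scan', 'filter', 'join'])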

Godspeed,
Marko.

http://markorodriguez.com

On Nov 3, 2015, at 11:08 AM, Ted Wilmes <[email protected]> wrote:

> Hello,
> I've been working on a sql-to-gremlin compiler and have a question about
> the latest 3.1.0-SNAPSHOT updates to sack, specifically the merge bits.  As
> I was writing a bunch of essentially recursive tree walking code to
> translate a Calcite logical plan into Gremlin, I thought to myself, maybe
> this would be cleaner if I loaded the logical plan into a TinkerGraph and
> then wrote some Gremlin to traverse the plan and in turn, produce the
> compiled Gremlin (consequently, this got me thinking how cool it would
> be if I could directly interact with the current object graph without
> having to perform the intermediate step of loading it into a separate graph
> implementation, like a TP3-enabled JVM interface).  In addition to
> readability, this has the added bonus of being kind of meta, Gremlin begets
> Gremlin.  I started down this path, and I think it would have been nasty
> without the 3.1 sack updates since there wasn't a clean way to define what
> would occur on a merge.  Here's a basic example of what I'm thinking, maybe
> I'm off base though.
> 
> Given the following SQL query against the Northwind schema (imagine the
> graph version vs. the relational version sitting behind it, e.g. sql2gremlin):
> 
> select * from customer c
>    inner join country co on c.country_id = co.country_id
>        where c.name = 'United States'
> 
> Calcite produces the following logical plan which is basically the query
> parsed + transformations applied by Calcite's optimizer.  I don't have many
> rules turned on, so you'll see here that the filters aren't pushed down
> below the join, but for example purposes, this should do it.
> 
> EnumerableProject(CUSTOMER_ID=[$0], ORDER_ID=[$1], COUNTRY_ID=[$2],
> REGION_ID=[$3], NAME=[$4], COUNTRY_ID0=[$5], NAME0=[$6])
>    EnumerableFilter(condition=[=(CAST($4):VARCHAR(3) CHARACTER SET
> "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'United States')])
>        EnumerableFilter(condition=[=($2, $2)])
>            EnumerableJoin(condition=[true], joinType=[inner])
>                GremlinToEnumerableConverter
>                    GremlinTableScan(table=[[gremlin, CUSTOMER]])
>                GremlinToEnumerableConverter
>                    GremlinTableScan(table=[[gremlin, COUNTRY]])
> 
> That has a lot of cruft not pertinent to the core of my question, but the
> salient bit is that it is internally represented as an object graph of
> "relation nodes".  I've loaded these relation nodes (rel nodes) into a
> TinkerGraph and my basic idea is to start down at what is called the
> GremlinTableScan (you could think of this as the most basic retrieval of
> all vertices filtered by a given label), and then work my way upwards using
> the sack to hold the traversals as I build them.  For example, you start
> with the equivalent of retrieving all vertices with label 'x', but then you
> hit a filter, so add a 'has'.  When a "join" node is hit, the sacks would
> be merged by taking the incoming sack traversals and generating a match
> statement.  An initial cut of this may nest matches within matches, but I
> think it could fairly easily be made to just create one uber-match
> statement.  Here is some example code to load a similar test graph into a
> TinkerGraph.
> 
> graph = TinkerGraph.open()
> scan1 = graph.addVertex(label, "relNode", "relType","scan")
> scan2 = graph.addVertex(label, "relNode", "relType","scan")
> filter1 = graph.addVertex(label, "relNode", "relType", "filter")
> filter2 = graph.addVertex(label, "relNode", "relType", "filter")
> join = graph.addVertex(label, "relNode", "relType", "join")
> project = graph.addVertex(label, "relNode", "relType", "project")
> 
> project.addEdge("hasInput", join)
> join.addEdge("hasInput", filter1)
> join.addEdge("hasInput", filter2)
> filter1.addEdge("hasInput", scan1)
> filter2.addEdge("hasInput", scan2)
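> 
> As a sanity check of the edge direction (this is just an illustration;
> g = graph.traversal() is assumed), a plain walk up the plan from the scans
> would be:
> 
> gremlin> g = graph.traversal()
> gremlin> // hasInput points from consumer to input, so in('hasInput') walks up the plan
> gremlin> g.V().has('relType','scan').repeat(__.in('hasInput')).emit().path().by('relType')
> 
> which should emit paths like [scan, filter], [scan, filter, join], and
> [scan, filter, join, project] for each scan.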
> 
> My current issue is that I can't figure out how to get the sack merge on
> join to work.  I admittedly could also be way off base with this approach
> due to some fundamental misunderstanding, but the sack descriptions
> regarding splitting and merging of energies made a lot of sense to me and
> applicable to what I'm trying to do.
> 
> Here's an example of what I've tried.  In this case, I'm simulating
> building the traversals up by seeding the sack with a list, and then
> simply adding the relType (scan, filter, project) as I walk backwards up
> from the scans.  In this case, there aren't any splits, but there should
> be the one merge at the "join" relNode.  I've defined my merge operator as
> adding the two incoming lists to a single list thereby producing a list of
> lists.  My TP3 skills are in their infancy but here it goes:
> 
> gremlin> g.withBulk(false).withSack{[]}{it.clone()}{a, b -> l = []; l << a; l << b; l}.V().has('relType','scan').until(has('type','join')).repeat(sack{s, v -> s << v.value('relType')}.in('hasInput')).emit().barrier().sack()
> ==>[scan]
> ==>[[scan, filter], [scan, filter]]
> ==>[[scan, filter, join], [scan, filter, join]]
> ==>[scan]
> 
> I expected to just end up with a single result of [[scan, filter, join],
> [scan, filter, join]] but I think I am misunderstanding how the individual
> traversers are working and in turn merging.  Is there a way I could tweak
> this query (or outright rewrite it) so that I end up with a single sack
> result at the end containing that list of lists (or, in the real
> application, the two separate traversals that I'd then add to a match step)?
> 
> Thanks,
> Ted
