Hello,

please clarify one point regarding Universal Bytecode and Gremlin clients.

In TP3, compilation to Bytecode can happen on the Gremlin Client side or on 
the Gremlin Server side:

1. If the compilation is simple, it can be implemented in every Gremlin 
client: Java, Python, JavaScript, .NET, and so on (see the Java sketch 
below).
2. If the compilation is complex, it can be implemented as a Gremlin Server 
plugin: clients send the query string, and the server does the compilation.
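
For example, option 1 with the TP3 Java client, where the fluent traversal 
is compiled to Bytecode before anything is sent to the server (a minimal 
sketch; host and port are placeholders):

import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

// the fluent API builds Bytecode on the client side; only the Bytecode
// is sent to Gremlin Server for evaluation
GraphTraversalSource g = traversal().withRemote(
    DriverRemoteConnection.using("localhost", 8182, "g"));

Bytecode bytecode = g.V().has("name", "marko").out("knows").values("name")
    .asAdmin().getBytecode();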

For example, Cypher for Gremlin can compile to Bytecode in the JVM client, 
or on the server when using [other language clients][1].

My current understanding is that the TP4 Server would serve only I/O purposes.

Where do you see the "Query language -> Universal Bytecode" part in the TP4 
architecture? Will it be in the VM? Or in middleware? What will clients look 
like in TP4?

[1]: 
https://github.com/opencypher/cypher-for-gremlin/tree/master/tinkerpop/cypher-gremlin-server-plugin#usage

On 2019/05/06 15:58:09, Marko Rodriguez <okramma...@gmail.com> wrote: 
> Hello,
> 
> I’m experimenting with moving between X query language and Y bytecode via 
> Universal Bytecode.
> 
> The general (and very difficult) goal of TP4 is to be able to execute queries 
> (from any known query language) against any database (regardless of 
> underlying data model) using any processing engine.
>       - e.g. Gremlin over MySQL (as Relational) using RxJava.
>       - e.g. SQL over Cassandra (as WideColumn) using Flink.
>       - e.g. SPARQL over MongoDB (as Document) using Akka.
>       - e.g. Cypher over Neptune (as Graph) using Pipes.
>       - e.g. ...
> 
> ——————
> 
> NOTES:
>       1. Realize that databases are both processors and structures.
>               - MySQL has its own SQL engine.
>               - Cassandra has its own CQL engine.
>               - MongoDB has its own DocumentQuery engine.
>               - …
>       2. What can be processed by the database’s engine should be evaluated 
> by the database (typically).
>       3. What cannot be processed by the database’s engine should be 
> evaluated by the processor.
>       4. DATABASE_ENGINE->PROCESSOR->DATABASE_ENGINE->PROCESSOR->etc.
>               - data may move from database to processor back to database 
> back to processor, etc. to yield the final query result.
> 
> The universal bytecode chunks in the examples to come assume the following 
> n-tuple structure accessible via db():
> 
> [0][id:1, label:person, name:marko, outE:*1]
> [1][0:*2, 1:*3]
> [2][id:7, label:knows, outV:*0, inV:*4]
> [3][id:8, label:knows, outV:*0, inV:*5]
> [4][id:2, label:person, name:vadas]
> [5][id:4, label:person, name:josh]
> 
> - All tuples have an id that is outside the id-space of the underlying data 
> model.
> - Field values can have pointers to other tuples via *-prefix notation.
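> 
> A minimal sketch of such a tuple store (the Tuple record and resolve() 
> helper are hypothetical, only to make the *-pointer semantics concrete):
> 
> import java.util.Map;
> 
> // hypothetical n-tuple store: every tuple has an id, fields map to
> // values, and a String value of the form "*N" points to tuple N
> record Tuple(int id, Map<String, Object> fields) {}
> 
> class TupleStore {
>     private final Map<Integer, Tuple> tuples;
>     TupleStore(Map<Integer, Tuple> tuples) { this.tuples = tuples; }
> 
>     // follow a *-pointer, e.g. resolve("*1") returns the tuple with id 1
>     Tuple resolve(String pointer) {
>         return tuples.get(Integer.parseInt(pointer.substring(1)));
>     }
> }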
> 
> Every compilation goes:
> 
>       Query language -> Universal Bytecode -> Data Model Bytecode -> Provider 
> Bytecode
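> 
> A sketch of that chain as composable stages (the Compiler interface is an 
> assumption for illustration, not TP4 API):
> 
> // hypothetical staged compiler; each arrow above is one compile() step
> interface Compiler<S, T> {
>     T compile(S source);
> 
>     // chain stages: gremlinToUniversal.andThen(universalToRelational)...
>     default <U> Compiler<S, U> andThen(Compiler<T, U> next) {
>         return source -> next.compile(this.compile(source));
>     }
> }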
> 
> ——————
> 
> How do you compile Gremlin to universal bytecode for evaluation over MySQL?
> 
> —— GREMLIN QUERY ——
> 
> g.V().has("name","marko").out("knows").values("name")
> 
> —— UNIVERSAL BYTECODE ——
> 
>    ==> (using tuple pointers)
> 
> db().has('label', within('person','project')).
>   has('name','marko').
>   values('outE').has('label','knows').
>   values('inV').values('name')
> 
> —— RELATIONAL BYTECODE ——
> 
>    ==>
> 
> R("people","projects").has("name","marko").
>   join(R("knows")).by("id", eq("outV")).
>   join(R("people")).by("inV", eq("id")).
>   values("name")
> 
> —— JDBC BYTECODE ——
> 
>    ==>
> 
> union(
>   sql("SELECT p2.name FROM people as p1, knows, people as p2 WHERE 
> p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"),
>   sql("SELECT p2.name FROM projects as p1, knows, projects as p2 WHERE 
> p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"))
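> 
> A sketch of how a JDBC provider could evaluate that union() of two sql() 
> instructions with plain java.sql (the connection URL is a placeholder):
> 
> import java.sql.*;
> import java.util.ArrayList;
> import java.util.List;
> 
> // union(sql(a), sql(b)): run each SELECT and concatenate the results
> List<String> names = new ArrayList<>();
> try (Connection conn =
>         DriverManager.getConnection("jdbc:mysql://localhost/graph")) {
>     for (String query : new String[]{
>             "SELECT p2.name FROM people as p1, knows, people as p2 " +
>             "WHERE p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id",
>             "SELECT p2.name FROM projects as p1, knows, projects as p2 " +
>             "WHERE p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id"}) {
>         try (Statement stmt = conn.createStatement();
>              ResultSet rs = stmt.executeQuery(query)) {
>             while (rs.next()) names.add(rs.getString("name"));
>         }
>     }
> }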
>       
> The assumed SQL tables are:
> 
> CREATE TABLE people (
>     id int,
>     label varchar(255), -- 'person'
>     name varchar(255),
>     PRIMARY KEY (id)
> );
> 
> CREATE TABLE knows (
>     id int,
>     label varchar(255), -- 'knows'
>     name varchar(255),
>     outV int,
>     inV int,
>     PRIMARY KEY (id),
>     FOREIGN KEY (outV) REFERENCES people(id),
>     FOREIGN KEY (inV) REFERENCES people(id)
> );
> 
> There need to be two mapping specifications (Graph->Universal & 
> Universal->Relational); a sketch of such a spec follows this list.
>       - V() -> the vertex tables are people+projects.
>       - label() -> the plural of the label is the table name.
>       - outE -> person.outE.knows is resolved via knows.outV (a foreign 
> key to the people table by id).
>       - inV -> knows.values('inV') resolves to a person row (a foreign 
> key to the people table by id).
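> 
> Such a mapping could be declared as plain configuration; a sketch where 
> every key name is an assumption, only to show the shape of the spec:
> 
> import java.util.List;
> import java.util.Map;
> 
> // hypothetical Graph->Relational mapping specification
> Map<String, Object> mapping = Map.of(
>     "vertexTables", List.of("people", "projects"),
>     "edgeTables", Map.of("knows",
>         Map.of("outV", "people.id",   // knows.outV references people.id
>                "inV", "people.id"))); // knows.inV references people.id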
> 
> Next, we are assuming that a property graph is encoded in MySQL, since we 
> have outE, inV, etc. column names. If we want to interpret arbitrary 
> relational data as a graph, then it is important to denote which tables 
> are “join tables” and which column names are used for joining. What about 
> when a “join table” references more than 2 other rows (i.e. an n-ary 
> relation, a hypergraph)? Is there a general solution for viewing any 
> relational schema as a binary property graph?
> 
> ——————
> 
> How do you compile SQL to universal bytecode for evaluation over Cassandra?
>       
> —— SQL QUERY ——
> 
> SELECT p2.name FROM people as p1, knows, people as p2 
>   WHERE p1.name='marko' AND p1.id=knows.outV AND knows.inV=p2.id
> 
> —— UNIVERSAL BYTECODE ——
> 
>    ==> (using tuple pointers)
> 
> db().has('label','person').has('name','marko').
>   values('outE').has('label','knows').
>   values('inV').values('name')
> 
> —— WIDE-COLUMN BYTECODE ——
> 
>    ==>
> 
> R('people').has('name','marko').
>   values('outE').has('label','knows').values('inV').as('$inV')
> R('people').has('id', eq(path('$inV'))).values('name')
> 
> —— CASSANDRA BYTECODE ——
> 
>    ==> 
> 
> cql('SELECT outE:knows FROM people WHERE name=marko').
> cql('SELECT name FROM people WHERE id=$inV').by('inV')
> 
> There needs to be a mapping specification from SQL->Universal:
>       - knows.outV is a foreign key to a person row.
>       - person.outE.knows is referenced by knows.outV.
> 
> The people table is defined as below, where the outgoing edges are stored 
> as entries of the outE collection column.
> 
> CREATE TABLE people (
>     id int,
>     name text,
>     age int,
>     outE list<frozen<map<text, text>>>,
>     PRIMARY KEY (id, name)
> );
> 
> The last bytecode is chained CQL, where each result from the first CQL 
> instruction is fed into the second CQL instruction one at a time (i.e. 
> cql() is a flatmap); a sketch of this chaining follows below. While CQL 
> does not support joins, Flink (the registered processor) does, and thus 
> joins could be done by Flink. This email does not discuss processor 
> strategies.
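> 
> As a sketch, that flatmap chaining with the DataStax Java driver would 
> look roughly as follows (the session setup, ALLOW FILTERING, and the 
> shape of the outE column are all assumptions):
> 
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.api.core.cql.Row;
> import java.util.Map;
> 
> try (CqlSession session = CqlSession.builder().build()) {
>     // first cql() instruction
>     for (Row person : session.execute(
>             "SELECT outE FROM people WHERE name='marko' ALLOW FILTERING")) {
>         for (Object edge : person.getList("outE", Object.class)) {
>             Object inV = ((Map<?, ?>) edge).get("inV"); // by('inV')
>             // second cql() instruction: one lookup per edge (flatmap)
>             for (Row adjacent : session.execute(
>                     "SELECT name FROM people WHERE id=" + inV)) {
>                 System.out.println(adjacent.getString("name"));
>             }
>         }
>     }
> }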
> 
> ——————
> 
> How do you compile SPARQL to universal bytecode for evaluation over MongoDB?
> 
> —— SPARQL QUERY ——
> 
> PREFIX ex: <http://example.org/>
> SELECT ?name WHERE {
>   ?x ex:label "person" .
>   ?x ex:name "marko" .
>   ?x ex:outE ?y .
>   ?y ex:label "knows" .
>   ?y ex:inV ?z .
>   ?z ex:name ?name
> }
> 
> —— UNIVERSAL BYTECODE ——
> 
>    ==> (using tuple pointers)
> 
> db().has('label','person').has('name','marko').
>   values('outE').has('label','knows').
>   values('inV').values('name')
> 
> —— DOCUMENT BYTECODE ——
> 
>    ==>
> 
> D().has('label','person').has('name','marko').
>   values('outE').has('label','knows').values('inV').as('a')
> D().has('_id', eq(path('a'))).
>   values('name')
> 
> —— MONGODB BYTECODE ——
> 
>    ==> 
> 
> mquery('{ label: "person", name: "marko", "outE.label": "knows" },
>        { "outE.inV": 1 }').
>   values('outE.inV').
> mquery('{ _id: $inV }, { name: 1 }').
>   values('name')
> 
> The MongoDB documents have an outE field which is a list of edges that are 
> themselves objects with label and inV fields.
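> 
> For concreteness, one such document could be built with the MongoDB Java 
> driver's Document class (the exact shape is an assumption taken from the 
> bytecode above):
> 
> import org.bson.Document;
> import java.util.List;
> 
> // a person document whose outE field is a list of edge sub-documents
> Document marko = new Document("_id", 1)
>     .append("label", "person")
>     .append("name", "marko")
>     .append("outE", List.of(
>         new Document("label", "knows").append("inV", 2),
>         new Document("label", "knows").append("inV", 4)));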
> 
> ——————
> 
> Just spittin’ ideas,
> Marko.
