Traversal ByteCode and Translators (The TINKERPOP-1278 Saga)

Marko Rodriguez Sat, 02 Jul 2016 07:05:27 -0700

Hello,

So TINKERPOP-1278 (aka “Gremlin-Python”) has introduced the notion of Traversal 
ByteCode.


        
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/ByteCode.java
 

In essence, ByteCode is the construction history of a traversal and is of the 
form:

        [string, object*]* // a list of (operator, arguments[])

The traversal 

g.V(1).outE('created').inV().
  repeat(out('created').in('created')).times(5).
  valueMap('name','age')

has a ByteCode representation as below:


[
        [V, 1]
        [outE, 'created']
        [inV]
        [repeat, [
                [out, 'created']
                [in, 'created']
        ]]
        [times, 5]
        [valueMap, 'name', 'age']
]

Again, Gremlin is a simple language based on function concatenation and 
nesting. That's all there is to it. Thus, a traversal forms a tree, and trees 
are easy to encode, distribute, serialize, decode, prune/optimize, search, 
etc. Moreover, every programming language supports function composition and 
nesting, and thus Gremlin is able to be hosted in any programming language. 
[http://tinkerpop.apache.org/gremlin.html]

The benefit of ByteCode as it applies to TINKERPOP-1278 is that a Translator is 
able to access the ByteCode of the traversal and then use that linear-nested 
structure (wide-tree) to generate a traversal representation in another 
language — e.g. Gremlin-Python, Gremlin-Ruby, Gremlin-JavaScript, etc. 
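As a rough illustration of what a translator does with that structure, here is a minimal Python sketch that walks a list of [operator, arguments...] instructions and emits a Gremlin-Groovy-style string. The function name translate and the rule "a list argument is a nested traversal" are assumptions made for this sketch; it is not the PythonTranslator or GroovyTranslator code.

```python
def translate(bytecode, source="g"):
    """Render nested [operator, *args] instructions as a Groovy-style string."""

    def render(arg):
        if isinstance(arg, list):
            # Assumption: a list argument is a nested (anonymous) traversal.
            return translate(arg, source="__")
        if isinstance(arg, bool):
            return "true" if arg else "false"
        if isinstance(arg, str):
            escaped = arg.replace("\\", "\\\\").replace("'", "\\'")
            return f"'{escaped}'"
        return str(arg)  # numbers pass through as written

    steps = []
    for operator, *args in bytecode:
        rendered = ", ".join(render(a) for a in args)
        steps.append(f"{operator}({rendered})")
    return ".".join([source, *steps])


bytecode = [
    ["V", 1],
    ["outE", "created"],
    ["inV"],
    ["repeat", [["out", "created"], ["in", "created"]]],
    ["times", 5],
    ["valueMap", "name", "age"],
]
print(translate(bytecode))
# g.V(1).outE('created').inV().repeat(__.out('created').in('created')).times(5).valueMap('name', 'age')
```

Targeting another language would mostly mean swapping the render rules (quoting, booleans, keyword clashes) while the tree walk stays the same.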

Here is the Gremlin-Python translator that will turn ByteCode into 
Gremlin-Python:
        
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/java/org/apache/tinkerpop/gremlin/java/translator/PythonTranslator.java
 
Here is the Gremlin-Groovy translator that will turn ByteCode into 
Gremlin-Groovy:
        
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-groovy/src/main/java/org/apache/tinkerpop/gremlin/java/translator/GroovyTranslator.java
 

Pretty simple, eh? So, why would you want Gremlin-Java to translate to 
Gremlin-Groovy? Well, so you can code in Gremlin-Java and then have it execute 
on GremlinServer via the GremlinGroovy JSR223 ScriptEngine. However, one can 
imagine a Gremlin-Java->Gremlin-Java translator! What is that?! Well, it would 
use reflection (or some more efficient mechanism) to reconstruct the 
Gremlin-Java traversal from ByteCode generated from Gremlin-Java. Thus, the 
entire cluster/server infrastructure is simply migrating ByteCode around as 
opposed to worrying about language-specific representations (i.e. it has 
nothing to do with the JVM!). Also, assume a Python-based graph database exists 
that implements the GremlinVM: Gremlin-Java can easily talk to it via ByteCode.
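The reflection idea can be sketched in a few lines. The example below is in Python rather than Java, and HostTraversal is a hypothetical stand-in for a host-language traversal class with only four steps defined; rebuild() looks up each operator by name and calls it, rebuilding nested instruction lists first.

```python
class HostTraversal:
    """Hypothetical host-language traversal; only a few steps are defined."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _add(self, name, *args):
        return HostTraversal(self.steps + ((name, args),))

    def V(self, *ids):
        return self._add("V", *ids)

    def out(self, *labels):
        return self._add("out", *labels)

    def repeat(self, body):
        return self._add("repeat", body)

    def times(self, count):
        return self._add("times", count)


def rebuild(bytecode, source, anonymous):
    """Replay [operator, *args] instructions on `source` via reflective lookup."""
    traversal = source
    for operator, *args in bytecode:
        # Nested instruction lists start from a fresh anonymous traversal.
        args = [rebuild(a, anonymous(), anonymous) if isinstance(a, list) else a
                for a in args]
        # getattr raises AttributeError if the host lacks the named step.
        traversal = getattr(traversal, operator)(*args)
    return traversal


t = rebuild([["V", 1], ["repeat", [["out", "created"]]], ["times", 2]],
            HostTraversal(), HostTraversal)
print([name for name, _ in t.steps])
```

The only thing that crosses the wire in this picture is the instruction list; the receiving side decides which concrete class to replay it on.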

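To ensure ByteCode generation is not costly, here are the runtimes for 
construction and compilation of a “fairly complex” traversal in both master/ 
and TINKERPOP-1278/ (in each pair, the first clock() times construction only 
and the second adds applyStrategies(), i.e. compilation):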

        master/

gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity()}
==>0.004184923
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies()}
==>0.0264354126

        TINKERPOP-1278/

gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity()}
==>0.0048526357999999995
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies()}
==>0.0245924514

Finally, there are various entailments that come from this ByteCode 
representation that have started to surface in my mind:

        1. We can now optimize at the ByteCode level and not at the step
           level. ByteCodeStrategies could be a new TraversalStrategy class
           that comes before DecorationStrategies.
        2. Gremlin-Java can support bindings so that its Gremlin-Groovy (e.g.)
           compiled form uses bindings. How? By simply rewriting the ByteCode
           prior to compilation and replacing values with variables!
        3. GraphSON can now easily support Traversal serialization, as
           ByteCode in JSON is natural. Gremlin-GraphSON anyone? (get it?)
                [g, V : [1], outE : [created], inV, repeat : [out : [created],
                 in : [created]], times : 5, valueMap : [name, age]]
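Points 2 and 3 can be sketched together in Python. This is only an illustration under stated assumptions: bind_values, the {"var": name} marker, and the x0, x1, ... variable names are invented for the sketch and are not an existing TinkerPop or GraphSON format. The rewrite replaces every literal argument with a variable marker and collects the values into a bindings map; the JSON part just shows that the nested-list form round-trips as-is.

```python
import json


def bind_values(bytecode, bindings=None):
    """Return (rewritten bytecode, bindings) with literals replaced by variables."""
    bindings = {} if bindings is None else bindings
    rewritten = []
    for operator, *args in bytecode:
        new_args = []
        for arg in args:
            if isinstance(arg, list):
                # Nested traversal: rewrite it, sharing the same bindings map.
                nested, _ = bind_values(arg, bindings)
                new_args.append(nested)
            else:
                name = f"x{len(bindings)}"  # hypothetical naming scheme
                bindings[name] = arg
                new_args.append({"var": name})
        rewritten.append([operator, *new_args])
    return rewritten, bindings


bytecode = [["V", 1], ["repeat", [["out", "created"]]], ["times", 5]]

rewritten, bindings = bind_values(bytecode)
print(rewritten)
print(bindings)

# Lists, strings, and numbers map directly onto JSON, so serialization is trivial.
wire = json.dumps(bytecode)
print(wire)
assert json.loads(wire) == bytecode
```

With the values pulled out, two traversals that differ only in their literals share the same rewritten form, which is what makes a bindings-based compiled script reusable.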

This is what makes Gremlin so powerful: its syntax is crazy simple, as it's 
just functions, and thus it can naturally exist in any language, even XML! … 
ByteCode is what is going to free Gremlin from the JVM…as we are already now on 
the CPython VM with relative ease:

        PythonGraphTraversal
                
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/graph_traversal.py
 
 
        JythonTranslator (Gremlin-Python to Gremlin-Jython)
                
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/jython_translator.py
 
        GroovyTranslator (Gremlin-Python to Gremlin-Groovy)
                
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/groovy_translator.py
 

*** NOTE: I have yet to introduce the ByteCode concepts to the gremlin-python/ 
package, so use your imagination when seeing how PythonGraphTraversal will 
construct ByteCode and the respective translators will translate it. Finally, 
realize that people can now compile Python ByteCode into Python-based steps and 
thus, Gremlin can live and execute on the Python VM against Python-based graph 
systems. It's really that easy (though, lots of work to implement the 30-some 
standard steps in Python). What this means though is that, in the future, 
Gremlin can just move between languages/VMs … between Python, JVM, Ruby, 
JavaScript, C, etc.-based graph processing systems. One language, tailored to 
its host, and agnostic to the underlying virtual machine.
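To give a feel for what "Python-based steps" could look like, here is a toy sketch. Everything in it is hypothetical: the in-memory VERTICES/EDGES data, the four step functions, and the execute() loop are invented for illustration and cover only V, out, in, and a single-key values (no repeat/times, no strategies).

```python
# Toy graph (hypothetical data): vertex id -> properties, plus labelled edges.
VERTICES = {1: {"name": "marko"}, 3: {"name": "lop"}, 4: {"name": "josh"}}
EDGES = [(1, "created", 3), (4, "created", 3)]  # (out vertex, label, in vertex)


def step_V(traversers, *ids):
    # Start step: ignores incoming traversers and emits matching vertex ids.
    return [v for v in VERTICES if not ids or v in ids]


def step_out(traversers, *labels):
    return [dst for v in traversers for (src, label, dst) in EDGES
            if src == v and (not labels or label in labels)]


def step_in(traversers, *labels):
    return [src for v in traversers for (src, label, dst) in EDGES
            if dst == v and (not labels or label in labels)]


def step_values(traversers, key):
    return [VERTICES[v][key] for v in traversers if key in VERTICES[v]]


# Operator name -> Python implementation of that step.
STEPS = {"V": step_V, "out": step_out, "in": step_in, "values": step_values}


def execute(bytecode):
    """Run each [operator, *args] instruction against the current traversers."""
    traversers = []
    for operator, *args in bytecode:
        traversers = STEPS[operator](traversers, *args)
    return traversers


result = execute([["V", 1], ["out", "created"], ["in", "created"],
                  ["values", "name"]])
print(result)  # ['marko', 'josh']
```

The dispatch table is the whole "VM" here; supporting more of Gremlin would mean adding entries to STEPS and handling nested instruction lists.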

Enjoy,
Marko.

http://markorodriguez.com
