Hello,
So TINKERPOP-1278 (aka “Gremlin-Python”) has introduced the notion of Traversal
ByteCode.
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/ByteCode.java
In essence, ByteCode is the construction history of a traversal and is of the
form:
[string, object*]* // a list of (operator, arguments[])
The traversal
g.V(1).outE('created').inV().
  repeat(out('created').in('created')).times(5).
  valueMap('name','age')
has a ByteCode representation as below:
[
  [V, 1]
  [outE, 'created']
  [inV]
  [repeat, [
    [out, 'created']
    [in, 'created']
  ]]
  [times, 5]
  [valueMap, 'name', 'age']
]
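To make the "construction history" idea concrete, here is a minimal Python sketch of a traversal that simply records each step call as [operator, arguments...]. The Traversal and Anonymous classes and the anon object are hypothetical names for illustration only; they are not the classes in gremlin-core or gremlin-python.

```python
class Traversal:
    """Hypothetical recorder: each step call appends [operator, *arguments]."""

    def __init__(self):
        self.bytecode = []

    def __getattr__(self, operator):
        # Only reached for names that are not real attributes, i.e. step names.
        if operator.startswith("__"):
            raise AttributeError(operator)

        def step(*args):
            # A nested (anonymous) traversal is stored as its own bytecode list.
            encoded = [a.bytecode if isinstance(a, Traversal) else a for a in args]
            # "in" is a Python keyword, so callers write in_ and we strip the "_".
            self.bytecode.append([operator.rstrip("_"), *encoded])
            return self

        return step


class Anonymous:
    """Hypothetical stand-in for an anonymous traversal source."""

    def __getattr__(self, operator):
        if operator.startswith("__"):
            raise AttributeError(operator)
        return getattr(Traversal(), operator)


anon = Anonymous()

t = (Traversal().V(1).outE("created").inV()
     .repeat(anon.out("created").in_("created")).times(5)
     .valueMap("name", "age"))

print(t.bytecode)
# [['V', 1], ['outE', 'created'], ['inV'],
#  ['repeat', [['out', 'created'], ['in', 'created']]],
#  ['times', 5], ['valueMap', 'name', 'age']]
```

Nothing is executed here; the object only remembers how it was built, which is all that ByteCode is.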
Again, Gremlin is a simple language based on function concatenation and
nesting. That's all there is to it. Thus, it forms a tree and trees are easy to
encode, distribute, serialize, decode, prune/optimize, search, etc. Moreover,
every programming language supports function composition and nesting and thus,
Gremlin is able to be hosted in any programming language.
[http://tinkerpop.apache.org/gremlin.html]
The benefit of ByteCode as it applies to TINKERPOP-1278 is that a Translator is
able to access the ByteCode of the traversal and then use that linear-nested
structure (wide-tree) to generate a traversal representation in another
language — e.g. Gremlin-Python, Gremlin-Ruby, Gremlin-JavaScript, etc.
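As a rough illustration of what a Translator does, the following Python sketch walks the nested lists and emits a Gremlin-Groovy-style string. The translate and render functions are hypothetical helpers, not the PythonTranslator or GroovyTranslator classes linked in this mail. It assumes that any list argument is nested ByteCode and it only handles string and numeric arguments.

```python
def translate(bytecode, source="g"):
    """Render bytecode as a Gremlin-Groovy-style string (hypothetical helper)."""
    parts = [source]
    for operator, *args in bytecode:
        parts.append(f"{operator}({', '.join(render(a) for a in args)})")
    return ".".join(parts)


def render(arg):
    if isinstance(arg, list):
        # Assumption: a list argument is always nested bytecode, which is
        # rendered as an anonymous traversal starting from "__".
        return translate(arg, source="__")
    if isinstance(arg, str):
        escaped = arg.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(arg)  # numbers only in this sketch


bytecode = [
    ["V", 1],
    ["outE", "created"],
    ["inV"],
    ["repeat", [["out", "created"], ["in", "created"]]],
    ["times", 5],
    ["valueMap", "name", "age"],
]

print(translate(bytecode))
# g.V(1).outE('created').inV().repeat(__.out('created').in('created')).times(5).valueMap('name', 'age')
```

A translator for another host language would keep the same tree walk and only change how operators and arguments are rendered.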
Here is the Gremlin-Python translator that will turn ByteCode into
Gremlin-Python:
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/java/org/apache/tinkerpop/gremlin/java/translator/PythonTranslator.java
Here is the Gremlin-Groovy translator that will turn ByteCode into
Gremlin-Groovy:
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-groovy/src/main/java/org/apache/tinkerpop/gremlin/java/translator/GroovyTranslator.java
Pretty simple, eh? So, why would you want Gremlin-Java to translate to
Gremlin-Groovy? Well, so you can code in Gremlin-Java and then have it execute
on GremlinServer via the GremlinGroovy JSR223 ScriptEngine. However, one can
imagine a Gremlin-Java->Gremlin-Java translator! What is that?! Well, it would
use reflection (or some more efficient mechanism) to reconstruct the
Gremlin-Java traversal from ByteCode generated from Gremlin-Java and thus, the
entire cluster/server infrastructure is simply migrating ByteCode around as
opposed to worrying about language-specific representations (i.e. it has
nothing to do with the JVM!). Also, assume a Python-based graph database exists
that implements the GremlinVM: Gremlin-Java can easily talk to it via ByteCode.
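The reflection idea can be sketched in Python, where getattr plays the role of Java reflection: each operator name is looked up on the target object and invoked with its arguments. EchoTraversal is a hypothetical target API that only records text; a real target would build executable steps. The rebuild function is likewise an illustrative name.

```python
class EchoTraversal:
    """Hypothetical target API with ordinary methods, one per step."""

    def __init__(self, text):
        self.text = text

    def _step(self, name, *args):
        shown = ", ".join(
            a.text if isinstance(a, EchoTraversal) else repr(a) for a in args
        )
        return EchoTraversal(f"{self.text}.{name}({shown})")

    def V(self, *ids):
        return self._step("V", *ids)

    def out(self, *labels):
        return self._step("out", *labels)

    def repeat(self, body):
        return self._step("repeat", body)

    def times(self, n):
        return self._step("times", n)


def rebuild(bytecode, start, new_anonymous):
    """Replay bytecode on `start`, resolving each operator by name."""
    traversal = start
    for operator, *args in bytecode:
        resolved = [
            # Assumption: a list argument is nested bytecode.
            rebuild(a, new_anonymous(), new_anonymous) if isinstance(a, list) else a
            for a in args
        ]
        traversal = getattr(traversal, operator)(*resolved)
    return traversal


bytecode = [["V", 1], ["repeat", [["out", "created"]]], ["times", 2]]
result = rebuild(bytecode, EchoTraversal("g"), lambda: EchoTraversal("__"))

print(result.text)
# g.V(1).repeat(__.out('created')).times(2)
```

The receiving side needs no knowledge of the language that produced the ByteCode, only a method for each operator name.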
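To ensure ByteCode generation is not costly, here are the runtimes for
construction and compilation of a “fairly complex” traversal in both master/
and TINKERPOP-1278/. In each pair, the first clock() call times construction
only and the second adds applyStrategies() (i.e. compilation):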
master/
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity()}
==>0.004184923
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies()}
==>0.0264354126
TINKERPOP-1278/
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity()}
==>0.0048526357999999995
gremlin> clock(5000){g.V().repeat(out()).times(2).as("a").union(both(),out()).dedup().as("b").select("a","b").identity().applyStrategies()}
==>0.0245924514
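To put the four numbers side by side, here is a small Python calculation of the relative change from master/ to TINKERPOP-1278/. The values are copied from the console output above; the dictionary keys are just labels chosen for this sketch.

```python
# Timings exactly as reported by clock() above.
master = {"construct": 0.004184923, "construct+compile": 0.0264354126}
branch = {"construct": 0.0048526357999999995, "construct+compile": 0.0245924514}

# Relative change of the branch against master, in percent.
change = {
    phase: (branch[phase] - master[phase]) / master[phase] * 100
    for phase in master
}

for phase, percent in change.items():
    print(f"{phase}: {percent:+.1f}%")
```

Construction alone comes out roughly 16% slower on the branch, while construction plus compilation comes out roughly 7% faster. These are single measurements, so small differences may well be noise.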
Finally, there are various entailments that come from this ByteCode
representation that have started to surface in my mind:
1. We can now optimize at the ByteCode level and not at the step level
— ByteCodeStrategies could be a new TraversalStrategy class that comes before
DecorationStrategies.
2. Gremlin-Java can support bindings so that its Gremlin-Groovy (e.g.)
compiled form uses bindings. How? By simply rewriting the ByteCode prior to
compilation and replacing values with variables!
3. GraphSON can now easily support Traversal serialization — ByteCode
in JSON is natural. Gremlin-GraphSON anyone? (get it?)
[g, V : [1], outE : [created], inV : [], repeat : [out : [created],
in : [created]], times : [5], valueMap : [name, age]]
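Points 2 and 3 can be sketched together in Python. The parameterize function below rewrites ByteCode so that every literal argument becomes a named variable and collects the bindings; the {"var": name} marker is a hypothetical encoding chosen for this sketch. The JSON at the end is plain json.dumps output, not an actual GraphSON format.

```python
import json


def parameterize(bytecode, bindings=None):
    """Return (rewritten bytecode, bindings) with literals replaced by variables."""
    bindings = {} if bindings is None else bindings
    rewritten = []
    for operator, *args in bytecode:
        new_args = []
        for arg in args:
            if isinstance(arg, list):
                # Assumption: a list argument is nested bytecode; recurse and
                # share the same bindings so variable names stay unique.
                new_args.append(parameterize(arg, bindings)[0])
            else:
                name = f"x{len(bindings)}"
                bindings[name] = arg
                new_args.append({"var": name})  # hypothetical variable marker
        rewritten.append([operator, *new_args])
    return rewritten, bindings


bytecode = [["V", 1], ["repeat", [["out", "created"]]], ["times", 5]]

rewritten, bindings = parameterize(bytecode)
print(rewritten)
# [['V', {'var': 'x0'}], ['repeat', [['out', {'var': 'x1'}]]], ['times', {'var': 'x2'}]]
print(bindings)
# {'x0': 1, 'x1': 'created', 'x2': 5}

# Nested lists of strings and numbers round-trip through JSON unchanged.
wire = json.dumps(bytecode)
print(wire)
# [["V", 1], ["repeat", [["out", "created"]]], ["times", 5]]
```

Because the rewrite happens on ByteCode, it is independent of whichever language the traversal is later translated into.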
This is what makes Gremlin so powerful: its syntax is crazy simple as it's just
functions and thus, it can naturally exist in any language, even XML! …
ByteCode is what is going to free Gremlin from the JVM…as we are already now on
the CPython VM with relative ease:
PythonGraphTraversal
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/graph_traversal.py
JythonTranslator (Gremlin-Python to Gremlin-Jython)
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/jython_translator.py
GroovyTranslator (Gremlin-Python to Gremlin-Groovy)
https://github.com/apache/tinkerpop/blob/c34fe9fc2cee8bf231c7d4a2d52e746053273129/gremlin-python/src/main/jython/gremlin_python/groovy_translator.py
*** NOTE: I have yet to introduce the ByteCode concepts to the gremlin-python/
package so use your imagination when seeing how PythonGraphTraversal will
construct ByteCode and the respective translators will translate it. Finally,
realize that people can now compile Python ByteCode into Python-based steps and
thus, Gremlin can live and execute on the Python VM against Python-based graph
systems. It's really that easy (though, lots of work to implement the 30 some
standard steps in Python). What this means though is that, in the future,
Gremlin can just move between languages/VMs … between Python, JVM, Ruby,
JavaScript, C, etc.-based graph processing systems. One language, tailored to
its host, and agnostic to the underlying virtual machine.
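To give a feel for "Python ByteCode into Python-based steps", here is a toy Python sketch that executes ByteCode directly against an in-memory graph. The GRAPH dictionary, the step_* functions, and execute are all hypothetical and cover only three steps (V, out, in); nested traversals such as repeat are left out.

```python
# Hypothetical toy graph: vertex id -> list of outgoing (label, target) edges.
GRAPH = {
    1: [("knows", 2), ("created", 3)],
    2: [],
    3: [],
    4: [("created", 3)],
}


def step_V(graph, traversers, *ids):
    # Start step: ignores incoming traversers and emits the requested vertices.
    return [v for v in graph if not ids or v in ids]


def step_out(graph, traversers, *labels):
    return [
        target
        for v in traversers
        for (label, target) in graph[v]
        if not labels or label in labels
    ]


def step_in(graph, traversers, *labels):
    return [
        source
        for v in traversers
        for source, edges in graph.items()
        for (label, target) in edges
        if target == v and (not labels or label in labels)
    ]


STEPS = {"V": step_V, "out": step_out, "in": step_in}


def execute(graph, bytecode):
    """Run each [operator, *args] instruction over the current traversers."""
    traversers = []
    for operator, *args in bytecode:
        traversers = STEPS[operator](graph, traversers, *args)
    return traversers


print(execute(GRAPH, [["V", 1], ["out", "created"], ["in", "created"]]))
# [1, 4]
```

Each additional standard step would be one more entry in the STEPS table.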
Enjoy,
Marko.
http://markorodriguez.com