Hello,

Gremlin bytecode provides a language agnostic way of sending Gremlin traversals 
between machines — whether physical or virtual. For instance, it is possible to 
send bytecode from one JVM to another or from CPython to the JVM across the 
network. Once bytecode is received, it needs to be translated into a 
representation that the processing VM can then evaluate.

GremlinServer is smart in that when bytecode is received it will analyze it for 
lambdas. If there are lambdas, written in language X, then it will use 
XTranslator and XScriptEngine to evaluate the bytecode and create a Traversal 
for evaluation. However, if there are no lambdas, then it will use 
JavaTranslator to create a Traversal for evaluation.
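The selection rule above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the Instruction and Lambda classes below are hypothetical stand-ins, not the TinkerPop classes, and the sketch does not look inside nested traversals the way a real server would have to.

```java
import java.util.Arrays;
import java.util.List;

final class TranslatorChoice {

    /** Hypothetical stand-in for a lambda argument, tagged with its source language. */
    static final class Lambda {
        final String language;
        final String script;

        Lambda(String language, String script) {
            this.language = language;
            this.script = script;
        }
    }

    /** Hypothetical stand-in for one bytecode instruction: a step name plus its arguments. */
    static final class Instruction {
        final String operator;
        final List<Object> arguments;

        Instruction(String operator, Object... arguments) {
            this.operator = operator;
            this.arguments = Arrays.asList(arguments);
        }
    }

    /** Returns the language of the first lambda found, or null if the bytecode has none. */
    static String lambdaLanguage(List<Instruction> bytecode) {
        for (Instruction instruction : bytecode) {
            for (Object argument : instruction.arguments) {
                if (argument instanceof Lambda) {
                    return ((Lambda) argument).language;
                }
            }
        }
        return null;
    }

    /** No lambdas: take the reflection path. Otherwise: use the lambda's language. */
    static String chooseTranslator(List<Instruction> bytecode) {
        String language = lambdaLanguage(bytecode);
        return language == null
                ? "JavaTranslator"
                : language + " translator + script engine";
    }
}
```

The point of the check is that the script engine is only paid for when a lambda actually forces it.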

So, the question for me is:

        Is JavaTranslator (which uses Java reflection to convert bytecode to 
Traversal) faster than GroovyTranslator/GroovyScriptEngine (which creates a 
String script and evaluates it in the ScriptEngine)?
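To make the reflection route concrete, here is a minimal sketch of replaying instructions against a fluent object with java.lang.reflect. It is not the JavaTranslator source: MiniTraversal and Instruction are hypothetical stand-ins, and the method lookup is by exact argument classes only, which a real translator could not get away with.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class ReflectiveReplay {

    /** Hypothetical stand-in for a traversal: each call just records a step. */
    public static final class MiniTraversal {
        private final List<String> steps = new ArrayList<>();

        public MiniTraversal V() { steps.add("V()"); return this; }

        public MiniTraversal has(String key, String value) {
            steps.add("has(" + key + "," + value + ")");
            return this;
        }

        public MiniTraversal out() { steps.add("out()"); return this; }

        public List<String> steps() { return steps; }
    }

    /** Hypothetical stand-in for one bytecode instruction. */
    public static final class Instruction {
        final String operator;
        final Object[] arguments;

        public Instruction(String operator, Object... arguments) {
            this.operator = operator;
            this.arguments = arguments;
        }
    }

    /** Looks up each operator by name and argument classes, then invokes it. */
    static MiniTraversal replay(List<Instruction> bytecode) {
        Object current = new MiniTraversal();
        try {
            for (Instruction instruction : bytecode) {
                Class<?>[] types = new Class<?>[instruction.arguments.length];
                for (int i = 0; i < types.length; i++) {
                    types[i] = instruction.arguments[i].getClass();
                }
                Method method = current.getClass().getMethod(instruction.operator, types);
                current = method.invoke(current, instruction.arguments);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not replay bytecode", e);
        }
        return (MiniTraversal) current;
    }
}
```

No source text is generated, parsed, or compiled on this path; each instruction costs one method lookup and one invocation. The script route has to build a String and hand it to a compiler first.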

Let's see. Here is the script in full.

import org.apache.tinkerpop.gremlin.jsr223.JavaTranslator
import org.apache.tinkerpop.gremlin.groovy.jsr223.GroovyTranslator

//// EXECUTED LOCALLY (e.g. CLIENT APPLICATION) ////

g = EmptyGraph.instance().traversal()

t = g.V().has('name','marko').
      repeat(out()).times(2).
      groupCount().by('name'); []
bytecode = t.bytecode
// send the bytecode over the wire

//// EXECUTED REMOTELY (e.g. GREMLIN SERVER) ////

groovy = new GremlinGroovyScriptEngine()
bindings = groovy.createBindings()
bindings.put('g',g)
compiled = groovy.compile(GroovyTranslator.of('g').translate(bytecode))

x = JavaTranslator.of(g).translate(bytecode); []
y = compiled.eval(bindings); []
z = groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings); []
x == y
y == z
z == x
x.toString()

// clock(n){...} runs the closure n times and returns the average time in milliseconds
clock(1000){ JavaTranslator.of(g).translate(bytecode) }
clock(1000){ compiled.eval(bindings) } // caching
clock(1000){ groovy.reset(); groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings) } // no caching

First, let's make sure they all return the same traversal:

gremlin> x = JavaTranslator.of(g).translate(bytecode); []
gremlin> y = compiled.eval(bindings); []
gremlin> z = groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings); []
gremlin> x == y
==>true
gremlin> y == z
==>true
gremlin> z == x
==>true
gremlin> x.toString()
==>[GraphStep(vertex,[]), HasStep([name.eq(marko)]), RepeatStep([VertexStep(OUT,vertex), RepeatEndStep],until(loops(2)),emit(false)), GroupCountStep(value(name))]
gremlin>

Great. They do. Now let's see how fast they are.

gremlin> clock(1000){ JavaTranslator.of(g).translate(bytecode) }
==>0.004768085
gremlin> clock(1000){ compiled.eval(bindings) } // caching
==>0.015168259
gremlin> clock(1000){ groovy.reset(); groovy.eval(GroovyTranslator.of('g').translate(bytecode), bindings) } // no caching
==>40.790075693
gremlin>

Cool. clock() reports the average time per iteration in milliseconds, so 
JavaTranslator is about 8,500x faster than evaluating an uncached String script 
and about 3x faster than evaluating a compiled script. JavaTranslator takes 
about 5 microseconds to translate the bytecode, while an uncached String 
script takes about 40 milliseconds.
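The ratios can be checked directly from the three clock() results, which are averages in milliseconds per call. The constants below are copied from the console output above; the class and method names are just for illustration.

```java
final class ClockRatios {
    // Averages reported by clock(1000){...} above, in milliseconds per call.
    static final double JAVA_TRANSLATOR_MS = 0.004768085;
    static final double COMPILED_GROOVY_MS = 0.015168259;
    static final double UNCACHED_GROOVY_MS = 40.790075693;

    /** How many times longer the slower measurement takes than the faster one. */
    static double ratio(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        System.out.printf("JavaTranslator: %.2f microseconds per call%n",
                JAVA_TRANSLATOR_MS * 1000);
        System.out.printf("compiled script vs JavaTranslator: %.1fx%n",
                ratio(COMPILED_GROOVY_MS, JAVA_TRANSLATOR_MS));
        System.out.printf("uncached script vs JavaTranslator: %.0fx%n",
                ratio(UNCACHED_GROOVY_MS, JAVA_TRANSLATOR_MS));
    }
}
```

Worked through, that is roughly 4.8 microseconds for JavaTranslator, a bit over 3x for the compiled script, and on the order of 8,500x for the uncached script.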

So, what did we learn?

        1. Bytecode is slick in that we don’t have to use Gremlin-Groovy to 
evaluate it (if there are no lambdas) and thus can do everything in Java, and 
fast!
        2. It is very important to always use parameterized queries with 
GremlinServer/etc., as you can see how costly it is to evaluate a String script 
repeatedly.
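Why parameterization matters can be sketched with a cache keyed on the script text. This is a toy under stated assumptions: CompiledScriptCache is a hypothetical class, its compile step is a stand-in that only counts invocations, and it says nothing about how GremlinServer caches internally. The idea is only that a script with inlined literals is a new String every time, while a parameterized script stays the same and only its bindings change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CompiledScriptCache {
    private final Map<String, Function<Map<String, Object>, Object>> cache =
            new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();

    /** Stand-in for the expensive step: a real engine would parse and compile here. */
    private Function<Map<String, Object>, Object> compile(String script) {
        compilations.incrementAndGet();
        return bindings -> script + " with " + bindings;
    }

    /** Compiles a script only the first time its exact text is seen. */
    Object eval(String script, Map<String, Object> bindings) {
        return cache.computeIfAbsent(script, this::compile).apply(bindings);
    }

    int compilations() {
        return compilations.get();
    }
}
```

Evaluating "g.V().has('name','marko')" and then "g.V().has('name','josh')" compiles twice. Evaluating "g.V().has('name',x)" twice with x bound to different values compiles once.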

What is crazy is that my JavaTranslator code is gheeeeeetto.

        
https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/jsr223/JavaTranslator.java
 

If anyone wants to submit a PR to make JavaTranslator more efficient, please 
do. However, we are still doing well with what we have regardless.

Take care,
Marko.

http://markorodriguez.com


