Hello,
There are two types of “programs” in Gremlin: Bytecode and Traversals.
Bytecode => Virtual machine instructions (like Java bytecode)
Traversals => Machine instructions (like Intel machine code)
The core of Gremlin’s compiler is its TraversalStrategies. A traversal strategy
works traversal-by-traversal, walking the traversal tree and rewriting
sections of the traversal into (typically) more optimal forms.
void TraversalStrategy.apply(Traversal<S,E> traversal)
Working at the Traversal object level is important because the Gremlin language
steps (has(), out(), in(), etc.) don’t always map one-to-one with the machine
instructions (HasStep, VertexStep, VertexStep). It's better to work at the
machine level because there are more knick-knack mutations one can do at that
level. However, as you can see, traversal strategies are “machine dependent.”
That is, they are tied to the Gremlin traversal machine implementation.
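To make the not-one-to-one mapping concrete, here is a toy sketch (not
TinkerPop code) of translating language-level steps into machine-level
instructions. The Instruction tuple and the translate function are hypothetical
names made up for illustration; the only things taken from the text above are
that has() becomes a HasStep and that out() and in() both become a VertexStep,
differing only in direction.

```python
from typing import NamedTuple, Tuple


class Instruction(NamedTuple):
    """Hypothetical stand-in for a machine-level step."""
    name: str
    params: Tuple


def translate(step_name, *args):
    """Map one language-level step to one machine-level instruction."""
    if step_name == "has":
        return Instruction("HasStep", args)
    if step_name in ("out", "in"):
        # Two distinct language steps collapse into one kind of machine step;
        # the direction becomes a parameter of that step.
        return Instruction("VertexStep", (step_name.upper(),) + args)
    raise ValueError("step not covered by this sketch: " + step_name)


# Roughly has("name", "marko").out("knows").in("created")
steps = [("has", "name", "marko"), ("out", "knows"), ("in", "created")]
machine = [translate(name, *args) for name, *args in steps]
```

A strategy written against the machine level sees two VertexSteps here, while
one written against the language level sees out() and in() as unrelated steps.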
While there is currently only one Gremlin virtual machine (Gremlin-Java
machine), there are many Gremlin language variants — Gremlin-Java, -Groovy,
-Python, SQL-Gremlin, SPARQL-Gremlin, etc. When these languages communicate
with a/the Gremlin traversal machine, they communicate via Gremlin bytecode.
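As a rough picture of what travels between a language variant and the machine,
bytecode can be thought of as an ordered list of (step name, arguments)
instructions that any language can build without a traversal machine present.
The ToyBytecode class below is a hypothetical simplification for illustration,
not the Bytecode class that ships with Gremlin-Python, and the JSON encoding is
only an assumed stand-in for a wire format.

```python
import json


class ToyBytecode:
    """Hypothetical, simplified model of Gremlin bytecode."""

    def __init__(self):
        self.step_instructions = []

    def add_step(self, name, *args):
        # Each instruction is the step name followed by its arguments.
        self.step_instructions.append([name, *args])
        return self


# Roughly g.V().has("name", "marko").out("knows")
bytecode = (ToyBytecode()
            .add_step("V")
            .add_step("has", "name", "marko")
            .add_step("out", "knows"))

# The language variant serializes the instructions ...
wire = json.dumps(bytecode.step_instructions)
# ... and the traversal machine reads them back before building a traversal.
received = json.loads(wire)
```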
Now, it is possible to optimize bytecode. In principle, we can do “client side”
optimizations on the bytecode prior to sending it to the Gremlin traversal
machine for execution. Why would we want to do this?
1. We can reduce the amount of work (clock cycles) required of “the
server” which would ultimately do the TraversalStrategy optimization.
2. We can have optimizations that are machine independent and thus, can
be useful against any Gremlin traversal machine implementation.
3. While the server is “streaming in” the Bytecode, it can also
optimize the bytecode prior to applying TraversalStrategy optimizations.
[Gremlin-Java Traversal Machine] <== network connection ==> [Gremlin-XXX Language Variant]
  * pre-process bytecode before                               * pre-process bytecode before
    translating to traversal                                    sending over network
  * apply traversal strategies
  * execute traversal
What would Bytecode strategies look like? Here is an idea:
void TraversalStrategy.apply(Bytecode bytecode)
Let's look at a simple strategy. IdentityRemovalStrategy will turn traversals of
the form g.V().identity().as("a").identity() into g.V().as("a"). Here is this
strategy written in both Java and Python:
https://gist.github.com/okram/7bb2512935f8955551f9e3f87623b488
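For readers who don't want to click through, here is a minimal Python sketch of
the idea. It is not the code in the gist: bytecode is modeled as a plain list of
[step name, args...] lists, and the apply_bytecode function name is made up for
illustration.

```python
def apply_bytecode(step_instructions):
    """Return a new instruction list with argument-less identity() steps removed.

    Hypothetical sketch: instructions are modeled as [name, *args] lists.
    """
    return [inst for inst in step_instructions if inst != ["identity"]]


# g.V().identity().as("a").identity()
before = [["V"], ["identity"], ["as", "a"], ["identity"]]

# g.V().as("a")
after = apply_bytecode(before)
```

In this model as("a") is its own instruction, so once the identity()
instructions are dropped the label simply follows V(). No traversal object is
needed for the rewrite, which is why it could run in any language variant.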
Given that there (currently) is no Gremlin-Python traversal machine
implementation, __apply_traversal(traversal) does nothing. However, given that
there is a Gremlin-Python language variant, __apply_bytecode(traversal) does
something. Moreover, note that we already have IdentityRemovalStrategy in
Gremlin-Python, but, as you can see, it does nothing as (currently) strategies
only operate on traversals.
https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/jython/gremlin_python/process/strategies.py#L117-L119
As an aside: the reason strategies exist in Gremlin-Python is so that users can
do stuff like:
https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/jython/tests/driver/test_driver_remote_connection.py#L80-L98
Anywho, so there you have it. I’ve made a ticket:
https://issues.apache.org/jira/browse/TINKERPOP-1501
Your thoughts on the idea are more than appreciated.
Take care,
Marko.
http://markorodriguez.com