Hello,
This is an update of what I’ve been up to on the tp4/ branch since the last
report 2 weeks ago.
1. Arguments
TP4 brings the concept of an Argument to the front and center.
An argument can either be a constant (e.g. 2) or a dynamically determined value
(e.g. out().count()). This means that users will be able to do things such as:
* has('name',out('father').value('name')) // is he a jr?
* is(eq(out('manager'))) // is he his own boss?
This flexibility is starting to make the steps bleed into each
other.
is(eq(select('a'))) == where(eq('a'))
One Gremlin-C# guy on Twitter was saying that Gremlin has too
many ways to do things. It will be nice if we can reduce the number of steps we
have with Arguments.
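To make the constant-versus-dynamic idea concrete, here is a minimal Java sketch. Everything in it (the Argument interface, its constant/dynamic factories, the Person class, hasName) is a hypothetical stand-in invented for illustration and is not the API on the tp4/ branch. It only shows that a step can resolve its argument against the current object without caring which kind it was given.

```java
import java.util.function.Function;

// Hypothetical types for illustration only; not the tp4/ branch API.
interface Argument<S, E> {
    // Resolve this argument against the object currently being traversed.
    E resolve(S current);

    // A constant argument ignores the current object (e.g. the literal 2).
    static <S, E> Argument<S, E> constant(E value) {
        return current -> value;
    }

    // A dynamic argument is computed per object (e.g. out('father').value('name')).
    static <S, E> Argument<S, E> dynamic(Function<S, E> nested) {
        return nested::apply;
    }
}

// Toy stand-in for a vertex with a 'name' property and an optional 'father' edge.
final class Person {
    final String name;
    final Person father; // null when unknown

    Person(String name, Person father) {
        this.name = name;
        this.father = father;
    }
}

final class ArgumentSketch {
    // has('name', arg): the step does not care whether arg is constant or dynamic.
    static boolean hasName(Person person, Argument<Person, String> arg) {
        return person.name.equals(arg.resolve(person));
    }

    public static void main(String[] args) {
        Person senior = new Person("marko", null);
        Person junior = new Person("marko", senior);

        // Roughly has('name', out('father').value('name')) in this toy model.
        Argument<Person, String> fathersName =
                Argument.dynamic(p -> p.father == null ? null : p.father.name);

        System.out.println(hasName(junior, Argument.constant("marko")));
        System.out.println(hasName(junior, fathersName));
        System.out.println(hasName(senior, fathersName));
    }
}
```

Because both kinds of argument share one resolve() call, a single has-style step covers what would otherwise need separate constant and traversal-based variants.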
2. Console
Java9+ brings with it JShell. I posed the question on dev@ — do
we need GremlinConsole?
https://lists.apache.org/thread.html/b9083cf992b01bcfe4b82d14b9aa2d30c90707c4c134c6cfefade4ae@%3Cdev.tinkerpop.apache.org%3E
It is possible to configure JShell to look (and feel?) like the
GremlinConsole with a short startup script.
I would like to shoot for TP4 being as small and compact as
possible — less to build, less to document, less to maintain, …
Gremlin-Java -> JShell, Gremlin-Groovy -> GroovySh,
Gremlin-Python -> Python CLI, … why not reuse?
The most beautiful code is the code that was never written. The
greatest programmers are those that coded themselves out of a job. Let us be
great and beautiful.
3. Data Structures
I’m still trying to figure out how to generalize Gremlin out of
graph. Limited luck.
Worked with Kuppitz a bit on how to represent all steps using
just map, flatmap, reduce, filter, and branch! (it’s a little too nutz for my
tastes, but maybe…)
https://twitter.com/twarko/status/1109491874333515778
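As a rough illustration of the five-function idea, the Java sketch below defines map, flatMap, filter, reduce and branch over java.util.stream.Stream and then expresses a few Gremlin-like steps with them. The class name FivePrimitives, the adjacency-map "graph", and the mapping of out() to flatMap, count() to reduce and union() to branch are assumptions made for this example. It is not the encoding worked out with Kuppitz.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: five primitive step kinds as stream transformers.
final class FivePrimitives {
    static <S, E> Function<Stream<S>, Stream<E>> map(Function<S, E> f) {
        return s -> s.map(f);
    }

    static <S, E> Function<Stream<S>, Stream<E>> flatMap(Function<S, Stream<E>> f) {
        return s -> s.flatMap(f);
    }

    static <S> Function<Stream<S>, Stream<S>> filter(Predicate<S> p) {
        return s -> s.filter(p);
    }

    // Folds the whole incoming stream into a single result.
    static <S, E> Function<Stream<S>, Stream<E>> reduce(E seed, BiFunction<E, S, E> op) {
        return s -> {
            E acc = seed;
            for (Iterator<S> it = s.iterator(); it.hasNext(); ) {
                acc = op.apply(acc, it.next());
            }
            return Stream.of(acc);
        };
    }

    // Sends each incoming object down every branch and merges the outputs.
    static <S, E> Function<Stream<S>, Stream<E>> branch(List<Function<Stream<S>, Stream<E>>> branches) {
        return s -> s.flatMap(x -> branches.stream().flatMap(b -> b.apply(Stream.of(x))));
    }

    public static void main(String[] args) {
        // Toy graph: vertex name -> names of adjacent vertices.
        Map<String, List<String>> knows = Map.of(
                "marko", List.of("vadas", "josh"),
                "josh", List.of("peter"));

        // out() as a flatMap, count() as a reduce, union(identity(), out()) as a branch.
        Function<Stream<String>, Stream<String>> out =
                flatMap(v -> knows.getOrDefault(v, List.<String>of()).stream());
        Function<Stream<String>, Stream<Long>> count = reduce(0L, (n, v) -> n + 1);
        Function<Stream<String>, Stream<String>> identity = map(v -> v);
        Function<Stream<String>, Stream<String>> selfAndOut = branch(List.of(identity, out));

        System.out.println(out.andThen(count).apply(Stream.of("marko")).collect(Collectors.toList()));
        System.out.println(selfAndOut.apply(Stream.of("marko")).collect(Collectors.toList()));
    }
}
```

Steps compose with Function.andThen, so a traversal in this toy model is just a chain of the five kinds.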
Ryan Wisnesky was kind enough to provide a demo of his Category
Query Language (CQL) on Monday. Cool stuff indeed.
Ryan pointed me to this paper which I found worthwhile:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.3252&rep=rep1&type=pdf
This is the big unknown for me and I want to solve it. If we
can do this right, TinkerPop will permeate all things Apache…all things data.
https://twitter.com/twarko/status/1109540859442163712
4. The Machine
I introduced the Machine interface.
https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
This interface encompasses both TraversalSource and
RemoteConnection functionality.
The general use is g =
Gremlin.traversal(machine).withProcess(...).withStrategy(...)
This move turned Gremlin into basically “nothing”: Gremlin is
just the “builder-pattern” applied to Bytecode. Check out how small Gremlin
is!
https://github.com/apache/tinkerpop/tree/tp4/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin
That’s it?! … Gremlin is trivial. Much less to
consider for Gremlin-JS, Gremlin-C#, Gremlin-?? …
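To show what "builder pattern applied to Bytecode" can look like, here is a minimal Java sketch. The class names echo the ones in this email, but every signature is an assumption made for illustration: the one-method Machine interface, the String parameters to withProcess and withStrategy, and the inject/incr/is steps are toy stand-ins, not the code on the tp4/ branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; names echo the email but signatures are assumptions.
final class Instruction {
    final String op;
    final List<Object> args;

    Instruction(String op, Object... args) {
        this.op = op;
        this.args = Arrays.asList(args);
    }

    @Override public String toString() { return op + args; }
}

final class Bytecode {
    final List<Instruction> instructions = new ArrayList<>();

    Bytecode add(String op, Object... args) {
        instructions.add(new Instruction(op, args));
        return this;
    }

    Bytecode copy() {
        Bytecode clone = new Bytecode();
        clone.instructions.addAll(instructions);
        return clone;
    }

    @Override public String toString() { return instructions.toString(); }
}

// Whatever compiles and runs bytecode, local or remote (assumed shape).
interface Machine {
    List<Object> submit(Bytecode bytecode);
}

// The language layer: each step only appends an instruction.
final class Traversal {
    private final Machine machine;
    final Bytecode bytecode;

    Traversal(Machine machine, Bytecode bytecode) {
        this.machine = machine;
        this.bytecode = bytecode;
    }

    Traversal incr() { bytecode.add("incr"); return this; }
    Traversal is(Object value) { bytecode.add("is", value); return this; }
    List<Object> toList() { return machine.submit(bytecode); }
}

final class TraversalSource {
    private final Machine machine;
    private final Bytecode bytecode;

    TraversalSource(Machine machine, Bytecode bytecode) {
        this.machine = machine;
        this.bytecode = bytecode;
    }

    // Source-level configuration is recorded as bytecode too.
    TraversalSource withProcess(String name) {
        return new TraversalSource(machine, bytecode.copy().add("withProcess", name));
    }

    TraversalSource withStrategy(String name) {
        return new TraversalSource(machine, bytecode.copy().add("withStrategy", name));
    }

    Traversal inject(Object... objects) {
        return new Traversal(machine, bytecode.copy().add("inject", objects));
    }
}

final class Gremlin {
    static TraversalSource traversal(Machine machine) {
        return new TraversalSource(machine, new Bytecode());
    }

    public static void main(String[] args) {
        // A machine that only prints what it was asked to run.
        Machine printing = bytecode -> {
            System.out.println("submitted: " + bytecode);
            return new ArrayList<>();
        };
        TraversalSource g = Gremlin.traversal(printing).withProcess("pipes").withStrategy("noop");
        g.inject(1, 2, 3).incr().is(3).toList();
    }
}
```

In this sketch the language layer never executes anything. It hands the accumulated Bytecode to whatever Machine it was given, which is why a port to another host language would mostly be a port of the builder.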
5. RemoteMachine, TraverserServer, and MachineServer
https://twitter.com/twarko/status/1110612168968265729
“GremlinServer” is too serial in concept. Receive bytecode,
execute bytecode, aggregate traversers, return traversers.
- This is bad. We need to start thinking about distributed
execution and aggregation from the start. We need to blur the concept of a
“server.”
https://github.com/apache/tinkerpop/tree/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote
MachineServer: sits somewhere and accepts Bytecode.
(multi-threaded server)
RemoteMachine: can talk to a MachineServer to submit
Bytecode. (single socket client)
Processor: exists throughout the cluster and executes
bytecode. (parallel/distributed execution engine)
TraverserServer: can sit somewhere and parallellily
(is that a word?) accept traverser results. (multi-threaded server)
The thing which accepts bytecode, the thing which executes
bytecode, and the thing which aggregates results are all different things and
the entailments are worthy.
Much like how the Machine interface killed the complexity of
Gremlin, I believe this server architecture will kill the complexity of
GremlinServer.
- The biggest part of our I/O will be the binary
protocol (for now I’m just using Object[Input/Output]Stream).
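The separation of roles described in this section can be sketched in a single JVM with threads standing in for machines. The Toy* classes below are hypothetical and use no sockets or serialization. "Bytecode" is modelled as a plain Function and the "cluster" as a list of in-memory partitions. The only point is that the thing accepting bytecode, the things executing it, and the thing aggregating results are separate objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Aggregates results; many workers may push to it concurrently.
final class ToyTraverserServer<E> {
    private final Queue<E> results = new ConcurrentLinkedQueue<>();

    void accept(E traverser) { results.add(traverser); }
    List<E> drain() { return new ArrayList<>(results); }
}

// Executes bytecode against one partition and ships results to the sink.
final class ToyProcessor {
    static <S, E> void execute(Function<S, E> bytecode, List<S> partition, ToyTraverserServer<E> sink) {
        for (S start : partition) {
            sink.accept(bytecode.apply(start));
        }
    }
}

// Accepts bytecode and fans it out; it never sees the results itself.
final class ToyMachineServer<S> {
    private final List<List<S>> partitions;
    private final ExecutorService pool;

    ToyMachineServer(List<List<S>> partitions, int threads) {
        this.partitions = partitions;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    <E> void submit(Function<S, E> bytecode, ToyTraverserServer<E> sink) {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (List<S> partition : partitions) {
            tasks.add(() -> {
                ToyProcessor.execute(bytecode, partition, sink);
                return null;
            });
        }
        try {
            pool.invokeAll(tasks); // blocks here only to keep the demo deterministic
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while executing bytecode", e);
        }
    }

    void shutdown() { pool.shutdown(); }
}

final class ServerSketch {
    public static void main(String[] args) {
        ToyTraverserServer<Integer> sink = new ToyTraverserServer<>();
        ToyMachineServer<Integer> server =
                new ToyMachineServer<>(List.of(List.of(1, 2), List.of(3, 4)), 2);

        server.submit(x -> x * 10, sink);
        server.shutdown();

        List<Integer> results = sink.drain();
        results.sort(null); // arrival order depends on thread scheduling
        System.out.println(results);
    }
}
```

The submitting side passes the sink along with the bytecode, so results flow from the processors straight to the aggregator rather than back through the server that accepted the request.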
6. Implementing Instructions
I’m trying not to rip out the full language as I just want to
focus on implementing only one instruction from each “class” of instruction.
This way, if an insight comes, large amounts of code don’t need
to be rewritten.
My latest achievement was the implementation of
order().by().by(). [from the barrier class of instructions]
- Along with match() and repeat(), this is arguably one
of the more difficult steps to implement.
- The TP4 implementation is 1/3 the size of the TP3
implementation and it just worked right out of the box on Apache Beam.
- The abstract VM model we have in TP4 is simple and
consistent. Complex operations are just working.
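For readers unfamiliar with why order().by().by() belongs to the barrier class, here is a small Java sketch of the general technique: drain every incoming object, sort with a chain of comparators (the first by() is the primary key, later ones break ties), then emit. OrderBarrier, Row and the boolean ascending flag are hypothetical names for this example. This is not the TP4 implementation and says nothing about how it runs on Beam.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical barrier step: nothing is emitted until all input has arrived.
final class OrderBarrier<S> {
    private Comparator<S> comparator = null;

    // Each by() adds a lower-priority sort key.
    <C extends Comparable<C>> OrderBarrier<S> by(Function<S, C> key, boolean ascending) {
        Comparator<S> next = Comparator.comparing(key);
        if (!ascending) {
            next = next.reversed();
        }
        comparator = (comparator == null) ? next : comparator.thenComparing(next);
        return this;
    }

    Iterator<S> apply(Iterator<S> incoming) {
        List<S> barrier = new ArrayList<>();
        incoming.forEachRemaining(barrier::add); // the barrier: drain everything first
        if (comparator != null) {
            barrier.sort(comparator);
        }
        return barrier.iterator();
    }
}

// Toy element type with two sortable properties.
final class Row {
    final String name;
    final int age;

    Row(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class OrderSketch {
    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("marko", 29), new Row("vadas", 27),
                new Row("josh", 32), new Row("peter", 27));

        // Roughly order().by('age', asc).by('name', desc) in this toy model.
        Iterator<Row> sorted = new OrderBarrier<Row>()
                .by(r -> r.age, true)
                .by(r -> r.name, false)
                .apply(rows.iterator());

        sorted.forEachRemaining(r -> System.out.println(r.name + " " + r.age));
    }
}
```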
There you have it. That is a review of the tp4/ branch over the last two weeks.
Moving forward, I hope to make headway on the following:
* AkkaProcessor
- Unlike Pipes and Beam, where Function is the thread of
execution, for Akka, Traverser is the thread of execution.
- Will the TP4 architecture be able to naturally support this
conceptual tweak? TP3 couldn’t.
* A data structure breakthrough.
- Contrary to popular belief, everything is not a graph.
- The only time I think “graph” is when I talk to a graphdb.
- For the most part I think in lists, maps, sets, primitives —
don’t you?
* A better understanding of the TP4 instruction set.
- What is truly needed? What is our core instruction set?
* A documentation infrastructure stub.
- With Gremlin-Groovy away… how do we do documentation?
* Traverser species
- I’m currently copying the TP3 model. I didn’t like it before
and I still don’t like it.
* Strategies
- I haven’t worked on this much, but I believe we might have
“strategies” all wrong (these are our compiler optimizations).
- The TP3 model worked well enough for TP3, but for TP4, I
think we might need a major conceptual overhaul.
- Just a feeling at this point…
Thanks for reading. As always, I’m more than happy to receive any questions or
comments.
Take care,
Marko.
http://rredux.com