Hello,
This is a “big idea” in that I don’t see it being helpful for TinkerPop3, but
something to consider for TinkerPop4.
We currently support two “internal” serialization mechanisms — Gryo and
GraphSON. We also support GraphML but this is primarily for data
reading/writing (interoperability with other graph frameworks).
The reason we have Gryo is that it's “fast” and good for things that are
Java-to-Java (or internal to the Gremlin VM like OLAP serialization).
However:
1. With the development of the Gremlin language variants outside the
JVM (e.g. Gremlin-C# and Gremlin-Python), GraphSON is the only viable
serialization format.
2. With the development of other Gremlin virtual machines (e.g.
CosmosDB’s Gremlin .NET implementation), GraphSON is the only viable
serialization format.
I think we should work to get rid of Gryo for TinkerPop4 and make GraphSON the
default/standard/universal serialization mechanism.
To do this, we need to make it “fast.”
I saw an excellent talk by the ArangoDB guys at GraphDay about their immutable
JSON serialization format. In short, they can have a binary JSON representation
and can get data from it without having to turn it into a nested-Map structure.
In this way, they can do fast lookup operations on the byte stream.
With that, we could learn from their efforts and develop a class along the
lines of:
GraphSONByteStream.toJSON() -> yields the String { [ ] } JSON
representation.
GraphSONByteStream.toMap() -> yields the nested Map structure we are
familiar with in TinkerPop.
GraphSONByteStream.get("/id") -> does a random-access lookup into the
byte stream so we don’t have to deserialize; it is all byte-offset based.
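To make the shape of that class concrete, here is a minimal Java sketch. Everything in it is an assumption for illustration: the byte layout (an entry count, then an index of key/value offsets, then the raw UTF-8 bytes) is made up here and is not ArangoDB's format, the factory method of() is hypothetical, and it only handles a flat object whose values are strings. It is not an existing TinkerPop API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an immutable, offset-indexed object encoding.
// Assumed layout: [int n], then n index entries of
// [int keyOff][int keyLen][int valOff][int valLen], then raw UTF-8 bytes.
final class GraphSONByteStream {
    private static final int ENTRY = 16; // four ints per index entry
    private final byte[] bytes;
    private final ByteBuffer buf;

    private GraphSONByteStream(byte[] bytes) {
        this.bytes = bytes;
        this.buf = ByteBuffer.wrap(bytes);
    }

    // Hypothetical factory: encodes a flat map of string fields.
    static GraphSONByteStream of(Map<String, String> fields) {
        int dataStart = 4 + fields.size() * ENTRY;
        int size = dataStart;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            size += utf8(e.getKey()).length + utf8(e.getValue()).length;
        }
        byte[] out = new byte[size];
        ByteBuffer b = ByteBuffer.wrap(out);
        b.putInt(0, fields.size());
        int entry = 4;
        int data = dataStart;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            byte[] k = utf8(e.getKey());
            byte[] v = utf8(e.getValue());
            b.putInt(entry, data).putInt(entry + 4, k.length);
            System.arraycopy(k, 0, out, data, k.length);
            data += k.length;
            b.putInt(entry + 8, data).putInt(entry + 12, v.length);
            System.arraycopy(v, 0, out, data, v.length);
            data += v.length;
            entry += ENTRY;
        }
        return new GraphSONByteStream(out);
    }

    // Scans the index and compares key bytes in place; only the matching
    // value is decoded. Supports top-level paths such as "/id" only.
    String get(String path) {
        byte[] key = utf8(path.startsWith("/") ? path.substring(1) : path);
        int n = buf.getInt(0);
        for (int i = 0; i < n; i++) {
            int entry = 4 + i * ENTRY;
            if (keyMatches(buf.getInt(entry), buf.getInt(entry + 4), key)) {
                return text(buf.getInt(entry + 8), buf.getInt(entry + 12));
            }
        }
        return null; // no such field
    }

    // Full deserialization into the familiar Map structure.
    Map<String, String> toMap() {
        Map<String, String> map = new LinkedHashMap<>();
        int n = buf.getInt(0);
        for (int i = 0; i < n; i++) {
            int entry = 4 + i * ENTRY;
            map.put(text(buf.getInt(entry), buf.getInt(entry + 4)),
                    text(buf.getInt(entry + 8), buf.getInt(entry + 12)));
        }
        return map;
    }

    // Simplified JSON rendering: escapes only quotes and backslashes.
    String toJSON() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : toMap().entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    private boolean keyMatches(int off, int len, byte[] key) {
        if (len != key.length) return false;
        for (int i = 0; i < len; i++) {
            if (bytes[off + i] != key[i]) return false;
        }
        return true;
    }

    private String text(int off, int len) {
        return new String(bytes, off, len, StandardCharsets.UTF_8);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }
}
```

With fields id=1 and label=person, get("/id") returns "1" after touching only the index and that one value, while toMap() and toJSON() pay the full decoding cost. A real format would also need nested objects, arrays, typed values, and probably a sorted or hashed index so that lookups are not a linear scan.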
In short, Gryo is too hard to manage and too Java specific. In the future, we
should look at building an ultra-fast GraphSON serializer/deserializer along with
ensuring an elegant, self-consistent representation of all the necessary
TinkerPop objects.
Thoughts?
Marko.
http://markorodriguez.com