Hello,
Throughout our documentation we show uses of the “Blueprints API” (i.e.
Graph/Vertex/Edge/etc. classes & methods) as well as the use of the Traversal
API (i.e. Gremlin).
Giving users two ways of interacting with the graph system has its
problems:
1. The DetachedXXX problem: how much data should a returned
vertex/edge/etc. have associated with it?
2. graph.addVertex() and g.addV(): which should I use? The first is
faster but is not recommended.
3. SubgraphStrategy leaking: I get subgraphs with Gremlin, but can
then directly interact with the vertex objects to see more than I should.
4. VertexProgram model: I write traversals with the Traversal API, but
then develop VertexPrograms with the Blueprints API. That’s weird.
5. GremlinServer returning fat objects: serializers create
property-rich vertices and edges. Hence the awkward HaltedTraversalStrategy solution.
6. … various permutations of these source problems.
I propose that we solve this problem once and for all in TinkerPop4 as follows:
There should be two “Graph APIs.”
1. Provider Graph API: This is the current Blueprints API with
Graph.addVertex(), Vertex.edges(), Edge.inVertex(), etc.
2. User Graph API: This is a ReferenceXXX API.
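A minimal Java sketch of that split, assuming hypothetical stand-in types (`ProviderVertex`, `ReferenceVertex`, and a toy `MapVertex` provider) rather than TinkerPop's actual classes: the provider-facing interface keeps Blueprints-style property and edge access, while the user-facing type carries only an id and a label, so the compiler itself enforces the boundary.

```java
import java.util.List;
import java.util.Map;

// Provider Graph API (hypothetical): what a vendor implements, Blueprints-style.
interface ProviderVertex {
    Object id();
    String label();
    Object property(String key);
    List<ProviderVertex> out(String edgeLabel);
}

// User Graph API (hypothetical): identity only, no properties, no edge walking.
record ReferenceVertex(Object id, String label) {
    // One-way conversion applied whenever a result crosses from provider to user.
    static ReferenceVertex of(ProviderVertex v) {
        return new ReferenceVertex(v.id(), v.label());
    }
}

// Toy provider element that only supports "knows" edges, just to exercise the boundary.
record MapVertex(Object id, String label, Map<String, Object> props,
                 List<ProviderVertex> knows) implements ProviderVertex {
    public Object property(String key) { return props.get(key); }
    public List<ProviderVertex> out(String edgeLabel) {
        return "knows".equals(edgeLabel) ? knows : List.of();
    }
}

final class BoundaryDemo {
    public static void main(String[] args) {
        ProviderVertex josh = new MapVertex(4L, "person", Map.of("name", "josh"), List.of());
        ProviderVertex marko = new MapVertex(1L, "person", Map.of("name", "marko"), List.of(josh));
        ReferenceVertex ref = ReferenceVertex.of(marko.out("knows").get(0));
        System.out.println(ref); // ReferenceVertex[id=4, label=person]
        // ref.property("name") and ref.out("knows") do not compile: the user API has no such methods.
    }
}
```

In this sketch `ReferenceVertex.of(...)` is the only bridge between the two layers, and it only goes one way: nothing turns a reference back into a provider element.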
Let's talk about the second, as it's more novel and distinct from current
practices.
We should have a ReferenceGraph, which is simply a reference/dummy/proxy to the
provider Graph API. ReferenceGraph has the following API:
ReferenceGraph.open()
ReferenceGraph.close()
ReferenceGraph.tx() // assuming we like the current transaction model (??)
ReferenceGraph.traversal()
That is it. What does this entail? Assume the following traversal:
g = ReferenceGraph.open(config).traversal()
g.V(1).out('knows')
ReferenceGraph is almost like a “RemoteGraph” (RemoteStrategy) in that it makes
a connection (remote or intra-JVM) to the provider Graph API. When
g.V(1).out('knows') executes, it is really sending the bytecode to the provider
Graph for execution (as specified by the config of ReferenceGraph.open()).
Thus, once it hits the provider's graph, ProviderVertex, ProviderEdge, etc. are
the objects being processed. However, what the traversal’s Iterator<Vertex>
returns is ReferenceVertex! That is, it never returns ProviderVertex. In this
way, regardless of whether the user is going “over the wire”, within the same JVM,
against a different provider’s graph database, or from Gremlin-Python/C#/etc.,
all vertices are simply ‘reference vertices’ (id + label). As a result, users
never interact directly with the graph element objects themselves.
They can ONLY interact with the graph via traversals! At most, they can call
ReferenceVertex.id() and ReferenceVertex.label(). That's it: no mutations, no
walking edges, nada! Moreover, since a ReferenceXXX has enough information to
re-attach to the source graph, they can always do the following to get more
information:
v = g.V(1).out('knows').next()
g.V(v).values('name')
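To make that flow concrete, here is a minimal single-JVM Java sketch. Every type in it (`ProviderGraph`, `ProviderVertex`, `ReferenceVertex`, `Instruction`, `Traversal`, `TraversalSource`, `ReferenceGraph`) is a hypothetical simplification, not TinkerPop's real classes: bytecode is modeled as a plain list of `(op, arg)` instructions, `open()` takes the provider graph directly in place of a config, and `tx()` is left out. What it shows is that the provider walks its own `ProviderVertex` objects, yet only `ReferenceVertex` (id + label) or plain property values come back, and `g.V(v)` re-attaches purely by id.

```java
import java.util.*;

// Provider side (hypothetical): full elements with properties and adjacency.
final class ProviderVertex {
    final long id;
    final String label;
    final Map<String, Object> properties = new HashMap<>();
    final Map<String, List<ProviderVertex>> out = new HashMap<>();
    ProviderVertex(long id, String label) { this.id = id; this.label = label; }
}

// User side: all that ever leaves the provider is id + label.
record ReferenceVertex(long id, String label) {}

// One bytecode instruction, e.g. ("V", 1L) or ("out", "knows").
record Instruction(String op, Object arg) {}

final class ProviderGraph {
    private final Map<Long, ProviderVertex> vertices = new HashMap<>();
    ProviderVertex addVertex(long id, String label, String name) {
        ProviderVertex v = new ProviderVertex(id, label);
        v.properties.put("name", name);
        vertices.put(id, v);
        return v;
    }
    void addEdge(ProviderVertex from, String label, ProviderVertex to) {
        from.out.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
    }
    // Runs the bytecode over ProviderVertex objects; only references or plain values leave.
    List<?> submit(List<Instruction> bytecode) {
        List<ProviderVertex> current = new ArrayList<>();
        for (Instruction ins : bytecode) {
            List<ProviderVertex> next = new ArrayList<>();
            switch (ins.op()) {
                case "V" -> { if (vertices.containsKey(ins.arg())) next.add(vertices.get(ins.arg())); }
                case "out" -> current.forEach(v -> next.addAll(v.out.getOrDefault(ins.arg(), List.of())));
                case "values" -> { // terminal step in this sketch: returns property values
                    return current.stream().map(v -> v.properties.get(ins.arg()))
                            .filter(Objects::nonNull).toList();
                }
                default -> throw new IllegalArgumentException("unknown step: " + ins.op());
            }
            current = next;
        }
        return current.stream().map(v -> new ReferenceVertex(v.id, v.label)).toList();
    }
}

// Records steps as bytecode; nothing executes until toList()/next() submits it.
final class Traversal<E> {
    private final ProviderGraph provider;
    private final List<Instruction> bytecode = new ArrayList<>();
    Traversal(ProviderGraph provider, Instruction start) { this.provider = provider; bytecode.add(start); }
    @SuppressWarnings("unchecked")
    private <T> Traversal<T> add(String op, Object arg) {
        bytecode.add(new Instruction(op, arg));
        return (Traversal<T>) this;
    }
    Traversal<ReferenceVertex> out(String label) { return add("out", label); }
    Traversal<Object> values(String key) { return add("values", key); }
    @SuppressWarnings("unchecked")
    List<E> toList() { return (List<E>) provider.submit(bytecode); }
    E next() { return toList().stream().findFirst().orElseThrow(); }
}

final class TraversalSource {
    private final ProviderGraph provider;
    TraversalSource(ProviderGraph provider) { this.provider = provider; }
    Traversal<ReferenceVertex> V(long id) { return new Traversal<>(provider, new Instruction("V", id)); }
    // Re-attachment: the reference's id is all the provider needs.
    Traversal<ReferenceVertex> V(ReferenceVertex v) { return V(v.id()); }
}

// The whole user-facing surface (tx() omitted); open() takes the provider in place of a config.
final class ReferenceGraph {
    private final ProviderGraph provider;
    private ReferenceGraph(ProviderGraph provider) { this.provider = provider; }
    static ReferenceGraph open(ProviderGraph provider) { return new ReferenceGraph(provider); }
    TraversalSource traversal() { return new TraversalSource(provider); }
    void close() { } // a remote variant would release its connection here
}

final class ReferenceGraphDemo {
    public static void main(String[] args) {
        ProviderGraph provider = new ProviderGraph();
        ProviderVertex marko = provider.addVertex(1, "person", "marko");
        provider.addEdge(marko, "knows", provider.addVertex(4, "person", "josh"));
        TraversalSource g = ReferenceGraph.open(provider).traversal();
        ReferenceVertex v = g.V(1).out("knows").next();
        System.out.println(v);                            // ReferenceVertex[id=4, label=person]
        System.out.println(g.V(v).values("name").next()); // josh
    }
}
```

Because `ReferenceVertex` exposes nothing beyond `id()` and `label()`, the only way to learn the name of the returned vertex is to submit a second traversal, which is the constraint the proposal is after.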
This split into two Graph APIs will enable us to make a hard boundary between
what the provider (vendor) needs to implement and what the user (developer)
gets to access. This distinction should solve the problems articulated at the
start of this email.
Thoughts?
Marko.
http://markorodriguez.com