Hi,
Stephen is interested in making sure that Graph.io() works cleanly for both
OLTP and OLAP. In particular, he wants io().readGraph() and io().writeGraph()
to work seamlessly in both OLTP and OLAP situations, much like Gremlin does for
traversals.
------------
OLAP graph writing will occur via a (yet to be written)
BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
vertices/edges) and writes it to another Graph. In essence, there are two
graphs: the first holds the data and the second is empty. I always expected the
typical case to be Hadoop (HadoopGraph) -> VendorDatabase (VendorGraph).
However, while most distributed graph database vendors will leverage
Hadoop/Giraph/Spark for their OLAP bulk loading operations because of HDFS, we
can't always assume this, especially in the context of OLAP Graph.io().
Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph; it should
optionally be able to stream in a file as well (File->Graph). This means we
have to get into the concept of "InputSplits" at the gremlin-core level, so
that each worker can read its own portion of the input. A quick and dirty
solution is to simply load the graph data serially from a file. This is not
optimal, but it would move us forward on the Graph.io() API.
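A minimal sketch of the InputSplit idea, assuming a line-oriented input format
with one vertex per line and using an in-memory byte array as a stand-in for
the file. InputSplitSketch, Split, computeSplits() and readSplit() are
hypothetical names for illustration only, not gremlin-core API. Each split is a
byte range that ends on a newline, so no vertex is cut in two:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not gremlin-core API): divide a line-oriented graph file
// (assumed: one vertex per line) into byte-range "input splits" so that each
// GraphComputer worker could parse its own range independently.
final class InputSplitSketch {

    /** A half-open byte range [start, end) of the input. */
    static final class Split {
        final int start; // inclusive byte offset
        final int end;   // exclusive byte offset
        Split(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ", " + end + ")"; }
    }

    /** Splits data into at most numSplits ranges, each ending just after a '\n'. */
    static List<Split> computeSplits(byte[] data, int numSplits) {
        if (numSplits < 1) throw new IllegalArgumentException("numSplits must be >= 1");
        List<Split> splits = new ArrayList<>();
        int target = Math.max(1, (data.length + numSplits - 1) / numSplits);
        int start = 0;
        while (start < data.length) {
            int end = Math.min(start + target, data.length);
            // Push the tentative end forward until the previous byte is a newline,
            // so every line (vertex) lands entirely inside one split.
            while (end < data.length && data[end - 1] != '\n') end++;
            splits.add(new Split(start, end));
            start = end;
        }
        return splits;
    }

    /** Returns the non-empty lines (vertices) that belong to one split. */
    static List<String> readSplit(byte[] data, Split split) {
        String text = new String(data, split.start, split.end - split.start, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "1 2 3\n2 3\n3 1\n4\n".getBytes(StandardCharsets.UTF_8);
        for (Split s : computeSplits(data, 2)) {
            System.out.println(s + " -> " + readSplit(data, s));
        }
    }
}
```

With numSplits = 1 the whole file is a single split, which is exactly the
"quick and dirty" serial load; raising numSplits is what would let the load
be spread across workers.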
As for the API of Graph.io(): like Traversal, the user would be able to
specify a GraphComputer to use for readGraph().
graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
And for writeGraph():
graph.io().writeGraph(file, graph.compute(MyGraphComputer.class))
Here, "file" can be a directory in both situations, with each "worker" of the
GraphComputer reading/writing its own split.
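A minimal sketch of the directory case, assuming each worker writes one part
file on writeGraph() and, on readGraph(), the part files found in the directory
are dealt out round-robin to the workers. DirectorySplitSketch, partFileName()
and assignToWorkers() are hypothetical names, and the "part-00000" naming is an
assumed Hadoop-style convention, not something Graph.io() defines:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: when "file" is a directory, each worker owns a part file.
final class DirectorySplitSketch {

    /** Assumed Hadoop-style name of the part file written by a given worker. */
    static String partFileName(int workerId) {
        return String.format("part-%05d", workerId);
    }

    /** Assigns part files to workers round-robin over the sorted file names. */
    static List<List<String>> assignToWorkers(List<String> partFiles, int numWorkers) {
        if (numWorkers < 1) throw new IllegalArgumentException("numWorkers must be >= 1");
        List<String> sorted = new ArrayList<>(partFiles);
        Collections.sort(sorted);
        List<List<String>> assignment = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) assignment.add(new ArrayList<>());
        for (int i = 0; i < sorted.size(); i++) assignment.get(i % numWorkers).add(sorted.get(i));
        return assignment;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("graph-io");
        // writeGraph(): three workers each write the vertices they hold to their own file.
        List<List<String>> held = List.of(List.of("1 2 3"), List.of("2 3", "3 1"), List.of("4"));
        for (int w = 0; w < held.size(); w++) {
            Files.write(dir.resolve(partFileName(w)), held.get(w), StandardCharsets.UTF_8);
        }
        // readGraph(): two workers share the part files found in the directory.
        List<String> names;
        try (Stream<Path> files = Files.list(dir)) {
            names = files.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
        List<List<String>> assignment = assignToWorkers(names, 2);
        for (int w = 0; w < assignment.size(); w++) {
            for (String name : assignment.get(w)) {
                System.out.println("worker " + w + " reads " + name + ": "
                        + Files.readAllLines(dir.resolve(name), StandardCharsets.UTF_8));
            }
        }
    }
}
```

Note that the number of writing workers need not match the number of reading
workers; a worker may end up with several part files or none.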
Thoughts?
Marko.
http://markorodriguez.com