OLAP and Graph.io()

Marko Rodriguez Thu, 30 Apr 2015 09:03:10 -0700

Hi,

Stephen is interested in making sure that Graph.io() works cleanly for both 
OLTP and OLAP. In particular, io().readGraph() and io().writeGraph() should 
be usable in both OLTP and OLAP situations seamlessly, much as Gremlin is for 
traversals.


------------

OLAP graph writing will occur via a (yet to be written) 
BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with 
vertices/edges) and writes to another Graph. In essence, two graphs, where the 
first graph has the data and the second is empty. I always expected this to 
typically happen via Hadoop (HadoopGraph) -> VendorDatabase (VendorGraph). 
However, while most distributed graph database vendors will leverage 
Hadoop/Giraph/Spark for their OLAP bulk loading operations because of HDFS, we 
can't always assume this -- especially in the context of OLAP Graph.io().

Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph, but 
should optionally be able to stream in a file as well, File->Graph. This means 
we have to get into the concept of "InputSplits" at the gremlin-core level. A 
quick-and-dirty approach is to simply load the graph data from a file 
serially. This is not the optimal solution, but it can move us forward on the 
Graph.io() API.

As for the Graph.io() API: this would mean that, as with Traversal, the user 
can specify a Computer to use for readGraph().

        graph.io().readGraph(file, graph.compute(MyGraphComputer.class))

For writeGraph():

        graph.io().writeGraph(file, graph.compute(MyGraphComputer.class))

Here, "file" can be a directory in both situations, and each "worker" of the 
GraphComputer reads/writes a split.
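
To make the "each worker reads/writes a split" idea concrete, here is a rough 
sketch of how the files in a directory might be dealt out to workers. It is 
plain Java and only an illustration: SplitAssigner and assign() are 
hypothetical names, not existing gremlin-core classes, and round-robin 
assignment is just one simple policy that could be used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of TinkerPop: deals input "splits"
// (e.g. the file names found in a directory) out to a fixed number of workers.
public final class SplitAssigner {

    private SplitAssigner() {}

    // Returns one list of splits per worker, assigned round-robin.
    public static List<List<String>> assign(List<String> splits, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        List<List<String>> plan = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            plan.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            // split i goes to worker (i mod workers)
            plan.get(i % workers).add(splits.get(i));
        }
        return plan;
    }
}
```

With workers set to 1, every split lands on a single worker, which 
corresponds to the serial, quick-and-dirty load mentioned above.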

Thoughts?
Marko.

http://markorodriguez.com
