It would be nice if this change could simply be treated as an overload of
read/writeGraph(), so in that sense it sounds good to me. I presume that
the underlying work done by the BulkLoader/DumperVertexProgram would simply
use the existing read/writeVertex functions of the GraphReader/Writer
implementations themselves. That way, the BulkLoader/DumperVertexProgram
would have access to any custom serializers required by the Graph instance.
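A minimal Java sketch of that delegation, under stated assumptions:
VertexReader and BulkLoaderSketch are hypothetical stand-ins, not TinkerPop
classes, and the real GraphReader method signatures are not reproduced here.
The point is only that the loader never parses records itself; it hands each
one to the reader's own readVertex(), so whatever custom serializers the
reader was configured with are the ones that get used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a GraphReader: all the loader needs is a
// per-vertex read method that already knows about custom serializers.
interface VertexReader<V> {
    V readVertex(String serialized);
}

// Hypothetical stand-in for the per-worker logic of a bulk loader program.
final class BulkLoaderSketch {
    private BulkLoaderSketch() {}

    // Delegates every record to the reader's own readVertex(), so any custom
    // serialization configured on the reader is applied unchanged.
    static <V> List<V> load(List<String> records, VertexReader<V> reader) {
        List<V> vertices = new ArrayList<>(records.size());
        for (String record : records) {
            vertices.add(reader.readVertex(record));
        }
        return vertices;
    }
}
```

Swapping in a different reader changes how records are decoded without
touching the loader, which is what keeps the OLAP path consistent with OLTP.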

On Thu, Apr 30, 2015 at 12:01 PM, Marko Rodriguez <[email protected]>
wrote:

> Hi,
>
> Stephen is interested in making sure that Graph.io() works cleanly for
> both OLTP and OLAP. In particular, he wants io().readGraph() and
> io().writeGraph() to work seamlessly in both OLTP and OLAP situations,
> much as Gremlin does for traversals.
>
> ------------
>
> OLAP graph writing will occur via a (yet to be written)
> BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
> vertices/edges) and writes its contents to another Graph. In essence, there
> are two graphs: the first has the data and the second is empty. I always
> expected this to typically happen via Hadoop (HadoopGraph) -> VendorDatabase
> (VendorGraph). However, while most distributed graph database vendors will
> leverage Hadoop/Giraph/Spark for their OLAP bulk loading operations because
> of HDFS, we can't always assume this, especially in the context of OLAP
> Graph.io().
>
> Thus, BulkLoaderVertexProgram shouldn't only operate Graph->Graph; it
> should optionally be able to stream in a file as well (File->Graph). This
> means we have to introduce the concept of "InputSplits" at the gremlin-core
> level. A quick-and-dirty approach is to simply load the graph data from a
> file serially. This is not the optimal solution, but it can move us
> forward on the Graph.io() API.
>
> As for the Graph.io() API, this would mean that, as with Traversal, the
> user can specify a GraphComputer to use for readGraph():
>
>         graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
>
> For writeGraph():
>
>         graph.io().writeGraph(file, graph.compute(MyGraphComputer.class))
>
>
> Here, "file" can be a directory in both situations, with each "worker" of
> the GraphComputer reading/writing one split.
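A rough sketch of the directory case, assuming a layout of one part-NNNNN
file per split (a Hadoop-style naming convention chosen here purely for
illustration) and using a plain ExecutorService as a stand-in for the
GraphComputer's workers. DirectoryIoSketch and its methods are hypothetical,
and vertices are reduced to one serialized string per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: "file" is a directory, and each simulated worker
// writes or reads exactly one part file.
final class DirectoryIoSketch {
    private DirectoryIoSketch() {}

    // Each worker writes its split of serialized vertices to part-NNNNN.
    static void writeGraph(Path dir, List<List<String>> splits, int workers)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(dir);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Path>> done = new ArrayList<>();
            for (int i = 0; i < splits.size(); i++) {
                final Path part = dir.resolve(String.format("part-%05d", i));
                final List<String> split = splits.get(i);
                done.add(pool.submit(() -> Files.write(part, split, StandardCharsets.UTF_8)));
            }
            for (Future<Path> f : done) {
                f.get(); // propagate any worker failure
            }
        } finally {
            pool.shutdown();
        }
    }

    // Each worker reads one part file; results are concatenated in name order.
    static List<String> readGraph(Path dir, int workers)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        parts.sort(null); // natural Path ordering makes the result deterministic
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (Path part : parts) {
                results.add(pool.submit(() -> Files.readAllLines(part, StandardCharsets.UTF_8)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : results) {
                all.addAll(f.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each worker touches only its own part file, no coordination is
needed beyond waiting on the futures, which is what makes a directory a
natural unit for both readGraph() and writeGraph() in the OLAP case.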
>
> Thoughts?
> Marko.
>
> http://markorodriguez.com
>
>
