It would be nice if this change could just be treated as an overload of read/writeGraph(), so in that sense it sounds good to me. I presume the underlying work done by the BulkLoader/DumperVertexProgram would simply use the existing read/writeVertex functions on the GraphReader/Writer implementations themselves. That way, the BulkLoader/DumperVertexProgram would have access to any custom serializers required by the Graph instance.
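
Roughly, the delegation could look like the sketch below. This is only an illustration of the idea, not the actual TinkerPop interfaces: VertexReader, bulkLoad and BulkLoadSketch are hypothetical names, and a plain String stands in for a vertex. The point is that the loader never parses a record itself, so whatever custom serialization the reader carries is applied automatically.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkLoadSketch {

    // Hypothetical stand-in for a GraphReader: it owns the (possibly custom)
    // deserialization logic for a single vertex record.
    public interface VertexReader {
        String readVertex(String record);
    }

    // Hypothetical stand-in for the bulk loader: it does no parsing of its own
    // and only hands each record to the reader it was given.
    public static List<String> bulkLoad(List<String> records, VertexReader reader) {
        List<String> loaded = new ArrayList<>();
        for (String record : records) {
            loaded.add(reader.readVertex(record));
        }
        return loaded;
    }

    public static void main(String[] args) {
        // A toy "custom serializer": trims and upper-cases each record.
        VertexReader custom = record -> record.trim().toUpperCase();
        System.out.println(bulkLoad(List.of(" marko ", "stephen"), custom));
    }
}
```

Swapping in a different VertexReader changes how records are decoded without touching the loader, which is the property I would want from the real BulkLoader/DumperVertexProgram.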
On Thu, Apr 30, 2015 at 12:01 PM, Marko Rodriguez <[email protected]> wrote:

> Hi,
>
> Stephen is interested in making sure that Graph.io() works cleanly for
> both OLTP and OLAP. In particular, making sure that io().readGraph() and
> io().writeGraph() can be used in both OLTP and OLAP situations seamlessly
> much like Gremlin does for traversals.
>
> ------------
>
> OLAP graph writing will occur via a (yet to be written)
> BulkLoaderVertexProgram. BulkLoaderVertexProgram takes a Graph (with
> vertices/edges) and writes to another Graph. In essence, two graphs, where
> the first graph has the data and the second is empty. I always expected
> this to typically happen via Hadoop (HadoopGraph) -> VendorDatabase
> (VendorGraph). However, while most distributed graph database vendors will
> leverage Hadoop/Giraph/Spark for their OLAP bulk loading operations because
> of HDFS, we can't always assume this -- especially in the context of OLAP
> Graph.io().
>
> Thus, BulkLoaderVertexProgram shouldn't just operate on Graph->Graph, but
> can optionally stream in a file as well, File->Graph. This means we have to
> get into the concept of "InputSplits" at the gremlin-core level. A quick
> and dirty is to simply serially load the graph data from a file, this is
> not the optimal solution, but can move us forward on the Graph.io() API.
>
> To the API of Graph.io(). This would mean, like Traversal, the user can
> specify a Computer to use to do the readGraph().
>
> graph.io().readGraph(file, graph.compute(MyGraphComputer.class))
>
> For writeGraph()
>
> graph.io().writeGraph(file,graph.compute(MyGraphComputer.class))
>
>
> Where, "file" can be a directory in both situations and each "worker" of
> the GraphComputer reads/writes a split.
>
> Thoughts?,
> Marko.
>
> http://markorodriguez.com
>
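
For the "file can be a directory and each worker reads a split" part, a rough sketch of the shape might be the following. It assumes, purely for illustration, that every regular file in the directory is one split and every line is one vertex record; SplitReadSketch, listSplits and readSplits are hypothetical names, and a plain thread pool stands in for the GraphComputer's workers. Running it with a single worker gives the "quick and dirty" serial load.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitReadSketch {

    // Lists the regular files in a directory; each file is treated as one split.
    public static List<Path> listSplits(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    // Submits one read task per split to a pool of `workers` threads and
    // gathers the records in split order. A line stands in for a vertex record.
    public static List<String> readSplits(Path dir, int workers)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Path split : listSplits(dir)) {
                Callable<List<String>> task = () -> Files.readAllLines(split);
                pending.add(pool.submit(task));
            }
            List<String> records = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                records.addAll(result.get());
            }
            return records;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("splits");
        Files.write(dir.resolve("part-0"), List.of("v1", "v2"));
        Files.write(dir.resolve("part-1"), List.of("v3"));
        System.out.println(readSplits(dir, 2));
    }
}
```

A real implementation would also need to split a single large file, which is where something like InputSplits comes in; this sketch sidesteps that by assuming the data is already partitioned into files.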
