Hi,
On Friday, I was working with Dan LaRocque on getting
Spark/GiraphGraphComputers working over Titan. Luckily, it was pretty trivial
to do. However, a few quirks emerged around "the resultant graph" that I think
would be nice to solve.
In GraphComputer, we have two enums: Persist{NOTHING, VERTEX_PROPERTIES, EDGES}
and ResultGraph{ORIGINAL,NEW}.
I think we should modify this a bit to make cross-vendor use of GraphComputers
cleaner. Moreover, I think we can get a lot of leverage using the Attachable
interface of I/O. See what you think:
  graph.compute().program(PageRankVertexProgram).
    result(configuration, (destinationGraph, attachableVertex) ->
      Attachable.Method.create(destinationGraph)).
    submit()
What does this mean?
We should get rid of the GraphComputer.persist() and
GraphComputer.resultGraph() methods and replace them with a
GraphComputer.result() method. This method takes a Configuration which is used
to construct a Graph via GraphFactory.open(configuration). Thus, for OLTP
databases like Titan/Neo4j/etc., this would be a connection to the database. For
TinkerGraph, the configuration would have the graph object in it (via
setProperty()) or a completely "new TinkerGraph()." For HDFS graphs, the
configuration would be the directory of the place to write the graph data (in
essence, just a standard HadoopGraph properties file). When we get ServerGraph
implemented, this would just be a GremlinServer connection. If no result() is
provided, then ComputerResult.getGraph() would just return
EmptyGraph.instance(). So, now this generalizes the ResultGraph.ORIGINAL/NEW
situation, where a GraphComputer's resultant graph can write to any Graph --
i.e., vendor agnostic. For instance, Titan's FulgoraGraphComputer could, in
principle, write its compute graph result out to Neo4j.
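
To make the result(configuration) semantics concrete, here is a rough sketch. It does not use the real TinkerPop classes: SketchGraph, SketchComputer, open(), and the keys "graph.instance" and "graph.location" are all hypothetical stand-ins I made up for illustration. It only shows the two behaviors described above: a configured result graph is opened through a factory, and no result() call yields the empty graph.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ResultGraphSketch {

    // Hypothetical stand-in for Graph; it only carries a label for illustration.
    static final class SketchGraph {
        final String label;
        SketchGraph(String label) { this.label = label; }
    }

    // Plays the role of EmptyGraph.instance() in the proposal.
    static final SketchGraph EMPTY = new SketchGraph("empty");

    // Hypothetical stand-in for a GraphComputer with the proposed result() method.
    static final class SketchComputer {
        private final Function<Map<String, Object>, SketchGraph> graphFactory;
        private Map<String, Object> resultConfiguration; // stays null if result() is never called

        SketchComputer(Function<Map<String, Object>, SketchGraph> graphFactory) {
            this.graphFactory = graphFactory;
        }

        SketchComputer result(Map<String, Object> configuration) {
            this.resultConfiguration = configuration;
            return this;
        }

        // Returns the graph that the computed results would be written to.
        SketchGraph submit() {
            return resultConfiguration == null ? EMPTY : graphFactory.apply(resultConfiguration);
        }
    }

    // Plays the role of GraphFactory.open(configuration): reuse a graph object
    // placed in the configuration, otherwise open one from a location.
    static SketchGraph open(Map<String, Object> configuration) {
        Object existing = configuration.get("graph.instance"); // hypothetical key
        if (existing instanceof SketchGraph) {
            return (SketchGraph) existing;
        }
        return new SketchGraph(String.valueOf(configuration.get("graph.location"))); // hypothetical key
    }

    public static void main(String[] args) {
        Map<String, Object> hdfsLike = new HashMap<>();
        hdfsLike.put("graph.location", "output/pagerank");

        // With result(): the destination graph comes from the configuration.
        System.out.println(new SketchComputer(ResultGraphSketch::open).result(hdfsLike).submit().label);
        // Without result(): the empty graph is returned.
        System.out.println(new SketchComputer(ResultGraphSketch::open).submit().label);
    }
}
```

The point of passing the factory in is that the computer never needs to know which vendor's graph it is writing to.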
Next, the BiFunction provided is how to write the computed vertex to the result
graph. It would be great if this was all via Attachable, but then that assumes
all vendors are operating on Attachable vertices, which isn't the case for
TinkerGraph nor Titan. This is where Stephen and I would need to think, but, in
general, it's simply a "getOrCreate"-style method for taking the vertex and
writing it to the destination. Like Attachable, we could have static methods
for common use cases -- XXX.writeVertexProperties(),
XXX.writeVertexProperties(Map<String,String> propertyConverter),
XXX.writeVertexPropertiesAndEdges(), etc. This way, it's up to the end user to
determine how they want the results to be handled and, again, it's
vendor-agnostic (Graph -> Graph --- what those Graph instances are, who cares).
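
As a rough illustration of the "getOrCreate"-style writer, here is a sketch of what the writeVertexProperties(Map<String,String> propertyConverter) case might look like. SimpleGraph and SimpleVertex are hypothetical stand-ins, not TinkerPop types, and I am assuming that propertyConverter maps a computed property key to the key it should be stored under in the destination graph.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class ResultWriterSketch {

    // Hypothetical stand-in for a vertex: an id plus key/value properties.
    static final class SimpleVertex {
        final Object id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        SimpleVertex(Object id) { this.id = id; }
    }

    // Hypothetical stand-in for a destination Graph, indexed by vertex id.
    static final class SimpleGraph {
        final Map<Object, SimpleVertex> vertices = new HashMap<>();

        // Return the vertex with this id, creating it if it does not exist yet.
        SimpleVertex getOrCreate(Object id) {
            return vertices.computeIfAbsent(id, SimpleVertex::new);
        }
    }

    // Sketch of a static helper in the spirit of XXX.writeVertexProperties(Map).
    // Assumption: keys absent from propertyConverter are written unchanged.
    static BiFunction<SimpleGraph, SimpleVertex, SimpleVertex> writeVertexProperties(
            Map<String, String> propertyConverter) {
        return (destinationGraph, computedVertex) -> {
            SimpleVertex target = destinationGraph.getOrCreate(computedVertex.id);
            computedVertex.properties.forEach((key, value) ->
                    target.properties.put(propertyConverter.getOrDefault(key, key), value));
            return target;
        };
    }

    public static void main(String[] args) {
        // The destination already holds vertex 1 with its original data.
        SimpleGraph destination = new SimpleGraph();
        destination.getOrCreate(1).properties.put("name", "marko");

        // The computed vertex carries only the property produced by the job.
        SimpleVertex computed = new SimpleVertex(1);
        computed.properties.put("pageRank", 0.15d);

        Map<String, String> converter = new HashMap<>();
        converter.put("pageRank", "rank");

        SimpleVertex written = writeVertexProperties(converter).apply(destination, computed);
        System.out.println(written.properties);
    }
}
```

Because the writer is just a function of (destination graph, computed vertex), swapping in a different policy (properties only, properties and edges, renamed keys) does not touch the computer itself.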
Thoughts?
Marko.
http://markorodriguez.com