Hi,

On Friday, I was working with Dan LaRocque on getting SparkGraphComputer and
GiraphGraphComputer working over Titan. Luckily, it was pretty trivial to do.
However, a few quirks emerged around "the resultant graph" that would be nice
to solve.

In GraphComputer, we have two enums: Persist{NOTHING, VERTEX_PROPERTIES, EDGES}
and ResultGraph{ORIGINAL, NEW}.

I think we should modify this a bit to make cross-vendor use of GraphComputers
cleaner. Moreover, I think we can get a lot of leverage from the Attachable
interface in I/O. See what you think:

        graph.compute()
             .program(PageRankVertexProgram)
             .result(configuration,
                     (destinationGraph, attachableVertex) ->
                         Attachable.Method.create(destinationGraph))
             .submit()

What does this mean? 

We should get rid of the GraphComputer.persist() and
GraphComputer.resultGraph() methods and replace them with a single
GraphComputer.result() method. This method takes a Configuration which is used
to construct a Graph via GraphFactory.open(configuration). Thus, for OLTP
databases like Titan and Neo4j, this would be a connection to the database. For
TinkerGraph, the configuration would either hold the graph object itself (via
setProperty()) or describe a completely new TinkerGraph. For HDFS graphs, the
configuration would specify the directory to write the graph data to (in
essence, just a standard HadoopGraph properties file). When we get ServerGraph
implemented, this would just be a GremlinServer connection. If no result() is
provided, then ComputerResult.getGraph() would just return
EmptyGraph.instance(). This generalizes the ResultGraph.ORIGINAL/NEW
distinction: a GraphComputer's resultant graph can be written to any Graph,
i.e., it is vendor-agnostic. For instance, Titan's FulgoraGraphComputer could,
in principle, write its computed result graph out to Neo4j.
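
To make the proposed semantics concrete, here is a rough, dependency-free
sketch in Java. Everything in it is hypothetical: ToyGraph, ToyVertex,
ToyComputer, open() and EMPTY are stand-ins (not TinkerPop classes) for Graph,
a GraphComputer, GraphFactory.open() and EmptyGraph.instance(); a plain
java.util.Map plays the role of the Configuration; and the "graph.instance"
key is invented for the example. It covers the three cases above: writing to a
new graph, writing back to the original graph, and getting an empty graph when
no result() is given.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical, dependency-free model of the proposed GraphComputer.result(configuration, writer).
// None of these types are TinkerPop classes; a java.util.Map stands in for the Configuration.
public class ResultGraphSketch {

    static final class ToyVertex {
        final long id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        ToyVertex(long id) { this.id = id; }
    }

    static final class ToyGraph {
        final Map<Long, ToyVertex> vertices = new LinkedHashMap<>();
        // getOrCreate by id
        ToyVertex addVertex(long id) { return vertices.computeIfAbsent(id, ToyVertex::new); }
    }

    // Stand-in for EmptyGraph.instance(): returned when result() was never called.
    static final ToyGraph EMPTY = new ToyGraph();

    // Stand-in for GraphFactory.open(configuration); the "graph.instance" key is invented here.
    static ToyGraph open(Map<String, Object> configuration) {
        Object supplied = configuration.get("graph.instance");
        return supplied instanceof ToyGraph ? (ToyGraph) supplied : new ToyGraph();
    }

    static final class ToyComputer {
        private Map<String, Object> configuration;               // null until result() is called
        private BiFunction<ToyGraph, ToyVertex, ToyVertex> writer;

        ToyComputer result(Map<String, Object> configuration,
                           BiFunction<ToyGraph, ToyVertex, ToyVertex> writer) {
            this.configuration = configuration;
            this.writer = writer;
            return this;
        }

        // Placeholder computation: every vertex gets rank 1/N; the writer then persists it.
        ToyGraph submit(ToyGraph source) {
            if (configuration == null) return EMPTY;
            ToyGraph destination = open(configuration);
            for (ToyVertex v : source.vertices.values()) {
                ToyVertex computed = new ToyVertex(v.id);
                computed.properties.putAll(v.properties);
                computed.properties.put("rank", 1.0 / source.vertices.size());
                writer.apply(destination, computed);
            }
            return destination;
        }
    }

    public static void main(String[] args) {
        ToyGraph source = new ToyGraph();
        source.addVertex(1).properties.put("name", "marko");
        source.addVertex(2).properties.put("name", "dan");

        // Writer: get-or-create the vertex in the destination and copy all properties.
        BiFunction<ToyGraph, ToyVertex, ToyVertex> copyAll = (dest, v) -> {
            ToyVertex target = dest.addVertex(v.id);
            target.properties.putAll(v.properties);
            return target;
        };

        // Empty configuration: open() builds a brand-new graph (the old ResultGraph.NEW case).
        ToyGraph fresh = new ToyComputer().result(new LinkedHashMap<>(), copyAll).submit(source);

        // Configuration carrying the source graph itself (the old ResultGraph.ORIGINAL case).
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("graph.instance", source);
        ToyGraph same = new ToyComputer().result(original, copyAll).submit(source);

        ToyGraph none = new ToyComputer().submit(source);        // no result(): empty graph
        System.out.println(fresh.vertices.get(1L).properties);   // {name=marko, rank=0.5}
        System.out.println(same == source);                      // true
        System.out.println(none.vertices.isEmpty());             // true
    }
}
```

Under this reading, ORIGINAL and NEW stop being special cases; they are just
two different configurations handed to result().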

Next, the BiFunction argument specifies how each computed vertex is written to
the result graph. It would be great if this were all done via Attachable, but
that assumes all vendors operate on Attachable vertices, which is the case for
neither TinkerGraph nor Titan. This is where Stephen and I would need to think
it through, but in general, it's simply a "getOrCreate"-style method that takes
the vertex and writes it to the destination. Like Attachable, we could have
static methods for common use cases: XXX.writeVertexProperties(),
XXX.writeVertexProperties(Map<String,String> propertyConverter),
XXX.writeVertexPropertiesAndEdges(), etc. This way, it's up to the end user to
determine how they want the results to be handled, and again, it's
vendor-agnostic (Graph -> Graph; what those Graph instances are, who cares).
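
A rough sketch of what such helpers might look like, again dependency-free and
hypothetical: Writers stands in for the unnamed XXX class, ToyGraph and
ToyVertex are toy models rather than TinkerPop types, edges are reduced to a
list of out-neighbor ids, and propertyConverter is assumed to map a computed
property key to the key written in the destination. Each helper returns a
BiFunction(destinationGraph, computedVertex) that get-or-creates the vertex by
id, so re-applying it does not duplicate vertices or edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical "XXX" helpers. ToyVertex, ToyGraph and Writers are toy stand-ins, not TinkerPop types.
public class WritersSketch {

    static final class ToyVertex {
        final long id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Long> outAdjacent = new ArrayList<>();   // out-neighbor ids, standing in for edges
        ToyVertex(long id) { this.id = id; }
    }

    static final class ToyGraph {
        final Map<Long, ToyVertex> vertices = new LinkedHashMap<>();
        ToyVertex getOrCreate(long id) { return vertices.computeIfAbsent(id, ToyVertex::new); }
    }

    static final class Writers {
        // Copy all vertex properties under their original keys.
        static BiFunction<ToyGraph, ToyVertex, ToyVertex> writeVertexProperties() {
            return (graph, v) -> {
                ToyVertex target = graph.getOrCreate(v.id);
                target.properties.putAll(v.properties);
                return target;
            };
        }

        // Assumed reading: copy only the keys named in propertyConverter, renamed computedKey -> destinationKey.
        static BiFunction<ToyGraph, ToyVertex, ToyVertex> writeVertexProperties(Map<String, String> propertyConverter) {
            return (graph, v) -> {
                ToyVertex target = graph.getOrCreate(v.id);
                propertyConverter.forEach((from, to) -> {
                    if (v.properties.containsKey(from)) target.properties.put(to, v.properties.get(from));
                });
                return target;
            };
        }

        // Copy properties, then get-or-create each out-neighbor and record each edge only once.
        static BiFunction<ToyGraph, ToyVertex, ToyVertex> writeVertexPropertiesAndEdges() {
            BiFunction<ToyGraph, ToyVertex, ToyVertex> props = writeVertexProperties();
            return (graph, v) -> {
                ToyVertex target = props.apply(graph, v);
                for (long neighbor : v.outAdjacent) {
                    graph.getOrCreate(neighbor);
                    if (!target.outAdjacent.contains(neighbor)) target.outAdjacent.add(neighbor);
                }
                return target;
            };
        }
    }

    public static void main(String[] args) {
        ToyVertex computed = new ToyVertex(1);
        computed.properties.put("name", "marko");
        computed.properties.put("pageRank", 0.15);
        computed.outAdjacent.add(2L);

        ToyGraph renamed = new ToyGraph();
        Writers.writeVertexProperties(Collections.singletonMap("pageRank", "rank")).apply(renamed, computed);
        System.out.println(renamed.vertices.get(1L).properties);   // {rank=0.15}

        ToyGraph full = new ToyGraph();
        BiFunction<ToyGraph, ToyVertex, ToyVertex> writer = Writers.writeVertexPropertiesAndEdges();
        writer.apply(full, computed);
        writer.apply(full, computed);                              // re-applying duplicates nothing
        System.out.println(full.vertices.keySet());               // [1, 2]
        System.out.println(full.vertices.get(1L).outAdjacent);    // [2]
    }
}
```

Because every helper is just a BiFunction over (Graph, vertex), a user can
pass one of these or write their own without the computer caring which Graph
implementation sits on either side.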

Thoughts?
Marko.

http://markorodriguez.com
