Re: GraphComputer and the Resultant Graph

Matthias Broecheler Mon, 04 May 2015 13:05:38 -0700

Hi Marko,

I think in principle this could work but my reservations are:
1) Every GraphComputer implementation would have to implement a lot of
redundant logic to write graph data into an arbitrary graph
2) Can this be done efficiently at scale? There are a lot of tweaks and
custom code that is typically needed to efficiently write lots of graph
data into a target graph database, in particular if you are thinking about
large graph data sets like those generated by Hadoop, Giraph, or Spark. I
think it won't be reasonable to generalize the "fast bulk loading" logic
into each and every GraphComputer implementation.
3) Redundancy: Most vendors will already have a BulkLoaderVertexProgram of
some sort.


Wouldn't it be better if we allowed chaining of VertexPrograms? In other
words, run a PageRank first and then write the results into whatever target
graph you want through a BulkLoaderVertexProgram that can be
tuned/customized for a particular target graph?
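
To make the chaining idea concrete, here is a minimal sketch. Everything in
it is a hypothetical stand-in (ChainSketch, Stage, degreeRank, bulkLoad);
none of it is the actual VertexProgram or GraphComputer API. The analytic
stage is a simple degree count standing in for PageRank, and the loader
stage is the part a vendor would tune for its own target graph.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are TinkerPop types.
class ChainSketch {

    /** One stage in a chain; it reads and updates the shared per-vertex state. */
    interface Stage {
        void run(Map<String, Map<String, Object>> vertices);
    }

    /** Runs the stages in order over the same vertex state (id -> properties). */
    static void runChain(Map<String, Map<String, Object>> vertices, List<Stage> stages) {
        for (Stage stage : stages) {
            stage.run(vertices);
        }
    }

    /** Placeholder analytic stage: stores each vertex's out-degree under "rank". */
    static Stage degreeRank(Map<String, List<String>> adjacency) {
        return vertices -> vertices.forEach((id, props) ->
                props.put("rank", adjacency.getOrDefault(id, Collections.emptyList()).size()));
    }

    /** Placeholder loader stage: copies every vertex and its properties into the target store. */
    static Stage bulkLoad(Map<String, Map<String, Object>> target) {
        return vertices -> vertices.forEach((id, props) ->
                target.put(id, new HashMap<>(props)));
    }
}
```

The point of the shape is that the compute engine only needs to know how to
run stages in sequence; all target-specific write logic lives in the loader
stage.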

Then we could introduce optimizations, like "if you only want to persist
properties then you only need one iteration of the BLVP and we can run it
together with the last iteration of the previous VP".
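
As a rough illustration of that optimization, the sketch below counts
passes over the vertex set with and without fusing the property write into
the final iteration. The class and method names are hypothetical, and the
per-vertex update (halving a value) is only a placeholder for a real
program's compute step.

```java
import java.util.Map;

// Hypothetical sketch; the names and the halving update are placeholders.
class FusedPassSketch {

    /**
     * Runs `iterations` update passes over `values` and persists the final
     * values into `target`. Returns the number of passes over the vertex set.
     * When `fuse` is true, the write happens inside the last update pass;
     * otherwise a separate loader pass runs afterwards.
     * With fuse == true and iterations == 0, nothing is written.
     */
    static int run(Map<String, Double> values, int iterations, boolean fuse,
                   Map<String, Double> target) {
        int passes = 0;
        for (int i = 0; i < iterations; i++) {
            boolean last = (i == iterations - 1);
            passes++;
            for (Map.Entry<String, Double> entry : values.entrySet()) {
                entry.setValue(entry.getValue() * 0.5); // placeholder compute step
                if (fuse && last) {
                    target.put(entry.getKey(), entry.getValue()); // fused write
                }
            }
        }
        if (!fuse) {
            passes++; // separate loader pass
            target.putAll(values);
        }
        return passes;
    }
}
```

Both modes leave the same data in the target; the fused mode just saves one
full pass, which is where the gain would come from on large graphs.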

Thoughts?
Matthias

On Mon, May 4, 2015 at 8:38 AM Marko Rodriguez <[email protected]> wrote:

> Hi,
>
> On Friday, I was working with Dan LaRocque on getting
> Spark/GiraphGraphComputers working over Titan. Luckily, it was pretty
> trivial to do. However, a few quirks emerged around "the resultant graph"
> that I think would be nice to solve.
>
> In GraphComputer, we have two enums: Persist{NOTHING, VERTEX_PROPERTIES,
> EDGES} and ResultGraph{ORIGINAL,NEW}.
>
> I think we should modify this a bit to make cross vendor use of
> GraphComputers cleaner. Moreover, I think we can get a lot of leverage
> using the Attachable interface of I/O. See what you think:
>
>
> graph.compute().program(PageRankVertexProgram).result(configuration,
> (destinationGraph, attachableVertex) ->
> Attachable.Method.create(destinationGraph)).submit()
>
> What does this mean?
>
> We should get rid of the GraphComputer.persist() and
> GraphComputer.resultGraph() methods and replace them with a
> GraphComputer.result() method. This method takes a Configuration which is
> used to construct a Graph via GraphFactory.open(configuration). Thus, for
> OLTP databases like Titan/Neo4j/etc., this would be a connection to the
> database. For TinkerGraph, the configuration would have the graph object
> in it (via setProperty()) or a completely "new TinkerGraph()." For HDFS
> graphs, the configuration would be the directory of the place to write the
> graph data (in essence, just a standard HadoopGraph properties file). When
> we get ServerGraph implemented, this would just be a GremlinServer
> connection. If no result() is provided, then ComputerResult.getGraph()
> would just return EmptyGraph.instance(). So, now this generalizes the
> ResultGraph.ORIGINAL/NEW situation, where a GraphComputer's resultant graph
> can write to any Graph -- i.e., vendor agnostic. For instance, Titan's
> FulgoraGraphComputer could, in principle, write its compute graph result
> out to Neo4j.
>
> Next, the BiFunction provided is how to write the computed vertex to the
> result graph. It would be great if this was all via Attachable, but then
> that assumes all vendors are operating on Attachable vertices, which isn't
> the case for TinkerGraph nor Titan. This is where Stephen and I would need
> to think, but, in general, it's simply a "getOrCreate"-style method for
> taking the vertex and writing it to the destination. Like Attachable, we
> could have static methods for common use cases --
> XXX.writeVertexProperties(), XXX.writeVertexProperties(Map<String,String>
> propertyConverter), XXX.writeVertexPropertiesAndEdges(), etc. This way, it's
> up to the end user to determine how they want the results to be handled and
> again, it's vendor-agnostic (Graph -> Graph --- what those Graph instances
> are, who cares).
>
> Thoughts?
> Marko.
>
> http://markorodriguez.com
>
>
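
For reference, the "getOrCreate"-style writer described in the quoted
proposal could look roughly like the sketch below. DestGraph and
ComputedVertex are hypothetical stand-ins, not the Graph or Attachable
types, and the meaning given to propertyConverter (computed key to
destination key, with unmapped keys kept as-is) is an assumption; the
original message does not spell it out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-ins for illustration; not the TinkerPop Graph/Attachable API.
class ResultWriterSketch {

    /** Stand-in destination graph: vertex id -> properties. */
    static class DestGraph {
        final Map<String, Map<String, Object>> vertices = new HashMap<>();
    }

    /** Stand-in for a vertex produced by the compute job. */
    static class ComputedVertex {
        final String id;
        final Map<String, Object> properties;

        ComputedVertex(String id, Map<String, Object> properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    /** Gets or creates the destination vertex, then copies all computed properties onto it. */
    static BiFunction<DestGraph, ComputedVertex, Map<String, Object>> writeVertexProperties() {
        return (graph, vertex) -> {
            Map<String, Object> stored =
                    graph.vertices.computeIfAbsent(vertex.id, k -> new HashMap<>());
            stored.putAll(vertex.properties);
            return stored;
        };
    }

    /** Same, but renames keys via propertyConverter (assumed: unmapped keys keep their name). */
    static BiFunction<DestGraph, ComputedVertex, Map<String, Object>> writeVertexProperties(
            Map<String, String> propertyConverter) {
        return (graph, vertex) -> {
            Map<String, Object> stored =
                    graph.vertices.computeIfAbsent(vertex.id, k -> new HashMap<>());
            vertex.properties.forEach((key, value) ->
                    stored.put(propertyConverter.getOrDefault(key, key), value));
            return stored;
        };
    }
}
```

A compute engine would call the chosen BiFunction once per computed vertex;
existing destination vertices are updated in place and missing ones are
created.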
