On 12 Mar 2012, at 20:04, Avery Ching wrote:
> My suggestion would be the following:
> Run a MR job to join all your RDFs on the vertex key, and convert them to an
> easy format to parse with a custom VertexInputFormat of your choice. If these
> are one-way relationships, you need not create the target vertex. If they are
> undirected relationships, add a directed relationship to both vertices while
> you are processing your RDFs in the MR job.
Avery, thanks for the feedback.
I had not been thinking about using Map-Reduce in that way, but I guess that's a
very good approach.
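For concreteness, the join Avery describes could be sketched roughly as below, in
plain Python standing in for the MR job; the triples, the predicate names, and
the `UNDIRECTED` set are all made up for illustration:

```python
from collections import defaultdict

# Hypothetical RDF triples (subject, predicate, object).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "cites", "carol"),
]

# Predicates treated as undirected (two-way) relationships.
UNDIRECTED = {"knows"}

def map_phase(triples):
    # Emit (vertex key, edge) pairs; for undirected predicates also emit
    # the reverse edge, as Avery suggests.
    for s, p, o in triples:
        yield s, (p, o)
        if p in UNDIRECTED:
            yield o, (p, s)

def reduce_phase(pairs):
    # Group by vertex key into an adjacency list -- the "easy format"
    # that a custom VertexInputFormat could then parse.
    adj = defaultdict(list)
    for vertex, edge in pairs:
        adj[vertex].append(edge)
    return dict(adj)

adjacency = reduce_phase(map_phase(triples))
print(adjacency["bob"])  # [('knows', 'alice'), ('knows', 'carol')]
```

Note that "alice" never appears as a target vertex for the one-way "cites"
edge, matching the point about not needing to create the target vertex.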
However, besides the amount of pre-processing required for using Giraph/Hadoop,
the transient nature of the Giraph graph is also an issue. The scenario I have
in mind is that for each run of my algorithm, only 1% or less of the data
changes. So 99% stays the same every time, yet it needs to be loaded again for
each run. That won't be a problem if the computation of the algorithm itself
takes much longer than loading the graph data.
However, that might not always be the case.
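To make that trade-off concrete, a rough back-of-the-envelope check (with
entirely hypothetical timings) might look like:

```python
# Hypothetical timings, in minutes, purely to illustrate the trade-off.
load_time = 10.0     # time to reload the full graph for one run
compute_time = 5.0   # time the algorithm itself runs

# Fraction of each run spent reloading data that is ~99% unchanged.
reload_overhead = load_time / (load_time + compute_time)
print(f"{reload_overhead:.0%} of each run is spent reloading")  # 67%
```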
So right now I am trying to get a feel for that trade-off, and for the
different alternatives for solving the main research problem ;)
Thanks again for the reply, cheers, Benjamin.