On 12 Mar 2012, at 20:04, Avery Ching wrote:

> My suggestion would be the following:
>
> Run a MR job to join all your RDFs on the vertex key and convert them to an
> easy format to parse with a custom VertexInputFormat of your choice. If these
> are one-way relationships, you need not create the target vertex. If they are
> undirected relationships, when you are processing your RDFs in the MR job,
> add a directed relationship to both vertices.
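For context, the join step Avery describes could be sketched roughly like this, outside of an actual Hadoop job (class and method names here are hypothetical, purely for illustration): group the RDF triples by source vertex to build an adjacency list, and mirror each edge when the relationship is undirected so both endpoints carry it.

```java
import java.util.*;

// Hypothetical sketch of the join/grouping step: triples are
// (subject, predicate, object) arrays; we group objects under
// their subject, and for undirected relations we also add the
// reverse edge so both vertices see the relationship.
public class RdfJoin {
    public static Map<String, List<String>> buildAdjacency(
            List<String[]> triples, boolean undirected) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] t : triples) {
            String src = t[0], dst = t[2];  // t[1] is the predicate, ignored here
            adj.computeIfAbsent(src, k -> new ArrayList<>()).add(dst);
            if (undirected) {
                // mirror the edge: add a directed relationship to both vertices
                adj.computeIfAbsent(dst, k -> new ArrayList<>()).add(src);
            }
        }
        return adj;
    }
}
```

In a real MR job the subject would be the map output key, with the grouping happening in the reducer; the resulting adjacency lines are then what a custom VertexInputFormat would parse.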
Avery, thanks for the feedback. I was not thinking about using Map-Reduce in that way, but I guess that's a very good idea. However, besides the amount of pre-processing required for using Giraph/Hadoop, the transient nature of the Giraph graph is also an issue. The scenario I am thinking of is that for each run of my algorithm, just 1% or less of the data changes. So 99% stays the same every time, yet it needs to be loaded again for each run. That won't be a problem if the computation of the algorithm itself takes a lot longer than loading the graph data. However, that might not always be the case. So right now I am trying to get a feeling for that trade-off, and for the different alternatives to solving the main research problem ;) Thanks again for the reply, cheers, Benjamin.