On Thu, Sep 8, 2011 at 11:07 AM, Severin Andreas Corsten
> 1: Am I right in the assumption that Giraph does not split the input file by
> itself? Assume that I have a graph in one single file:
> Giraph sends the whole graph to one worker while the rest of the workers are
> just idle.
Giraph uses the number of InputSplits returned by your VertexInputFormat to spread the input across workers.
For VertexInputFormats that wrap Hadoop's TextInputFormat, what you
said is true, and a graph stored in one small file will be sent entirely
to one worker.
As a cheap workaround for this, we've set the
FileInputFormat split size arbitrarily small:
FileInputFormat.setMaxInputSplitSize(bspJob, 1048576); // number of bytes in one meg
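To see why a smaller maximum split size helps, here is a minimal, self-contained sketch of how Hadoop's FileInputFormat cuts one file into splits: split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to run up to 10% over. It does not call Hadoop itself. The 64 MB block size, the 10 MB file and the 8 workers are assumed numbers, and `busyWorkers` is a hypothetical helper that assumes each split is read by one worker and no worker stays idle while splits remain.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Slop factor as in Hadoop's FileInputFormat: the last split may be up to
    // 10% larger than splitSize rather than leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits one file of fileLength bytes would be cut into.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }

    // Hypothetical: at most one busy worker per split, never more than the worker count.
    static int busyWorkers(int numSplits, int numWorkers) {
        return Math.min(numSplits, numWorkers);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // assumed 64 MB HDFS block
        long fileLength = 10L * 1024 * 1024; // assumed 10 MB graph file
        int workers = 8;                     // assumed worker count

        int defaultSplits = splitLengths(fileLength, blockSize, 1, Long.MAX_VALUE).size();
        int smallSplits = splitLengths(fileLength, blockSize, 1, 1048576).size();
        System.out.println("default max split size: " + defaultSplits + " split(s), "
            + busyWorkers(defaultSplits, workers) + " busy worker(s)");
        System.out.println("1 MB max split size: " + smallSplits + " split(s), "
            + busyWorkers(smallSplits, workers) + " busy worker(s)");
    }
}
```

With the default maximum, the 10 MB file is smaller than one block and yields a single split (one busy worker); with a 1 MB maximum it yields 10 splits, enough to keep all 8 workers loading input.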
Additionally, Giraph has rebalancing features that can give work to
under-used workers in subsequent supersteps, but I haven't tried
them. (I'm not sure they would have helped me anyway: my graph,
even though it had a small input file, wouldn't fit into memory on one
worker.)
> 2: I read through the source code and found a part saying that vertices must
> be presented in id-order. Is that a task the user has
> to do or is there a workaround to have vertices not in id-order?
Sorting the input into ID order is up to the user. There are open
JIRAs, such as GIRAPH-11, to improve the situation here.
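For a small text input, the sorting step can be done before loading. Below is a minimal sketch assuming a hypothetical line format of "numeric vertex ID, tab, rest of the record"; adjust `vertexId` to whatever your VertexInputFormat actually reads. It compares IDs numerically, so "10" sorts after "9".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortById {
    // Hypothetical format: each line starts with a numeric vertex ID,
    // followed by a tab and the rest of the vertex record (edges, value, ...).
    static long vertexId(String line) {
        int tab = line.indexOf('\t');
        String id = tab >= 0 ? line.substring(0, tab) : line;
        return Long.parseLong(id.trim());
    }

    // Returns a copy of the lines ordered by numeric vertex ID.
    static List<String> sortByVertexId(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong(SortById::vertexId));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("10\t1 9", "2\t10", "9\t2", "1\t2 10");
        for (String line : sortByVertexId(lines)) {
            System.out.println(line);
        }
    }
}
```

This holds the whole file in memory; for inputs that don't fit, an external sort (for example a numeric Unix `sort`) is the usual alternative.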
> 3: The VertexRange class provides the assignment between vertices and
> workers. Is there a way to override the
> standard implementation and use a custom assignment system?
I don't know, but the work in GIRAPH-11 will probably give a clue.
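As background for what a custom assignment could look like, here is a sketch contrasting range assignment, which is presumably what VertexRange does given its name and the ID-order requirement (I haven't verified this), with hash assignment, where a vertex's owner depends only on its ID. This is not Giraph's API: the class, the methods and the split points are hypothetical.

```java
public class AssignmentSketch {
    // Range assignment: worker w owns every ID below upperBounds[w] and at or
    // above upperBounds[w - 1]. upperBounds must be ascending; the last worker
    // also absorbs any larger ID.
    static int rangeOwner(long id, long[] upperBounds) {
        for (int w = 0; w < upperBounds.length; w++) {
            if (id < upperBounds[w]) {
                return w;
            }
        }
        return upperBounds.length - 1;
    }

    // Hash assignment: the owner is computed from the ID alone, so the order
    // in which vertices appear in the input does not matter.
    static int hashOwner(long id, int numWorkers) {
        return Math.floorMod(Long.hashCode(id), numWorkers);
    }

    public static void main(String[] args) {
        long[] upperBounds = {4, 7, Long.MAX_VALUE}; // assumed split points for 3 workers
        for (long id = 0; id < 10; id++) {
            System.out.println("vertex " + id + ": range -> worker " + rangeOwner(id, upperBounds)
                + ", hash -> worker " + hashOwner(id, upperBounds.length));
        }
    }
}
```

The trade-off is that ranges keep neighboring IDs together but need sorted input and well-chosen split points, while hashing spreads vertices evenly without any ordering requirement.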
> Thanks in advance.
> Kind regards
> Severin Andreas Corsten
> DHBW-Student Business Informatics 2009 - University Programs
> IBM Sales & Distribution, Human Resources
> Phone: 1-408-927-2750
> Mobile (Germany): 49-160-98976935
> E-mail: severin.cors...@de.ibm.com
> Hechtsheimer Str. 2
> Mainz, 55131