Great answers and suggestions, Joe. I just wanted to add a few inline comments to what Joe wrote.

On 9/8/11 11:27 AM, Joseph Boyd wrote:
On Thu, Sep 8, 2011 at 11:07 AM, Severin Andreas Corsten
<severin.cors...@de.ibm.com>  wrote:

1: Am I right in assuming that Giraph does not split the input file by
itself? Assume that I have a graph in one single file:
Giraph sends the whole graph to one worker while the rest of the workers are
just idle.
Giraph uses the number of InputSplits returned by your
VertexInputFormat.getSplits() implementation.

For VertexInputFormats that wrap Hadoop's TextInputFormat, what you've
said will be true: graph input in one small file will all be
sent to one worker.  As a cheap workaround for this, we've set the
FileInputFormat maximum split size arbitrarily small:
           FileInputFormat.setMaxInputSplitSize(bspJob, 1048576); // number of bytes in one meg
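
To see why a smaller maximum split size spreads a small file over more workers, here is a rough sketch of the split arithmetic. It is a simplified model of what Hadoop's FileInputFormat does for a single splittable file, not a copy of its code: the class and method names are made up for illustration, and the 10 MiB file and 64 MiB block size are assumed values.

```java
// Simplified model of FileInputFormat split sizing.
// Assumption: one splittable file; block locations and compression are ignored.
final class SplitEstimate {
    // The last split is allowed to run up to 10% over the target size.
    private static final double SPLIT_SLOP = 1.1;

    // Target split size: the block size, clamped between min and max.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits produced for a file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 10L * 1024 * 1024; // hypothetical 10 MiB graph file
        long blockSize = 64L * 1024 * 1024;  // hypothetical HDFS block size
        long oneMeg = 1048576L;              // the value used in the workaround

        // No maximum set: the whole file lands in a single split.
        System.out.println(countSplits(fileLength, splitSize(blockSize, 1L, Long.MAX_VALUE)));
        // Maximum of one meg: the same file is cut into several splits.
        System.out.println(countSplits(fileLength, splitSize(blockSize, 1L, oneMeg)));
    }
}
```

Whether each split actually ends up on a different worker still depends on how many workers the job has.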

Additionally, Giraph has re-balancing features that can give work to
under-used workers in subsequent supersteps, but I haven't played with
them.  (I'm not sure they would have helped me anyway, as my graph,
even though it had a small input file, wouldn't fit into memory on one
worker).


2: I read through the source code and found a part saying that vertices must be
presented in id-order. Is that something the user has
to do, or is there a workaround for vertices that are not in id-order?
Sorting the input into id-order is for the user to do.  There are open
JIRAs, like GIRAPH-11 [1], to improve the situation here.
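
For a small input, the sort can be done before the job is submitted. The sketch below is only an illustration: it assumes a hypothetical text format with one vertex per line and an integer vertex id before the first tab, and it sorts in memory. The class and method names are made up, and the ordering Giraph actually expects depends on how your vertex id type compares.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sorts vertex lines numerically by id.
// Assumption: each line is "<integer id><TAB><rest of the vertex record>".
final class SortByVertexId {
    // Extract the id that precedes the first tab (or the whole line if no tab).
    static long vertexId(String line) {
        int tab = line.indexOf('\t');
        String id = tab < 0 ? line : line.substring(0, tab);
        return Long.parseLong(id.trim());
    }

    // Return a new list sorted by numeric id; the input list is left untouched.
    static List<String> sortedById(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong(SortByVertexId::vertexId));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("10\t2,7", "2\t10", "7\t10");
        // A plain string sort would put "10" before "2"; the numeric sort does not.
        for (String line : sortedById(input)) {
            System.out.println(line);
        }
    }
}
```

For input that does not fit in memory, the same ordering would have to come from an external sort or a preprocessing job.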


3: The VertexRange class provides the assignment between vertices and workers.
Is there a way to override the
standard implementation and use a custom assignment system?
I have no idea, but the work in GIRAPH-11 will probably give a clue
what's involved.
GIRAPH-11 will change a lot about the way vertices are assigned. There will be an option for hashing, hash ranges, or user-defined ranges. There is also already a way to control the assignment of vertex ranges to workers at some level (this will likely change a bit as well after GIRAPH-11).

In GiraphJob, there is a method

    /**
     * Set the vertex range balancer class (optional)
     *
     * @param vertexRangeBalancerClass Determines how vertex
     *        ranges are balanced prior to each superstep
     */
    final public void setVertexRangeBalancerClass(
            Class<?> vertexRangeBalancerClass) {
        getConfiguration().setClass(VERTEX_RANGE_BALANCER_CLASS,
                                    vertexRangeBalancerClass,
                                    VertexRangeBalancer.class);
    }

By default, we use the StaticBalancer, which doesn't move vertices at all. There is also an AutoBalancer that tries to balance the graph based on vertices or edges. You can also write your own. Hope that helps.
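
To make the idea of balancing ranges by vertex count a bit more concrete, here is a standalone sketch. It does not use Giraph's VertexRangeBalancer API, and every name in it is hypothetical. It only shows two ways of mapping a vertex id to a worker: by hash, and by id ranges whose boundaries are chosen so that each range holds roughly the same number of vertices.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hash assignment versus count-balanced range assignment.
final class AssignmentSketch {
    // Hash assignment: the worker index is derived from the id alone.
    static int hashWorker(long vertexId, int workers) {
        return Math.floorMod(Long.hashCode(vertexId), workers);
    }

    // Range assignment: returns the largest id of each range, chosen so that
    // every range covers about the same number of vertices.
    // Assumption: sortedIds is in ascending order without duplicates.
    static List<Long> balancedRangeMaxIds(long[] sortedIds, int workers) {
        List<Long> maxIds = new ArrayList<>();
        int n = sortedIds.length;
        for (int w = 1; w <= workers; w++) {
            int start = (int) ((long) n * (w - 1) / workers);
            int end = (int) ((long) n * w / workers); // exclusive
            if (end > start) {
                maxIds.add(sortedIds[end - 1]); // skip ranges that would be empty
            }
        }
        return maxIds;
    }

    // Owner of an id: the first range whose max id is not smaller than the id.
    // Ids beyond the last boundary fall into the last range; with no ranges
    // at all the result is -1.
    static int rangeWorker(long vertexId, List<Long> maxIds) {
        for (int i = 0; i < maxIds.size(); i++) {
            if (vertexId <= maxIds.get(i)) {
                return i;
            }
        }
        return maxIds.size() - 1;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 100, 200, 300};
        List<Long> maxIds = balancedRangeMaxIds(ids, 3);
        System.out.println("range boundaries: " + maxIds);
        System.out.println("vertex 3 -> range " + rangeWorker(3, maxIds));
        System.out.println("vertex 3 -> hash worker " + hashWorker(3, 3));
    }
}
```

A static scheme would compute the boundaries once and keep them; a balancing scheme would recompute them between supersteps as vertex or edge counts change.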


...joe


[1]  https://issues.apache.org/jira/browse/GIRAPH-11





Thanks in advance.

Kind regards

Severin Andreas Corsten
DHBW-Student Business Informatics 2009 - University Programs
IBM Sales&  Distribution, Human Resources
WI09N-M
________________________________
Phone: 1-408-927-2750
Mobile (Germany): 49-160-98976935
E-mail: severin.cors...@de.ibm.com

Hechtsheimer Str. 2
Mainz, 55131
Germany
