By default, you are using the HashPartitionerFactory. This creates the
partitions ahead of time and distributes them evenly by count across the
workers. Therefore, assuming your vertex ids are uniformly distributed
across the VertexId space, the graph should be balanced evenly across the
workers by number of vertices. If you look at PartitionBalancer, you can
also try rebalancing the graph while the job is running. This is a bit
experimental, but should work. The choices for balancing are: no
balancing, balance by edges, or balance by vertices.
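
To make the idea concrete, here is a rough sketch of hash partitioning in plain Java. It is not Giraph's actual code: the class name HashPartitionSketch, the methods partitionFor and workerFor, the one-partition-per-worker setup, and the round-robin partition assignment are all assumptions made for illustration. The point it shows is that the target worker depends only on the vertex id, so a vertex created by a message should land on whichever worker owns its partition, even if that worker started out with no vertices.

```java
public class HashPartitionSketch {

    /** Map a vertex id to one of a fixed number of partitions by hashing it. */
    static int partitionFor(long vertexId, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for negative hashes.
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    /** Assumed assignment: partitions are handed to workers round-robin. */
    static int workerFor(int partitionId, int workerCount) {
        return partitionId % workerCount;
    }

    public static void main(String[] args) {
        int workers = 20;
        int partitions = workers; // assumption: one partition per worker

        // Count how many of 1000 consecutive vertex ids land on each worker.
        int[] verticesPerWorker = new int[workers];
        for (long id = 0; id < 1000; id++) {
            int partition = partitionFor(id, partitions);
            verticesPerWorker[workerFor(partition, workers)]++;
        }

        for (int w = 0; w < workers; w++) {
            System.out.println("worker " + w + ": " + verticesPerWorker[w] + " vertices");
        }
    }
}
```

With ids that are spread uniformly, the counts per worker come out roughly equal, which is the balance described above. If the ids cluster on a few hash values, the counts will be skewed.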
Hope that helps,
On 2/10/12 1:25 PM, David Garcia wrote:
Hey guys... I have a question about "dynamic" vertex instantiation via
the sendMsg(...) method. I have a job that starts processing on a
sequenceFile with only two vertices in it. Each vertex has information in
its value that tells it what vertices are adjacent to it. The primary
reason I'm doing this is to avoid loading the entire graph into the job.
There are many vertices that won't do any processing (no need to load
them). I would like to take my two vertices and "dynamically" build the
graph by sending messages. So far, my experimentation shows that this is
promising... but I have a question WRT load balancing for new vertex
instantiation. When I call sendMsg(newVertexID), where will the vertex be
instantiated? If I specify 20 mappers (but with only two vertices in my
sequence file), obviously there is going to be at least one mapper without
a vertex. Is it possible that the vertex targeted by sendMsg(newVertexID)
will be instantiated on an empty mapper? I would like this... for load
balancing purposes.