Even if you start with two vertices, the number of partitions is based on the number of workers squared multiplied by a multiplier (see HashMasterPartitioner#PARTITION_COUNT_MULTIPLIER). By default, the multiplier is 1, so if you have, say, 10 workers, you'll have 100 partitions. There is a maximum number of partitions, though (about 2995), due to the max zknode size. So everything should be fine for you.
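To make the arithmetic concrete, here is a minimal sketch of that rule in Java. It is not Giraph's actual code: the class and method names are made up for illustration, and it assumes the "about 2995" figure is the cap on the number of partitions.

```java
// Illustrative sketch only; not the HashMasterPartitioner source.
class PartitionCountSketch {
    // Assumption: the "about 2995" figure from the message is the partition cap.
    static final int MAX_PARTITIONS = 2995;

    // partitions = multiplier * workers^2, clamped to [1, MAX_PARTITIONS]
    static int partitionCount(int workers, float multiplier) {
        long desired = (long) (multiplier * workers * workers);
        return (int) Math.max(1, Math.min(desired, MAX_PARTITIONS));
    }

    public static void main(String[] args) {
        System.out.println(partitionCount(10, 1.0f));   // 10 workers, default multiplier
        System.out.println(partitionCount(20, 1.0f));   // 20 workers
        System.out.println(partitionCount(100, 1.0f));  // large enough to hit the cap
    }
}
```

With 20 workers and the default multiplier this gives 400 partitions, so a graph that starts with only two vertices still has plenty of partitions spread over every worker.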

Avery


On 2/10/12 1:52 PM, David Garcia wrote:

Ah, so, I think I would like to balance by vertices. My main question is that my graph starts with two vertices... I would like to specify more than two mappers. My job will end up creating around 100,000 vertices. I would like to make sure that these extra vertices will be evenly distributed across all mappers (including the ones that don't have the initial two vertices). Does this make sense? Does Giraph support this out of the box, or do I need to add something? Thx.

-David

On 2/10/12 3:41 PM, "Avery Ching"<ach...@apache.org> wrote:

By default, you are using the HashPartitionerFactory. This will create the partitions ahead of time and balance them equally by count to the workers. Therefore, assuming you have a uniform distribution across the VertexId space, the graph should be balanced across the workers evenly according to the number of vertices.

If you look at PartitionBalancer, you can try to rebalance the graph if you like as it is running. This is a bit experimental, but should work. The choices for balancing are (no balancing, balance by edges or balance by vertices).

Hope that helps,

Avery

On 2/10/12 1:25 PM, David Garcia wrote:

Hey guys... I have a question about "dynamic" vertex instantiation via the sendMsg(...) method. I have a job that starts processing on a sequenceFile with only two vertices in it. Each vertex has information in its value that tells it what vertices are adjacent to it. The primary reason I'm doing this is to avoid loading the entire graph into the job. There are many vertices that won't do any processing (no need to load them). I would like to take my two vertices and "dynamically" build the graph by sending messages. So far, my experimentation shows that this is promising... but I have a question WRT load balancing for new vertex instantiation. When I call sendMsg(newVertexID), where will the vertex be instantiated?

If I specify 20 mappers (but with only two vertices in my sequence file), obviously there is going to be at least one mapper without a vertex. Is it possible that sendMsg(newVertexID) will be instantiated on an empty mapper? I would like this... for load balancing purposes.

-david
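The idea behind the answers in this thread can be sketched roughly as follows: a vertex's home is decided by hashing its id into a pre-created partition, and each partition already belongs to a worker, so a vertex created by a message can land on a worker that holds none of the initial vertices. This is only an illustration under stated assumptions, not Giraph's implementation: the class and method names are hypothetical, vertex ids are assumed to be sequential longs, and partitions are assumed to be handed to workers round-robin.

```java
// Illustrative sketch only; names and the round-robin assignment are assumptions.
class HashPartitionSketch {
    // Map a vertex id to a partition by hashing (floorMod keeps the result non-negative).
    static int partitionFor(long vertexId, int partitionCount) {
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    // Assumed round-robin ownership: partition p belongs to worker p % workerCount.
    static int workerFor(int partition, int workerCount) {
        return partition % workerCount;
    }

    // Count how many of the ids 0..vertexCount-1 end up on each worker.
    static int[] verticesPerWorker(int vertexCount, int workerCount) {
        int partitionCount = workerCount * workerCount; // workers squared, multiplier 1
        int[] counts = new int[workerCount];
        for (long id = 0; id < vertexCount; id++) {
            counts[workerFor(partitionFor(id, partitionCount), workerCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 100,000 vertices over 20 workers, as in the question above.
        int[] counts = verticesPerWorker(100_000, 20);
        for (int w = 0; w < counts.length; w++) {
            System.out.println("worker " + w + ": " + counts[w] + " vertices");
        }
    }
}
```

The even split here depends on the ids being spread uniformly over the id space; heavily skewed ids would skew the per-worker counts as well.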