Awesome. Thx so much for the info. I'll let yall know how my testing goes.

On 2/10/12 4:04 PM, "Avery Ching" <ach...@apache.org> wrote:

>Even if you start with two vertices, the number of partitions is based
>on the number of workers squared multiplied by a multiplier (see
>HashMasterPartitioner#PARTITION_COUNT_MULTIPLIER). By default, the
>multiplier is 1, so if you have, say, 10 workers, you'll have 100
>partitions. There is a maximum number of partitions, though (about
>2995), due to the max zknode size. So everything should be fine for you.
>
>Avery
>
>On 2/10/12 1:52 PM, David Garcia wrote:
>> Ah, so, I think I would like to balance by vertices. My main question
>> is that my graph starts with two vertices. . .I would like to specify
>> more than two mappers. My job will end up creating around 100,000
>> vertices. I would like to make sure that these extra vertices will be
>> evenly distributed across all mappers (including the ones that don't
>> have the initial two vertices). Does this make sense? Does Giraph
>> support this out of the box, or do I need to add something? Thx.
>>
>> -David
>>
>>
>> On 2/10/12 3:41 PM, "Avery Ching" <ach...@apache.org> wrote:
>>
>>> By default, you are using the HashPartitionerFactory. This will create
>>> the partitions ahead of time and balance them equally by count to the
>>> workers. Therefore, assuming you have a uniform distribution across
>>> the VertexId space, the graph should be balanced across the workers
>>> evenly according to the number of vertices. If you look at
>>> PartitionBalancer, you can try to rebalance the graph if you like as
>>> it is running. This is a bit experimental, but should work. The
>>> choices for balancing are (no balancing, balance by edges or balance
>>> by vertices).
>>>
>>> Hope that helps,
>>>
>>> Avery
>>>
>>>
>>> On 2/10/12 1:25 PM, David Garcia wrote:
>>>> Hey guys. . .I have a question about "dynamic" vertex instantiation
>>>> via the sendMsg(. . .) method. I have a job that starts processing on
>>>> a sequenceFile with only two vertices in it. Each vertex has
>>>> information in its value that tells it what vertices are adjacent to
>>>> it. The primary reason I'm doing this is to avoid loading the entire
>>>> graph into the job. There are many vertices that won't do any
>>>> processing (no need to load them). I would like to take my two
>>>> vertices and "dynamically" build the graph by sending messages. So
>>>> far, my experimentation shows that this is promising. . .but I have a
>>>> question WRT load balancing for new vertex instantiation. When I call
>>>> sendMsg(newVertexID), where will the vertex be instantiated? If I
>>>> specify 20 mappers (but with only two vertices in my sequence file),
>>>> obviously there is going to be at least one mapper without a vertex.
>>>> Is it possible that the vertex created by sendMsg(newVertexID) will
>>>> be instantiated on an empty mapper? I would like this. . .for load
>>>> balancing purposes.
>>>>
>>>> -david
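
For reference, here is a minimal sketch of the partition arithmetic described in the thread. It is not Giraph source code. The class and method names (PartitionSketch, partitionCount, partitionFor, workerFor) are hypothetical, the cap of 2995 is simply the figure quoted above, and both the modulo-of-hashCode placement and the round-robin assignment of partitions to workers are assumptions made for illustration.

```java
// Illustrative sketch only; all names here are hypothetical, not Giraph APIs.
final class PartitionSketch {
    // Cap quoted in the thread ("about 2995"); taken as a given here.
    static final int MAX_PARTITIONS = 2995;

    /** Partition count: workers squared times the multiplier, capped. */
    static int partitionCount(int workers, float multiplier) {
        long raw = (long) (multiplier * workers * workers);
        return (int) Math.min(Math.max(raw, 1L), MAX_PARTITIONS);
    }

    /** Assumed hash placement: the vertex id's hashCode modulo the count. */
    static int partitionFor(Object vertexId, int partitionCount) {
        // floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    /** Assumed round-robin spread of partitions over workers. */
    static int workerFor(int partition, int workers) {
        return partition % workers;
    }
}
```

Under these assumptions, 10 workers with the default multiplier of 1 give 100 partitions, and a vertex id that first appears as a message target lands in whichever partition its hash selects, regardless of which workers held the two initial vertices.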