Awesome.  Thx so much for the info.  I'll let yall know how my testing
goes.

On 2/10/12 4:04 PM, "Avery Ching" <ach...@apache.org> wrote:

>Even if you start with two vertices, the number of partitions is based
>on the number of workers squared multiplied by a multiplier (see
>HashMasterPartitioner#PARTITION_COUNT_MULTIPLIER).  By default, the
>multiplier is 1, so if you have, say, 10 workers, you'll have 100
>partitions.  There is a maximum number of partitions, though, of about
>2995, due to the max zknode size.  So everything should be fine for you.
>
>Avery
>
>On 2/10/12 1:52 PM, David Garcia wrote:
>> Ah, so, I think I would like to balance by vertices.  My main question
>> is that my graph starts with two vertices. . .I would like to specify
>> more than two mappers.  My job will end up creating around 100,000
>> vertices.  I would like to make sure that these extra vertices will be
>> evenly distributed across all mappers (including the ones that don't
>> have the initial two vertices).  Does this make sense?  Does Giraph
>> support this out of the box, or do I need to add something?  Thx.
>>
>> -David
>>
>>
>> On 2/10/12 3:41 PM, "Avery Ching"<ach...@apache.org>  wrote:
>>
>>> By default, you are using the HashPartitionerFactory.  This will create
>>> the partitions ahead of time and balance them equally by count to the
>>> workers.  Therefore, assuming you have a uniform distribution across
>>> the VertexId space, the graph should be balanced across the workers
>>> evenly according to the number of vertices.  If you look at
>>> PartitionBalancer, you can try to rebalance the graph if you like as
>>> it is running.  This is a bit experimental, but should work.  The
>>> choices for balancing are (no balancing, balance by edges or balance
>>> by vertices).
>>>
>>> Hope that helps,
>>>
>>> Avery
>>>
>>>
>>> On 2/10/12 1:25 PM, David Garcia wrote:
>>>> Hey guys. . .I have a question about "dynamic" vertex instantiation
>>>> via the sendMsg(. . .) method.  I have a job that starts processing
>>>> on a sequenceFile with only two vertices in it.  Each vertex has
>>>> information in its value that tells it what vertices are adjacent to
>>>> it.  The primary reason I'm doing this is to avoid loading the entire
>>>> graph into the job.  There are many vertices that won't do any
>>>> processing (no need to load them).  I would like to take my two
>>>> vertices and "dynamically" build the graph by sending messages.  So
>>>> far, my experimentation shows that this is promising. . .but I have a
>>>> question WRT load balancing for new vertex instantiation.  When I
>>>> call sendMsg(newVertexID), where will the vertex be instantiated?  If
>>>> I specify 20 mappers (but with only two vertices in my sequence
>>>> file), obviously there is going to be at least one mapper without a
>>>> vertex.  Is it possible that the new vertex from
>>>> sendMsg(newVertexID) will be instantiated on an empty mapper?  I
>>>> would like this. . .for load balancing purposes.
>>>>
>>>> -david
>>>>
>
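
For anyone following along, the partition-count arithmetic described in the thread can be sketched roughly as below. This is not Giraph's code: the class name, method name, and the ASSUMED_MAX_PARTITIONS constant are made up for illustration, and the cap simply reuses the "about 2995" figure quoted above.

```java
public final class PartitionCountSketch {
    // Hypothetical constant: mirrors the "about 2995" cap mentioned in the thread.
    static final int ASSUMED_MAX_PARTITIONS = 2995;

    /** workers^2 * multiplier, capped at the assumed maximum. */
    static int partitionCount(int workers, float multiplier) {
        double uncapped = (double) multiplier * workers * workers;
        return (int) Math.min(uncapped, ASSUMED_MAX_PARTITIONS);
    }

    public static void main(String[] args) {
        System.out.println(partitionCount(10, 1.0f));  // 10 * 10 * 1 = 100
        System.out.println(partitionCount(20, 1.0f));  // 20 * 20 * 1 = 400
        System.out.println(partitionCount(100, 1.0f)); // 10000 exceeds the cap, so 2995
    }
}
```

With the 20 mappers from the original question and the default multiplier of 1, this gives 400 partitions, well under the cap.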

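To illustrate why vertices created later by messages should still spread across all workers, here is a small simulation of hash-based placement. It is only a sketch under assumptions: it supposes a vertex lands in partition abs(hash(id) % partitionCount) and that partitions are dealt out to workers round-robin, and it uses long IDs 0..99999 as a stand-in for a uniformly distributed VertexId space. All names are hypothetical, not Giraph's API.

```java
import java.util.Arrays;

public final class HashPlacementSketch {
    /** Assumed rule: partition chosen from the vertex ID's hash code. */
    static int partitionOf(long vertexId, int partitionCount) {
        // |x % n| is always below n, so Math.abs cannot overflow here.
        return Math.abs(Long.hashCode(vertexId) % partitionCount);
    }

    /** Assumed rule: partitions dealt to workers round-robin (equal by count). */
    static int workerOf(int partition, int workers) {
        return partition % workers;
    }

    /** Counts how many of the vertex IDs 0..vertices-1 land on each worker. */
    static int[] verticesPerWorker(int vertices, int workers, int partitionCount) {
        int[] counts = new int[workers];
        for (long id = 0; id < vertices; id++) {
            counts[workerOf(partitionOf(id, partitionCount), workers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 100,000 vertices, 20 workers, 20 * 20 = 400 partitions.
        int[] counts = verticesPerWorker(100_000, 20, 400);
        System.out.println(Arrays.toString(counts));
    }
}
```

Under these assumptions placement depends only on the vertex ID, not on which worker held the two starting vertices, so workers that begin empty receive new vertices like any other. Skewed IDs would skew the result.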