Niclas Jansson wrote:
> Anders Logg wrote:
>> On Fri, Sep 19, 2008 at 11:36:28AM +0200, Niclas Jansson wrote:
>>
>>>> I also wonder about the following in PXMLMesh::readVertices:
>>>>
>>>>   const uint L = floor( (real) num_vertices / (real) num_processes);
>>>>   const uint R = num_vertices % num_processes;
>>>>   const uint num_local = (num_vertices + num_processes -
>>>>   process_number - 1) / num_processes;
>>>>
>>>>   start_index = process_number * L + std::min(process_number, R);
>>>>   end_index = start_index + ( num_local - 1);
>>>>
>>>> I think I can guess what it does, but does it have to be this
>>>> complicated? Isn't it enough to do something like
>>>>
>>>>   const uint n = num_vertices / num_processors;
>>>>   start_index = n*process_number;
>>>>   end_index = start_index + n;
>>>>
>>>> and then a fix for the last processor:
>>>>
>>>>   if (process_number == num_processors - 1)
>>>>     end_index = num_vertices;
>>>>
>>>> ?
>>>>
>>> But shouldn't that give a bad load balance, for example when N is large,
>>> R << num_processes and (end_index - start_index) >> R?
>>>
>>> Niclas
>> I don't understand, but maybe I'm missing something.
>>
>> Say N = 1,000,000 and num_processes = 16. Then R = 0. With my scheme
>> above, then there will be 62500 vertices on each processor.
>>
>> If we change N to 1,000,001, then there will be 62500 on each
>> processor except the last which will have 62501.
>>
>> If we increase N further, we will have 62502, 62503 etc until 62515 on
>> the last processor, and after that 62501 on each processor etc.
>>
>> But maybe I'm missing something important?
>>
>> --
>> Anders
>>
> 
> Ok, it was a bad example. But the point is that the extra elements must 
> be distributed across all processors to even out the workload.
> 
> For example, if N = num_processes**2 + num_processes - 1, the last
> processor would get almost twice as many elements as the others.
> 
> And even if the last processor only has a small number of extra elements,
> with, say, 1024 processors the efficiency would drop, since 1023
> processors would be wasting cycles waiting for the last one to finish.
>
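
To make the quoted objection concrete: under the simpler scheme every
process but the last takes n = num_vertices/num_processes vertices and the
last takes whatever is left. A small standalone check (illustrative values
only, not DOLFIN code), here with 16 processes and
N = num_processes**2 + num_processes - 1 = 271:

  #include <iostream>

  int main()
  {
    const unsigned int P = 16;            // number of processes (assumed)
    const unsigned int N = P*P + P - 1;   // 271 vertices
    const unsigned int n = N / P;         // 16 vertices per process

    for (unsigned int p = 0; p < P; ++p)
    {
      const unsigned int start = n*p;
      const unsigned int end   = (p == P - 1) ? N : start + n;
      std::cout << "process " << p << ": " << end - start << " vertices\n";
    }
    return 0;
  }

This prints 16 vertices for processes 0-14 and 31 vertices for process 15,
so the last process ends up with almost twice the average share, which is
the imbalance described above.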

To me the issue is not the correctness of the approach; it's that the
code is a bit cryptic and hard for me to follow.
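
If the intent is simply that the first R processes each take one extra
vertex, the same indices could perhaps be written a little more directly,
for example (same variable names as the quoted code; floor() dropped since
integer division already truncates, and uint assumed to be unsigned int):

  // Balanced block distribution: a base chunk of L vertices per process,
  // with the R leftover vertices handed to the first R processes.
  const uint L = num_vertices / num_processes;   // base chunk size
  const uint R = num_vertices % num_processes;   // leftover vertices
  const uint num_local = L + (process_number < R ? 1 : 0);

  start_index = process_number*L + std::min(process_number, R);
  end_index   = start_index + num_local - 1;     // inclusive range

This should give the same start_index and end_index as the code quoted
from PXMLMesh::readVertices, while spreading the R leftover vertices over
the first R processes so that no process holds more than one extra vertex.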

Garth



> Niclas


_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
