Done in r24126

On Dec 1, 2010, at 5:11 AM, Damien Guinier wrote:
> Oops.
>
> Ok, you can commit it. The whole problem is with the "procs" word: in the
> source code, both the "processes" and "cores" definitions are used.
>
> On 01/12/2010 11:37, Damien Guinier wrote:
>> Ok, you can commit it. The whole problem is with the "procs" work: in the
>> source code, both the "processes" and "cores" definitions are used.
>>
>> Thank you for your help.
>> Damien
>>
>> On 01/12/2010 10:47, Ralph Castain wrote:
>>> I just checked, and it appears bycore does correctly translate to byslot.
>>> So your patch does indeed appear to be correct. If you don't mind, I'm
>>> going to apply it for you, as I'm working on a correction for how we
>>> handle oversubscribe flags, and I want to ensure the patch gets included
>>> so we compute oversubscription correctly.
>>>
>>> Thanks for catching this!
>>>
>>> On Nov 30, 2010, at 10:33 PM, Ralph Castain wrote:
>>>
>>>> Afraid I don't speak much slurm any more (thank goodness!).
>>>>
>>>> From your output, it looks like the system is mapping bynode instead of
>>>> byslot. IIRC, isn't bycore just supposed to be a pseudonym for byslot?
>>>> So perhaps the problem is that "bycore" causes us to set the "bynode"
>>>> flag by mistake. Did you check that?
>>>>
>>>> BTW: when running cpus-per-proc, a slot doesn't have X processes. I
>>>> suspect this is just a language thing, but it will create confusion. A
>>>> slot consists of X cpus - we still assign only one process to each slot.
>>>>
>>>> On Nov 30, 2010, at 10:47 AM, Damien Guinier wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Most of the time there is no difference between "proc" and "slot", but
>>>>> when you use "mpirun -cpus-per-proc X", a slot has X procs.
>>>>> In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is a confusion
>>>>> between proc and slot. This little error impacts the mapping action.
>>>>>
>>>>> On the latest OMPI version, with 32-core compute nodes:
>>>>>
>>>>> salloc -n 8 -c 8 mpirun -bind-to-core -bycore ./a.out
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute19
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute19
>>>>> [rank:4]<stdout>: host:compute18
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute18
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
>>>>> With the patch:
>>>>>
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute18
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute18
>>>>> [rank:4]<stdout>: host:compute19
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute19
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
>>>>> Can you tell me if my patch is correct?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Damien
>>>>>
>>>>> <patch_cpu_per_rank.txt>
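To make the mapping difference in the thread concrete, below is a minimal,
self-contained sketch of the two behaviors. It is not the actual rmaps code:
the node names, the slots = cores / cpus-per-proc arithmetic, and the
oversubscription fallback are assumptions inferred from the outputs quoted
above. It models the reported proc/slot confusion as charging cpus-per-proc
slots for each process instead of one slot per process.

/*
 * Minimal standalone sketch -- NOT the actual Open MPI rmaps code. It models
 * one plausible reading of the proc/slot confusion discussed in the thread:
 * with -cpus-per-proc X, a 32-core node offers 32/X slots, and a byslot
 * mapper should charge ONE slot per process. If it instead charges X
 * (conflating procs with cpus), every node looks full immediately and the
 * oversubscription fallback round-robins across nodes, producing the
 * bynode-looking interleave seen in the unpatched output. Node names, the
 * slot arithmetic, and the fallback are assumptions, not code from the tree.
 */
#include <stdio.h>

#define NUM_NODES 2
#define CORES_PER_NODE 32

static const char *node_names[NUM_NODES] = { "compute18", "compute19" };

/* Map num_procs processes byslot; 'charge' is what one process deducts from
 * a node's slot count: 1 models the patched behavior, cpus_per_proc models
 * the proc/slot confusion. */
static void map_byslot(int num_procs, int cpus_per_proc, int charge)
{
    int used[NUM_NODES] = { 0 };
    int slots = CORES_PER_NODE / cpus_per_proc;   /* 4 slots per node here */
    int rr = 0;                                   /* oversubscription cursor */

    for (int rank = 0; rank < num_procs; rank++) {
        int target = -1;

        /* byslot: take the first node that still has slot capacity */
        for (int n = 0; n < NUM_NODES; n++) {
            if (used[n] + charge <= slots) {
                target = n;
                break;
            }
        }

        /* every node looks full: oversubscribe, round-robin across nodes */
        if (target < 0) {
            target = rr;
            rr = (rr + 1) % NUM_NODES;
        }

        used[target] += charge;
        printf("[rank:%d] host:%s\n", rank, node_names[target]);
    }
}

int main(void)
{
    puts("charge = 1 slot per proc (patched: byslot fills each node):");
    map_byslot(8, 8, 1);

    puts("\ncharge = cpus-per-proc slots (confusion: bynode-like interleave):");
    map_byslot(8, 8, 8);
    return 0;
}

Under these assumptions, the first call reproduces the patched placement
(ranks 0-3 on compute18, ranks 4-7 on compute19) and the second reproduces
the alternating unpatched placement, consistent with Ralph's observation
that the broken byslot path behaves like bynode.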