Done in r24126

On Dec 1, 2010, at 5:11 AM, Damien Guinier wrote:
> Oops.
>
> Ok, you can commit it. The whole problem is with the "procs" word: in the
> source code, both the "processes" and "cores" definitions are used.
>
> On 01/12/2010 11:37, Damien Guinier wrote:
>> Ok, you can commit it. The whole problem is with the "procs" work: in the
>> source code, both the "processes" and "cores" definitions are used.
>>
>> Thank you for your help.
>> Damien
>>
>> On 01/12/2010 10:47, Ralph Castain wrote:
>>> I just checked, and it appears bycore does correctly translate to byslot.
>>> So your patch does indeed appear to be correct. If you don't mind, I'm
>>> going to apply it for you, as I'm working on a correction for how we
>>> handle oversubscribe flags, and I want to ensure the patch gets included
>>> so we compute oversubscription correctly.
>>>
>>> Thanks for catching this!
>>>
>>> On Nov 30, 2010, at 10:33 PM, Ralph Castain wrote:
>>>
>>>> Afraid I don't speak much slurm any more (thank goodness!).
>>>>
>>>> From your output, it looks like the system is mapping bynode instead of
>>>> byslot. IIRC, isn't bycore just supposed to be a pseudonym for byslot?
>>>> So perhaps the problem is that "bycore" causes us to set the "bynode"
>>>> flag by mistake. Did you check that?
>>>>
>>>> BTW: when running cpus-per-proc, a slot doesn't have X processes. I
>>>> suspect this is just a language thing, but it will create confusion. A
>>>> slot consists of X cpus - we still assign only one process to each slot.
>>>>
>>>> On Nov 30, 2010, at 10:47 AM, Damien Guinier wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Most of the time there is no difference between "proc" and "slot", but
>>>>> when you use "mpirun -cpus-per-proc X", a slot has X procs.
>>>>> In orte/mca/rmaps/base/rmaps_base_common_mappers.c there is a confusion
>>>>> between proc and slot. This little error impacts the mapping action.
>>>>>
>>>>> On the latest OMPI version, with 32-core compute nodes:
>>>>>
>>>>> salloc -n 8 -c 8 mpirun -bind-to-core -bycore ./a.out
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute19
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute19
>>>>> [rank:4]<stdout>: host:compute18
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute18
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
>>>>> With the patch:
>>>>>
>>>>> [rank:0]<stdout>: host:compute18
>>>>> [rank:1]<stdout>: host:compute18
>>>>> [rank:2]<stdout>: host:compute18
>>>>> [rank:3]<stdout>: host:compute18
>>>>> [rank:4]<stdout>: host:compute19
>>>>> [rank:5]<stdout>: host:compute19
>>>>> [rank:6]<stdout>: host:compute19
>>>>> [rank:7]<stdout>: host:compute19
>>>>>
>>>>> Can you tell me if my patch is correct?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Damien
>>>>>
>>>>> <patch_cpu_per_rank.txt>
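To make the mapping difference in the thread concrete, below is a minimal,
self-contained sketch of the two behaviors. It is not the actual rmaps code:
the node names, the slots = cores / cpus-per-proc arithmetic, and the
oversubscription fallback are assumptions inferred from the outputs quoted
above. It models the reported proc/slot confusion as charging cpus-per-proc
slots for each process instead of one slot per process.

/*
 * Minimal standalone sketch -- NOT the actual Open MPI rmaps code. It models
 * one plausible reading of the proc/slot confusion discussed in the thread:
 * with -cpus-per-proc X, a 32-core node offers 32/X slots, and a byslot
 * mapper should charge ONE slot per process. If it instead charges X
 * (conflating procs with cpus), every node looks full immediately and the
 * oversubscription fallback round-robins across nodes, producing the
 * bynode-looking interleave seen in the unpatched output. Node names, the
 * slot arithmetic, and the fallback are assumptions, not code from the tree.
 */
#include <stdio.h>

#define NUM_NODES 2
#define CORES_PER_NODE 32

static const char *node_names[NUM_NODES] = { "compute18", "compute19" };

/* Map num_procs processes byslot; 'charge' is what one process deducts from
 * a node's slot count: 1 models the patched behavior, cpus_per_proc models
 * the proc/slot confusion. */
static void map_byslot(int num_procs, int cpus_per_proc, int charge)
{
    int used[NUM_NODES] = { 0 };
    int slots = CORES_PER_NODE / cpus_per_proc;   /* 4 slots per node here */
    int rr = 0;                                   /* oversubscription cursor */

    for (int rank = 0; rank < num_procs; rank++) {
        int target = -1;

        /* byslot: take the first node that still has slot capacity */
        for (int n = 0; n < NUM_NODES; n++) {
            if (used[n] + charge <= slots) {
                target = n;
                break;
            }
        }

        /* every node looks full: oversubscribe, round-robin across nodes */
        if (target < 0) {
            target = rr;
            rr = (rr + 1) % NUM_NODES;
        }

        used[target] += charge;
        printf("[rank:%d] host:%s\n", rank, node_names[target]);
    }
}

int main(void)
{
    puts("charge = 1 slot per proc (patched: byslot fills each node):");
    map_byslot(8, 8, 1);

    puts("\ncharge = cpus-per-proc slots (confusion: bynode-like interleave):");
    map_byslot(8, 8, 8);
    return 0;
}

Under these assumptions, the first call reproduces the patched placement
(ranks 0-3 on compute18, ranks 4-7 on compute19) and the second reproduces
the alternating unpatched placement, consistent with Ralph's observation
that the broken byslot path behaves like bynode.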