I believe the problem is actually a little different from what you described. The 
issue occurs whenever the number of processes multiplied by PE exceeds the number 
of cores on a node. It is caused by the fact that we aren't considering the PE 
value when mapping processes - we only appear to be looking at it when binding. 
I'll try to poke at it a bit.
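
To make the mapping-vs-binding distinction concrete, here is a minimal standalone 
sketch of the arithmetic I have in mind, assuming the 2-node, 2-NUMA-per-node, 
18-cores-per-NUMA layout David describes below. This is not the actual rmaps code 
and every name in it is made up; it just shows that if the mapper divides each 
NUMA domain's cores by PE before assigning ranks, 16 ranks at PE=4 land as 4 per 
NUMA across both nodes instead of tripping the overload check later at bind time:

/* Illustrative sketch only - not the actual Open MPI rmaps code.
 * Idea: when mapping by NUMA with PE=n, treat each NUMA domain as
 * holding floor(cores_per_numa / PE) processes, so the PE value is
 * honored at mapping time instead of first being noticed (and
 * rejected) at binding time. */
#include <stdio.h>

int main(void)
{
    const int num_procs      = 16;  /* mpirun -n 16               */
    const int pe             = 4;   /* --map-by numa:PE=4         */
    const int num_nodes      = 2;
    const int numas_per_node = 2;   /* one NUMA domain per socket */
    const int cores_per_numa = 18;

    const int procs_per_numa = cores_per_numa / pe;  /* 4 */

    int mapped = 0;
    for (int node = 0; node < num_nodes && mapped < num_procs; node++) {
        for (int numa = 0; numa < numas_per_node && mapped < num_procs; numa++) {
            int here = num_procs - mapped;
            if (here > procs_per_numa)
                here = procs_per_numa;
            printf("node %d, numa %d: ranks %d-%d (%d cores each)\n",
                   node, numa, mapped, mapped + here - 1, pe);
            mapped += here;
        }
    }

    if (mapped < num_procs)
        printf("%d ranks left over - would need overload-allowed\n",
               num_procs - mapped);
    return 0;
}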


> On Sep 11, 2018, at 9:17 AM, Shrader, David Lee <dshra...@lanl.gov> wrote:
> 
> Here's the xml output from lstopo. Thank you for taking a look!
> David
> 
> From: devel <devel-boun...@lists.open-mpi.org> on behalf of Ralph H Castain 
> <r...@open-mpi.org>
> Sent: Monday, September 10, 2018 5:12 PM
> To: OpenMPI Devel
> Subject: Re: [OMPI devel] mpirun error when not using span
>  
> Could you please send the output from “lstopo --of xml foo.xml” (the file 
> foo.xml) so I can try to replicate here?
> 
> 
>> On Sep 4, 2018, at 12:35 PM, Shrader, David Lee <dshra...@lanl.gov> wrote:
>> 
>> Hello,
>> 
>> I have run this issue by Howard, and he asked me to forward it on to the 
>> Open MPI devel mailing list. I get an error when trying to use PE=n with 
>> '--map-by numa' and not using span when using more than one node:
>> 
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4 --bind-to 
>> core --report-bindings true
>> --------------------------------------------------------------------------
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>> 
>>    Bind to:     CORE
>>    Node:        ba001
>>    #processes:  2
>>    #cpus:       1
>> 
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --------------------------------------------------------------------------
>> 
>> The absolute values of the numbers passed to -n and PE don't really matter; 
>> the error pops up as soon as those numbers are combined in such a way that 
>> an MPI rank ends up on the second node.
>> 
>> If I add the "span" modifier, everything works as expected:
>> 
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4,span 
>> --bind-to core --report-bindings true
>> [ba002.localdomain:58502] MCW rank 8 bound to socket 0[core 0[hwt 0]], 
>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
>> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 9 bound to socket 0[core 4[hwt 0]], 
>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
>> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 10 bound to socket 0[core 8[hwt 0]], 
>> socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
>> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 11 bound to socket 0[core 12[hwt 0]], 
>> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 
>> 0]]: 
>> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 12 bound to socket 1[core 18[hwt 0]], 
>> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 
>> 0]]: 
>> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 13 bound to socket 1[core 22[hwt 0]], 
>> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
>> [ba002.localdomain:58502] MCW rank 14 bound to socket 1[core 26[hwt 0]], 
>> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
>> [ba002.localdomain:58502] MCW rank 15 bound to socket 1[core 30[hwt 0]], 
>> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
>> [ba001.localdomain:11700] MCW rank 0 bound to socket 0[core 0[hwt 0]], 
>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: 
>> [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 1 bound to socket 0[core 4[hwt 0]], 
>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: 
>> [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 2 bound to socket 0[core 8[hwt 0]], 
>> socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: 
>> [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 3 bound to socket 0[core 12[hwt 0]], 
>> socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 
>> 0]]: 
>> [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 4 bound to socket 1[core 18[hwt 0]], 
>> socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 
>> 0]]: 
>> [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 5 bound to socket 1[core 22[hwt 0]], 
>> socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
>> [ba001.localdomain:11700] MCW rank 6 bound to socket 1[core 26[hwt 0]], 
>> socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
>> [ba001.localdomain:11700] MCW rank 7 bound to socket 1[core 30[hwt 0]], 
>> socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 
>> 0]]: 
>> [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
>> 
>> I would have expected the first command to work in the sense that processes 
>> are at least mapped and bound somewhere across the two nodes; is there a 
>> particular reason why that doesn't happen?
>> 
>> I am using Open MPI 3.1.2 in the above examples, configured with only 
>> "--prefix". I am running on two nodes that each have two sockets with 18 
>> processors per socket (36 processors per node, no hyper-threading). hwloc 
>> reports that the NUMA domain is equivalent to a socket on these hosts (thus, 
>> replacing "numa" with "socket" in the above examples exhibits the same 
>> behavior for me). The interconnect is Omni-Path.
>> 
>> Thank you very much for your time,
>> David
> <lstopo_output.xml.tar.bz2>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
