I believe the problem is actually slightly different from what you described. The issue occurs whenever the number of procs multiplied by PE exceeds the number of cores on a node. It is caused by the fact that we aren't considering the PE value when mapping processes; we only appear to look at it when binding. I'll try to poke at it a bit.
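To illustrate the diagnosis above with the numbers from David's report: the mapper has to divide each node's core count by PE to know how many ranks fit on a node before spilling to the next one. Here is a minimal Python sketch of that arithmetic (hypothetical helper names; a simplified fill-by-node model, not ORTE's actual mapper code):

```python
# Sketch of PE-aware mapping arithmetic. Numbers mirror David's cluster:
# two nodes ("ba001", "ba002"), 36 cores each, 16 ranks, PE=4.

def procs_per_node(cores_per_node, pe):
    """Max ranks that fit on one node when each rank binds PE cores."""
    return cores_per_node // pe

def map_procs(nprocs, nodes, cores_per_node, pe):
    """Assign ranks to nodes, filling each node up to its PE-aware capacity.
    Simplified: fills nodes in order rather than balancing like 'span'."""
    capacity = procs_per_node(cores_per_node, pe)
    placement = {node: 0 for node in nodes}
    for rank in range(nprocs):
        node = nodes[min(rank // capacity, len(nodes) - 1)]
        placement[node] += 1
    return placement

# 16 ranks x 4 cores = 64 cores needed, but only 36 per node, so the
# mapper must spill: 9 ranks fit on ba001, the remaining 7 go to ba002.
print(map_procs(16, ["ba001", "ba002"], 36, 4))
# -> {'ba001': 9, 'ba002': 7}
```

If the mapper ignores PE and places ranks by raw slot count, it happily puts all 16 ranks on the first node; the binder then discovers that 16 × 4 cores do not fit in 36 and raises the overload error seen below.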
> On Sep 11, 2018, at 9:17 AM, Shrader, David Lee <dshra...@lanl.gov> wrote:
>
> Here's the xml output from lstopo. Thank you for taking a look!
> David
>
> From: devel <devel-boun...@lists.open-mpi.org> on behalf of Ralph H Castain <r...@open-mpi.org>
> Sent: Monday, September 10, 2018 5:12 PM
> To: OpenMPI Devel
> Subject: Re: [OMPI devel] mpirun error when not using span
>
> Could you please send the output from "lstopo --of xml foo.xml" (the file foo.xml) so I can try to replicate here?
>
>> On Sep 4, 2018, at 12:35 PM, Shrader, David Lee <dshra...@lanl.gov> wrote:
>>
>> Hello,
>>
>> I have run this issue by Howard, and he asked me to forward it on to the Open MPI devel mailing list. I get an error when trying to use PE=n with '--map-by numa' and not using span when using more than one node:
>>
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4 --bind-to core --report-bindings true
>> --------------------------------------------------------------------------
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>>
>>    Bind to:     CORE
>>    Node:        ba001
>>    #processes:  2
>>    #cpus:       1
>>
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --------------------------------------------------------------------------
>>
>> The absolute values of the numbers passed to -n and PE don't really matter; the error pops up as soon as those numbers are combined in such a way that an MPI rank ends up on the second node.
>>
>> If I add the "span" parameter, everything works as expected:
>>
>> [dshrader@ba001 openmpi-3.1.2]$ mpirun -n 16 --map-by numa:PE=4,span --bind-to core --report-bindings true
>> [ba002.localdomain:58502] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 9 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 10 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 11 bound to socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 12 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
>> [ba002.localdomain:58502] MCW rank 13 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
>> [ba002.localdomain:58502] MCW rank 14 bound to socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
>> [ba002.localdomain:58502] MCW rank 15 bound to socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
>> [ba001.localdomain:11700] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././././././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B/./././././././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 2 bound to socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]: [././././././././B/B/B/B/./././././.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 3 bound to socket 0[core 12[hwt 0]], socket 0[core 13[hwt 0]], socket 0[core 14[hwt 0]], socket 0[core 15[hwt 0]]: [././././././././././././B/B/B/B/./.][./././././././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 4 bound to socket 1[core 18[hwt 0]], socket 1[core 19[hwt 0]], socket 1[core 20[hwt 0]], socket 1[core 21[hwt 0]]: [./././././././././././././././././.][B/B/B/B/./././././././././././././.]
>> [ba001.localdomain:11700] MCW rank 5 bound to socket 1[core 22[hwt 0]], socket 1[core 23[hwt 0]], socket 1[core 24[hwt 0]], socket 1[core 25[hwt 0]]: [./././././././././././././././././.][././././B/B/B/B/./././././././././.]
>> [ba001.localdomain:11700] MCW rank 6 bound to socket 1[core 26[hwt 0]], socket 1[core 27[hwt 0]], socket 1[core 28[hwt 0]], socket 1[core 29[hwt 0]]: [./././././././././././././././././.][././././././././B/B/B/B/./././././.]
>> [ba001.localdomain:11700] MCW rank 7 bound to socket 1[core 30[hwt 0]], socket 1[core 31[hwt 0]], socket 1[core 32[hwt 0]], socket 1[core 33[hwt 0]]: [./././././././././././././././././.][././././././././././././B/B/B/B/./.]
>>
>> I would have expected the first command to work in the sense that processes are at least mapped and bound somewhere across the two nodes; is there a particular reason why that doesn't happen?
>>
>> I am using Open MPI 3.1.2 in the above examples, configured with only "--prefix". I am running on two nodes that each have two sockets with 18 processors per socket (36 processors per node, no hyper-threading). Hwloc reports that the NUMA domain is equivalent to a socket on these hosts (thus, replacing "numa" with "socket" in the above examples exhibits the same behavior for me). The interconnect is Omni-Path.
>>
>> Thank you very much for your time,
>> David
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> <lstopo_output.xml.tar.bz2>