Hi Ralph,

this is a follow-up to Siegmar's post in the thread that started at https://www.mail-archive.com/users@lists.open-mpi.org/msg31177.html


mpiexec -np 3 --host loki:2,exin hello_1_mpi
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
   hello_1_mpi

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------


loki is a physical machine with 2 NUMA nodes, 2 sockets, ...

*but* exin is a virtual machine with *no* NUMA nodes, 2 sockets, ...


my guess is that mpirun finds some NUMA objects on 'loki', so it uses the default mapping policy (aka --map-by numa). unfortunately, exin has no NUMA objects, so mpirun fails with an error message that is hard to interpret.


as a workaround, it is possible to run

mpirun --map-by socket


so if i understand and remember correctly, mpirun should make the decision to map by numa *after* it receives the topology from exin and not before.
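
to make the ordering issue concrete, here is a minimal Python sketch. it is not Open MPI code: the function name `default_mapping_policy` and the dictionary of per-host NUMA counts are hypothetical, and the fallback to "socket" simply mirrors the workaround above. it only illustrates why deciding before exin has reported its topology gives a different answer than deciding afterwards.

```python
def default_mapping_policy(numa_counts):
    """Pick a default mapping policy from the topologies reported so far.

    numa_counts is a hypothetical dict: host name -> number of NUMA objects.
    """
    # Map by numa only if every known node actually has NUMA objects.
    if numa_counts and all(count > 0 for count in numa_counts.values()):
        return "numa"
    # Otherwise fall back to sockets, as in the --map-by socket workaround.
    return "socket"


# Decision taken too early: only loki's topology is known.
early = default_mapping_policy({"loki": 2})

# Decision taken after exin has reported that it has no NUMA objects.
late = default_mapping_policy({"loki": 2, "exin": 0})

print(early, late)
```

with only loki known, the sketch picks "numa"; once exin is included, it falls back to "socket".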

does that make sense ?

can you please take care of that ?


fwiw, i ran

lstopo --of xml > /tmp/topo.xml

on two nodes, manually removed the NUMANode and Bridge objects from the topology of the second node, and then ran

mpirun --mca hwloc_base_topo_file /tmp/topo.xml --host n0:2,n1 -np 3 hostname

in order to reproduce the issue.
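
the manual edit of the topology file could also be scripted. the following is a rough Python sketch, assuming the lstopo XML nests `<object type="...">` elements and that NUMANode objects may contain other objects. the helper name `strip_numa_and_bridges` and the toy document are made up for illustration. a real file also carries a DOCTYPE line and many more attributes, which this sketch does not try to preserve.

```python
import xml.etree.ElementTree as ET


def _strip(parent):
    """Rebuild parent's children without NUMANode and Bridge objects."""
    kept = []
    for child in list(parent):
        _strip(child)  # clean the subtree first
        kind = child.get("type") if child.tag == "object" else None
        if kind == "Bridge":
            continue  # drop the bridge and everything below it
        if kind == "NUMANode":
            # Drop the NUMA node itself but keep the objects it contained.
            kept.extend(c for c in child if c.tag == "object")
            continue
        kept.append(child)
    for child in list(parent):
        parent.remove(child)
    parent.extend(kept)


def strip_numa_and_bridges(xml_text):
    """Return the topology XML with NUMANode and Bridge objects removed."""
    root = ET.fromstring(xml_text)
    _strip(root)
    return ET.tostring(root, encoding="unicode")


# Toy topology, far smaller than real lstopo output.
sample = """<topology>
  <object type="Machine">
    <object type="NUMANode" os_index="0">
      <object type="Package" os_index="0"/>
    </object>
    <object type="Bridge">
      <object type="PCIDev"/>
    </object>
  </object>
</topology>"""

stripped = strip_numa_and_bridges(sample)
remaining = [o.get("type") for o in ET.fromstring(stripped).iter("object")]
print(remaining)
```

the Package that was under the NUMANode is moved up to the Machine, so the second node still has its sockets but no NUMA objects, which matches the exin case.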


Cheers,


Gilles
