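Good question. The reason for this behavior is that the Open MPI coll(ective)
framework does not require every component (e.g., 'basic', 'tuned',
'libnbc') to implement all of the collective operations. Instead, it requires
that the components selected for a communicator, taken together (e.g.,
basic + libnbc), provide the full set of collective operations. In your case,
'basic' on its own has no nonblocking collectives such as iallreduce, and
'libnbc' on its own has no blocking ones, so neither can be used alone.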

This is convenient for collective implementors, since they can focus on just
the operations they want to provide in their component, but it does mean that
end users need to know about this composition behavior.
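To make the composition rule concrete, here is a toy model in shell. It is not Open MPI code: the per-component operation lists are a simplified, hypothetical subset (four operations each), assuming that 'basic' supplies blocking collectives and 'libnbc' nonblocking ones. The names ops_for, missing_ops and REQUIRED are illustrative only.

```shell
# Toy model (NOT Open MPI source): which operations each component provides.
# Components not listed here (self, tuned, sync, ...) provide nothing in
# this sketch.
ops_for() {
  case "$1" in
    basic)  echo "allgather allreduce barrier bcast" ;;
    libnbc) echo "iallgather iallreduce ibarrier ibcast" ;;
    *)      echo "" ;;
  esac
}

# Operations every communicator must end up with (simplified subset).
REQUIRED="allgather allreduce barrier bcast iallgather iallreduce ibarrier ibcast"

# Print the required operations that no component in the comma-separated
# list provides; print nothing when the composition is complete.
missing_ops() {
  provided=""
  # Split the list the same way it is written for --mca coll.
  for comp in $(echo "$1" | tr ',' ' '); do
    provided="$provided $(ops_for "$comp")"
  done
  missing=""
  for op in $REQUIRED; do
    case " $provided " in
      *" $op "*) ;;                 # some component covers this operation
      *) missing="$missing $op" ;;  # nobody does: selection would fail
    esac
  done
  echo $missing   # unquoted on purpose: collapses the extra spaces
}
```

In this model `missing_ops basic,libnbc` prints nothing, while `missing_ops basic` reports only nonblocking operations (iallgather first), which lines up with the error quoted below.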

The command below lists all of the collective components available in your
Open MPI build:
 ompi_info | grep " coll"
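If it helps to turn that listing directly into a value for `--mca coll`, a small filter like the one below could be used. It assumes `ompi_info` prints one line per component in the form `MCA coll: <name> (...)`; the function name coll_components is hypothetical.

```shell
# Read `ompi_info` output on stdin and print the coll component names as a
# single comma-separated line, e.g. for use with `--mca coll ...`.
# Assumption: component lines look like "MCA coll: <name> (MCA vX, ...)".
coll_components() {
  awk '$1 == "MCA" && $2 == "coll:" { print $3 }' | paste -s -d , -
}
```

Usage: `ompi_info | coll_components` prints the names in the order `ompi_info` lists them; trim the list to the components you want to test.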

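'self' and 'libnbc' probably need to be included in all of your runs: 'libnbc'
provides the nonblocking collectives (such as iallreduce), and 'self' handles
single-process communicators like MPI_COMM_SELF. Include 'inter' as well if
your application uses intercommunicators. The others, like 'tuned' and
'basic', may be swapped for one another.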

To compare 'basic' vs. 'tuned', you can run:
 mpirun --mca coll basic,libnbc,self --np 4 ./iallreduce
and
 mpirun --mca coll tuned,libnbc,self --np 4 ./iallreduce
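To run both configurations back to back, a small wrapper like the one below might help. It is only a sketch: the program name ./iallreduce and the 4 ranks come from the commands quoted below, and the MPIRUN, NP and PROG overrides (and the function name compare_coll) are hypothetical conveniences, not Open MPI settings.

```shell
# Run the same test program once per coll component list so the two
# blocking implementations ('basic' vs 'tuned') can be compared.
# MPIRUN, NP and PROG are hypothetical overrides; MPIRUN=echo gives a dry run.
compare_coll() {
  for colls in basic,libnbc,self tuned,libnbc,self; do
    echo "== coll: $colls"
    "${MPIRUN:-mpirun}" --mca coll "$colls" -np "${NP:-4}" "${PROG:-./iallreduce}" \
      || echo "!! run failed with coll=$colls"
  done
}
```

Calling `compare_coll` launches both runs; `MPIRUN=echo compare_coll` only prints the two command lines, which is handy for checking them first.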

It is worth noting that some components, like 'sync', are utilities that add
functionality on top of the other collective components rather than replacing
them; 'sync', for example, inserts a barrier before and/or after every Nth
collective call.



On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel
<devel@lists.open-mpi.org> wrote:
Hello developers,

I am trying to debug the MCA choices the library makes for collective
operations, because I want to force the library to choose a particular module
and compare it with a different one. One thing I have noticed is that I can do:

mpirun --mca coll basic,libnbc  --np 4 ./iallreduce

for an "iallreduce" operation, but I get an error if I do

mpirun --mca coll libnbc  --np 4 ./iallreduce
or
mpirun --mca coll basic  --np 4 ./iallreduce

--------------------------------------------------------------------------
Although some coll components are available on your system, none of
them said that they could be used for iallgather on a new communicator.

This is extremely unusual -- either the "basic", "libnbc" or "self" components
should be able to be chosen for any communicator.  As such, this
likely means that something else is wrong (although you should double
check that the "basic", "libnbc" and "self" coll components are available on
your system -- check the output of the "ompi_info" command).
A coll module failed to finalize properly when a communicator that was
using it was destroyed.

This is somewhat unusual: the module itself may be at fault, or this
may be a symptom of another issue (e.g., a memory problem).

  mca_coll_base_comm_select(MPI_COMM_WORLD) failed
   --> Returned "Not found" (-13) instead of "Success" (0)


Can you please help?

Regards,
Luis


-- 
Josh Hursey
IBM Spectrum MPI Developer
