Hi Josh,

It makes sense, thanks. Is there a debug flag that prints out which component is chosen?

Regards,
Luis

On 07/04/2020 19:42, Josh Hursey via devel wrote:
> Good question. The reason for this behavior is that the Open MPI
> coll(ective) framework does not require that every component (e.g.,
> 'basic', 'tuned', 'libnbc') implement all of the collective
> operations. It requires instead that the composition of the available
> components (e.g., basic + libnbc) provides the full set of collective
> operations.
>
> This is nice for a collective implementor since they can focus on the
> collective operations they want in their component, but it does mean
> that the end-user needs to know about this composition behavior.
>
> The command below will show you all of the available collective
> components in your Open MPI build.
>   ompi_info | grep " coll"
>
> 'self' and 'libnbc' probably need to be included in all of your
> runs, maybe 'inter' as well. The others like 'tuned' and 'basic' may
> be able to be swapped out.
>
> To compare 'basic' vs 'tuned' you can run:
>   --mca coll basic,libnbc,self
> and
>   --mca coll tuned,libnbc,self
>
> It is worth noting that some of the components like 'sync' are
> utilities that add functionality on top of the other collectives - in
> the case of 'sync' it will add a barrier before/after N collective calls.
>
> On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel
> <devel@lists.open-mpi.org> wrote:
>
>     Hello developers,
>
>     I am trying to debug the MCA choices the library is making for
>     collective operations. The reason is that I want to force the
>     library to choose a particular module and compare it with a
>     different one.
>
>     One thing I have noticed is that I can do:
>
>         mpirun --mca coll basic,libnbc --np 4 ./iallreduce
>
>     for an "iallreduce" operation, but I get an error if I do
>
>         mpirun --mca coll libnbc --np 4 ./iallreduce
>     or
>         mpirun --mca coll basic --np 4 ./iallreduce
>
>     --------------------------------------------------------------------------
>     Although some coll components are available on your system, none of
>     them said that they could be used for iallgather on a new
>     communicator.
>
>     This is extremely unusual -- either the "basic", "libnbc" or "self"
>     components should be able to be chosen for any communicator. As such,
>     this likely means that something else is wrong (although you should
>     double check that the "basic", "libnbc" and "self" coll components
>     are available on your system -- check the output of the "ompi_info"
>     command).
>     A coll module failed to finalize properly when a communicator that was
>     using it was destroyed.
>
>     This is somewhat unusual: the module itself may be at fault, or this
>     may be a symptom of another issue (e.g., a memory problem).
>
>     mca_coll_base_comm_select(MPI_COMM_WORLD) failed
>     --> Returned "Not found" (-13) instead of "Success" (0)
>
>     Can you please help?
>
>     Regards,
>     Luis
>     The University of Edinburgh is a charitable body, registered in
>     Scotland, with registration number SC005336.
>
> --
> Josh Hursey
> IBM Spectrum MPI Developer
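
For reference, the 'basic' vs 'tuned' comparison suggested above can be written as a small dry-run script. This is only a sketch: coll_cmd is a hypothetical helper name, and the --np 4 and ./iallreduce arguments are simply copied from the original mpirun lines. It prints the two command lines rather than launching anything, so it makes no assumption about what is installed locally.

```shell
#!/bin/sh
# Dry run: print the mpirun command line for each candidate collective
# component instead of executing it.

# Components that probably need to be present in every run (per the thread).
COMMON="libnbc,self"

# coll_cmd is a hypothetical helper: it builds the command line for one
# candidate component, e.g. "basic" or "tuned".
coll_cmd() {
    printf 'mpirun --mca coll %s,%s --np 4 ./iallreduce\n' "$1" "$COMMON"
}

# Emit one command line per component being compared.
for impl in basic tuned; do
    coll_cmd "$impl"
done
```

Piping the script's output to sh would actually run the two cases, assuming mpirun is on the PATH and ./iallreduce is the test binary from the original message.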