Hi Josh,

It makes sense, thanks. Is there a debug flag that prints out which
component is chosen?

Regards,
Luis


On 07/04/2020 19:42, Josh Hursey via devel wrote:
> Good question. The reason for this behavior is that the Open MPI
> coll(ective) framework does not require that every component (e.g.,
> 'basic', 'tuned', 'libnbc') implement all of the collective
> operations. It requires instead that the composition of the available
> components (e.g., basic + libnbc) provides the full set of collective
> operations.
>
> This is nice for a collective implementor since they can focus on the
> collective operations they want in their component, but it does mean
> that the end-user needs to know about this composition behavior.
>
> The command below will show you all of the available collective
> components in your Open MPI build.
> ompi_info | grep " coll"
>
> 'self' and 'libnbc' probably need to be included in all of your
> runs, maybe 'inter' as well. The others like 'tuned' and 'basic' may
> be able to be swapped out.
>
> To compare 'basic' vs 'tuned' you can run:
>  --mca coll basic,libnbc,self
> and
>  --mca coll tuned,libnbc,self
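
For reference, the two runs above might be scripted as in the sketch below. The coll_cmd helper is a hypothetical name, not part of Open MPI, and the "--np 4 ./iallreduce" part is simply copied from the original question. The script only prints the command lines, so nothing is launched until that output is executed.

```shell
#!/bin/sh
# coll_cmd is a hypothetical helper (not part of Open MPI). It prints the
# mpirun command line for one primary coll component, always keeping
# 'libnbc' and 'self' in the list as suggested above.
coll_cmd() {
    printf 'mpirun --mca coll %s,libnbc,self --np 4 ./iallreduce\n' "$1"
}

# Print one command line per component being compared.
for comp in basic tuned; do
    coll_cmd "$comp"
done
```

Piping the output to sh would run both configurations back to back. That assumes mpirun and the ./iallreduce binary from the original question are available.
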
>
> It is worth noting that some of the components like 'sync' are
> utilities that add functionality on top of the other collectives - in
> the case of 'sync' it will add a barrier before/after N collective calls.
>
>
>
> On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel
> <devel@lists.open-mpi.org> wrote:
>
>     Hello developers,
>
>     I am trying to debug the MCA choices the library makes for
>     collective operations, because I want to force the library to
>     choose a particular module and compare it with a different one.
>     One thing I have noticed is that I can do:
>
>     mpirun --mca coll basic,libnbc  --np 4 ./iallreduce
>
>     for an "iallreduce" operation, but I get an error if I do
>
>     mpirun --mca coll libnbc  --np 4 ./iallreduce
>     or
>     mpirun --mca coll basic  --np 4 ./iallreduce
>
>     --------------------------------------------------------------------------
>     Although some coll components are available on your system, none of
>     them said that they could be used for iallgather on a new
>     communicator.
>
>     This is extremely unusual -- either the "basic", "libnbc" or "self"
>     components
>     should be able to be chosen for any communicator.  As such, this
>     likely means that something else is wrong (although you should double
>     check that the "basic", "libnbc" and "self" coll components are
>     available on
>     your system -- check the output of the "ompi_info" command).
>     A coll module failed to finalize properly when a communicator that was
>     using it was destroyed.
>
>     This is somewhat unusual: the module itself may be at fault, or this
>     may be a symptom of another issue (e.g., a memory problem).
>
>       mca_coll_base_comm_select(MPI_COMM_WORLD) failed
>        --> Returned "Not found" (-13) instead of "Success" (0)
>
>
>     Can you please help?
>
>     Regards,
>     Luis
>     The University of Edinburgh is a charitable body, registered in
>     Scotland, with registration number SC005336.
>
>
>
> -- 
> Josh Hursey
> IBM Spectrum MPI Developer
