All collective decisions are made during the first collective call on each communicator. So you can change the MCA parameters (or the corresponding MPI_T variables) before the first collective on a communicator to influence how the selection is made. I have posted a few examples on the mailing list over the years.
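The "decide once, at the first collective" behaviour can be illustrated with a small Python model. This is a hypothetical sketch, not Open MPI code. `params`, `Communicator` and `collective` are made-up stand-ins for MCA parameters and the coll framework, and the algorithm names are placeholders. The model shows why a parameter change only takes effect on communicators that have not yet run a collective.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for tunable MCA parameters (names/values are made up).
params: Dict[str, str] = {"bcast": "default", "allreduce": "default"}


@dataclass
class Communicator:
    name: str
    # None until the first collective runs on this communicator.
    decisions: Optional[Dict[str, str]] = None


def collective(comm: Communicator, op: str) -> str:
    """Return the algorithm that `op` would use on `comm` in this model."""
    if comm.decisions is None:
        # First collective on this communicator: snapshot every decision now.
        comm.decisions = dict(params)
    # Every later call reuses the snapshot, whatever `params` says now.
    return comm.decisions[op]


world = Communicator("MPI_COMM_WORLD")
params["bcast"] = "binomial"           # changed before the first collective
r1 = collective(world, "allreduce")    # all decisions for `world` are taken here
params["bcast"] = "pipeline"           # changed afterwards: too late for `world`
r2 = collective(world, "bcast")        # still uses the snapshot ("binomial")

other = Communicator("new_comm")       # a fresh communicator decides afresh
r3 = collective(other, "bcast")        # picks up the newer value ("pipeline")
print(r1, r2, r3)
```

In practice this means the parameter (or MPI_T variable) should be set before the first collective on the communicator whose behaviour you want to change. Alternatively, you can create a new communicator after changing it.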
George.

On Tue, Apr 7, 2020 at 3:44 PM Josh Hursey via devel <devel@lists.open-mpi.org> wrote:

> If you run with "--mca coll_base_verbose 10", it will display a priority list of the components chosen for each communicator created. You will see something like:
>
> coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
> coll:base:comm_select: selecting basic, priority 10, Enabled
> coll:base:comm_select: selecting libnbc, priority 10, Enabled
> coll:base:comm_select: selecting tuned, priority 30, Enabled
>
> Here the 'tuned' component has the highest priority, so OMPI will pick its version of a collective operation (e.g., MPI_Bcast), if present, over the version from a lower-priority component.
>
> I'm not sure whether the individual components offer anything finer-grained that shows which specific collective function is being used.
>
> -- Josh
>
>
> On Tue, Apr 7, 2020 at 1:59 PM Luis Cebamanos <l.cebama...@epcc.ed.ac.uk> wrote:
>
>> Hi Josh,
>>
>> That makes sense, thanks. Is there a debug flag that prints out which component is chosen?
>>
>> Regards,
>> Luis
>>
>>
>> On 07/04/2020 19:42, Josh Hursey via devel wrote:
>>
>> Good question. The reason for this behavior is that the Open MPI coll(ective) framework does not require every component (e.g., 'basic', 'tuned', 'libnbc') to implement all of the collective operations. Instead, it requires that the composition of the available components (e.g., basic + libnbc) provide the full set of collective operations.
>>
>> This is nice for a collective implementor, since they can focus on the collective operations they want in their component, but it does mean that the end user needs to know about this composition behavior.
>>
>> The command below will show you all of the available collective components in your Open MPI build:
>>
>> ompi_info | grep " coll"
>>
>> 'self' and 'libnbc' probably need to be included in all of your runs, and maybe 'inter' as well. The others, like 'tuned' and 'basic', can probably be swapped out.
>>
>> To compare 'basic' with 'tuned' you can run:
>>
>> --mca coll basic,libnbc,self
>>
>> and
>>
>> --mca coll tuned,libnbc,self
>>
>> It is worth noting that some components, like 'sync', are utilities that add functionality on top of the other collectives; 'sync', for example, adds a barrier before/after every N collective calls.
>>
>>
>> On Tue, Apr 7, 2020 at 10:54 AM Luis Cebamanos via devel <devel@lists.open-mpi.org> wrote:
>>
>>> Hello developers,
>>>
>>> I am trying to debug the MCA choices the library makes for collective operations, because I want to force the library to choose a particular module and compare it with a different one. One thing I have noticed is that I can run:
>>>
>>> mpirun --mca coll basic,libnbc --np 4 ./iallreduce
>>>
>>> for an "iallreduce" operation, but I get an error if I run
>>>
>>> mpirun --mca coll libnbc --np 4 ./iallreduce
>>> or
>>> mpirun --mca coll basic --np 4 ./iallreduce
>>>
>>>
>>> --------------------------------------------------------------------------
>>> Although some coll components are available on your system, none of
>>> them said that they could be used for iallgather on a new communicator.
>>>
>>> This is extremely unusual -- either the "basic", "libnbc" or "self" components
>>> should be able to be chosen for any communicator. As such, this
>>> likely means that something else is wrong (although you should double
>>> check that the "basic", "libnbc" and "self" coll components are available on
>>> your system -- check the output of the "ompi_info" command).
>>> A coll module failed to finalize properly when a communicator that was
>>> using it was destroyed.
>>>
>>> This is somewhat unusual: the module itself may be at fault, or this
>>> may be a symptom of another issue (e.g., a memory problem).
>>>
>>> mca_coll_base_comm_select(MPI_COMM_WORLD) failed
>>> --> Returned "Not found" (-13) instead of "Success" (0)
>>>
>>>
>>> Can you please help?
>>>
>>> Regards,
>>> Luis
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>
>>
>> --
>> Josh Hursey
>> IBM Spectrum MPI Developer
>
> --
> Josh Hursey
> IBM Spectrum MPI Developer
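Josh's explanation of the coll framework has three parts. Each component implements only some collectives. For each operation, the highest-priority provider wins. The selected set as a whole must cover every operation. This can be modelled in a few lines of Python. It is a hypothetical sketch, not the real `mca_coll_base_comm_select` logic. Only the priorities (basic 10, libnbc 10, tuned 30) come from the `coll_base_verbose` output quoted in the thread. The operation sets assigned to each component, and the names `COMPONENTS`, `REQUIRED` and `compose`, are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical component table: name -> (priority, collectives it implements).
# Priorities follow the coll_base_verbose output above; the operation sets
# are illustrative only, not the real contents of each Open MPI component.
COMPONENTS: Dict[str, Tuple[int, Set[str]]] = {
    "basic": (10, {"bcast", "allreduce", "barrier"}),
    "libnbc": (10, {"ibcast", "iallreduce", "ibarrier"}),
    "tuned": (30, {"bcast", "allreduce", "barrier"}),
}

# Every operation a communicator must support in this toy model.
REQUIRED: Set[str] = {"bcast", "allreduce", "barrier",
                      "ibcast", "iallreduce", "ibarrier"}


def compose(selected: List[str]) -> Dict[str, str]:
    """Map each required collective to its highest-priority provider."""
    best: Dict[str, Tuple[int, str]] = {}
    for name in selected:
        priority, ops = COMPONENTS[name]
        for op in ops:
            # A higher-priority component overrides an earlier provider.
            if op not in best or priority > best[op][0]:
                best[op] = (priority, name)
    missing = REQUIRED - set(best)
    if missing:
        # Loosely analogous to comm_select failing with "Not found".
        raise LookupError(f"no selected component provides: {sorted(missing)}")
    return {op: name for op, (_, name) in best.items()}


print(compose(["basic", "libnbc"])["iallreduce"])           # libnbc
print(compose(["basic", "tuned", "libnbc"])["allreduce"])   # tuned wins on priority
try:
    compose(["basic"])  # basic alone lacks the nonblocking operations
except LookupError as err:
    print(err)
```

In this model, selecting only 'basic' or only 'libnbc' fails while 'basic,libnbc' succeeds, which matches the pattern Luis observed. The real selection also involves 'self', per-communicator queries and other details that are not modelled here.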