Florent,

Long story short, yes, this is a known limitation. btl/sm cannot be used for intra-node communication between processes from different "jobs" (MPI_Comm_spawn() creates a new job), so you will fall back to the interconnect (if it supports it) or btl/tcp (assuming pml/ob1 has been selected).
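The jobid/locality check that enforces this can be sketched in isolation as plain C. The types and the PROC_ON_LOCAL_NODE flag below are simplified stand-ins, not the real Open MPI definitions (those live in opal/util/proc.h); only the shape of the filter matches the add_procs code quoted below:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for Open MPI's process naming: a (jobid, vpid)
 * pair plus locality flags. MPI_Comm_spawn() produces a new jobid. */
#define PROC_ON_LOCAL_NODE 0x1u

typedef struct {
    uint32_t jobid;  /* spawn generation: differs across MPI_Comm_spawn() */
    uint32_t vpid;
    uint32_t flags;  /* locality bits */
} proc_t;

/* Mirrors the filter in btl/sm's add_procs: a peer is reachable over
 * shared memory only if it is in the same job AND on the local node. */
static bool sm_reachable(const proc_t *me, const proc_t *peer)
{
    return peer->jobid == me->jobid
        && (peer->flags & PROC_ON_LOCAL_NODE) != 0;
}
```

With this filter, a process spawned on the same node but in a different job generation is still declared unreachable over shared memory, which is exactly the limitation described above.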
IIRC, the issue with using btl/sm is the size of the shared memory used for the intra-node communication. A straightforward implementation requires that the maximum size be known when the application is started. The idea of improving this (allocating "n" slots per node, so that up to n MPI tasks at any given time can use btl/sm) was evoked, but I do not remember anyone trying a proof of concept. I will let the developers shed some more light on that topic.

Cheers,

Gilles

On Wed, Apr 9, 2025 at 11:52 PM 'Florent GERMAIN' via Open MPI devel <devel@lists.open-mpi.org> wrote:
> Hi,
>
> I have a question regarding MPI_Comm_spawn and proc flags.
>
> What I understand about procs and spawns in ompi:
> Processes are identified by the proc structure.
> The proc structure stores proc_name and proc_flags (and many other things).
> proc_flags defines locality relative to the current process.
> proc_name is a unique couple (jobid, vpid) that identifies an ompi process.
> proc_name.jobid is the generation id of the process.
> In the spawn case, origin processes and spawned processes have different jobids.
> (I saw this in ompi 4.x; I hope it is still the case in ompi 5.x.)
>
> In the btl/sm add_procs function
> (https://github.com/open-mpi/ompi/blob/main/opal/mca/btl/sm/btl_sm_module.c#L266),
> in this part:
>
>     for (int32_t proc = 0; proc < (int32_t) nprocs; ++proc) {
>         /* check to see if this proc can be reached via shmem (i.e.,
>            if they're on my local host and in my job) */
>         if (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid
>             || !OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags)) {
>             peers[proc] = NULL;
>             continue;
>         }
>
>         if (my_proc != procs[proc] && NULL != reachability) {
>             /* add this proc to shared memory accessibility list */
>             rc = opal_bitmap_set_bit(reachability, proc);
>             if (OPAL_SUCCESS != rc) {
>                 return rc;
>             }
>         }
>
>         /* setup endpoint */
>         rc = init_sm_endpoint(peers + proc, procs[proc]);
>         if (OPAL_SUCCESS != rc) {
>             break;
>         }
>     }
>
> the jobid check prevents btl/sm from being selected between processes that are not in
> the same spawn generation (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid).
> A simple spawn test results in this error (single-node test):
>
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
> Process 1 ([[58931,2],20]) is on host: pm0-nod48
> Process 2 ([[58931,1],0]) is on host: unknown!
> BTLs attempted: vader self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
> It also seems that the proc_flags are not valid:
> OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags) returns true for a
> process spawned on another node.
> The ompi tested is based on 4.1.7 (+ some of our code), configured with
> pmix-5.0.3 and hwloc=internal, and run with salloc ... mpirun ...
>
> (And the questions:)
>
> Is this intended?
> Should I try to reproduce with ompi-5 and open an issue?
>
> Thanks,
>
> Florent GERMAIN
> Development engineer - BDS-R&D
> 2 rue de la Piquetterie - Bruyères le Chatel - France
> eviden.com
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to devel+unsubscr...@lists.open-mpi.org.