Florent,

Long story short, yes, this is a known limitation.
btl/sm cannot be used for intra-node communication between processes from
different "jobs" (MPI_Comm_spawn() creates a new job),
so you will go through the interconnect (if it allows it) or btl/tcp
(assuming pml/ob1 has been selected).

IIRC, the issue with using btl/sm across jobs is the size of the shared
memory used for the intra-node communications:
a straightforward implementation requires that the maximum size be known
when the application starts.
I think the idea of improving this (allocating "n" slots per node, so that
up to n MPI tasks at any given time can use btl/sm) was raised, but I do
not remember anyone trying a proof of concept.
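
To make the idea concrete, here is a rough sketch of what such a slot pool
could look like (purely illustrative: every name and size below is made up,
nothing like this exists in the tree, and I use GCC atomic builtins for
brevity):

    #include <stdint.h>

    /* Hypothetical per-node pool of shared-memory slots. Instead of
       sizing the segment from the job size at startup, each node
       reserves a fixed number of slots up front; any local peer
       (including one from a later MPI_Comm_spawn() generation) claims
       a free slot when it attaches and releases it when it leaves. */
    #define SM_SLOTS_PER_NODE 64              /* the "n" above, made up */

    typedef struct {
        int32_t  in_use;                      /* claimed via atomic CAS */
        uint32_t jobid;                       /* owner's spawn generation */
        uint32_t vpid;                        /* owner's rank in that job */
        char     fifo[4096];                  /* owner's receive queue */
    } sm_slot_t;

    typedef struct {
        sm_slot_t slots[SM_SLOTS_PER_NODE];   /* size known at start */
    } sm_node_segment_t;

    /* Claim a free slot for a newly attached local process;
       returns the slot index, or -1 if all n slots are taken. */
    static int sm_slot_claim(sm_node_segment_t *seg,
                             uint32_t jobid, uint32_t vpid)
    {
        for (int i = 0; i < SM_SLOTS_PER_NODE; ++i) {
            int32_t expected = 0;
            if (__atomic_compare_exchange_n(&seg->slots[i].in_use,
                                            &expected, 1, 0,
                                            __ATOMIC_ACQ_REL,
                                            __ATOMIC_ACQUIRE)) {
                seg->slots[i].jobid = jobid;
                seg->slots[i].vpid  = vpid;
                return i;
            }
        }
        return -1;
    }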

I will let the developers shed some more light on that topic.

Cheers,

Gilles

On Wed, Apr 9, 2025 at 11:52 PM 'Florent GERMAIN' via Open MPI devel <
devel@lists.open-mpi.org> wrote:

> Hi,
>
> I have a question regarding MPI_Comm_spawn and proc flags.
>
> What I understand about procs and spawns in ompi:
> Processes are identified by their proc structure.
> The proc structure stores proc_name and proc_flags (among many other things).
> proc_flags encodes locality relative to the current process.
> proc_name is a unique pair (jobid, vpid) that identifies an ompi process.
>
> proc_name.jobid is the generation id of the process.
> In the spawn case, origin processes and spawned processes have different
> jobids (observed in ompi 4.x; I hope it is still the case in ompi 5.x).
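>
> For reference, the name is essentially this pair (as in opal/util/proc.h;
> reproduced from memory, so double-check against the tree):
>
>     /* An ompi process is identified by a (jobid, vpid) pair;
>        MPI_Comm_spawn() creates a new jobid, so a parent and its
>        children never compare equal on jobid. */
>     typedef struct {
>         opal_jobid_t jobid;   /* spawn generation */
>         opal_vpid_t  vpid;    /* rank within that generation */
>     } opal_process_name_t;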
>
> In the btl/sm add_procs function (
> https://github.com/open-mpi/ompi/blob/main/opal/mca/btl/sm/btl_sm_module.c#L266),
> this part
>
>     for (int32_t proc = 0; proc < (int32_t) nprocs; ++proc) {
>         /* check to see if this proc can be reached via shmem (i.e.,
>            if they're on my local host and in my job) */
>         if (procs[proc]->proc_name.jobid != my_proc->proc_name.jobid
>             || !OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags)) {
>             peers[proc] = NULL;
>             continue;
>         }
>
>         if (my_proc != procs[proc] && NULL != reachability) {
>             /* add this proc to shared memory accessibility list */
>             rc = opal_bitmap_set_bit(reachability, proc);
>             if (OPAL_SUCCESS != rc) {
>                 return rc;
>             }
>         }
>
>         /* setup endpoint */
>         rc = init_sm_endpoint(peers + proc, procs[proc]);
>         if (OPAL_SUCCESS != rc) {
>             break;
>         }
>     }
>
> It prevents btl/sm from being selected between processes that are not in
> the same spawn generation (procs[proc]->proc_name.jobid !=
> my_proc->proc_name.jobid).
> A simple spawn test results in this error (single-node test).
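>
> The test is essentially this (a minimal sketch, not our exact code; the
> file name and the token exchange are illustrative):
>
>     /* spawn_test.c: the parent (mpirun -n 1 ./spawn_test) spawns one
>        copy of itself, then the pair exchanges a token over the
>        intercommunicator; on one node, with btl/sm refusing cross-job
>        peers and no other BTL available, the run aborts as below. */
>     #include <mpi.h>
>     #include <stdio.h>
>
>     int main(int argc, char *argv[])
>     {
>         MPI_Comm parent, inter;
>         int token = 42;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_get_parent(&parent);
>         if (MPI_COMM_NULL == parent) {    /* parent side */
>             MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
>                            MPI_COMM_WORLD, &inter, MPI_ERRCODES_IGNORE);
>             MPI_Send(&token, 1, MPI_INT, 0, 0, inter);
>             MPI_Comm_disconnect(&inter);
>         } else {                          /* spawned child side */
>             MPI_Recv(&token, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
>             printf("child got token %d\n", token);
>             MPI_Comm_disconnect(&parent);
>         }
>         MPI_Finalize();
>         return 0;
>     }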
>
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[58931,2],20]) is on host: pm0-nod48
>   Process 2 ([[58931,1],0]) is on host: unknown!
>   BTLs attempted: vader self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
>
> It also seems like proc_flags are not valid:
> OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags) returns true for a
> process spawned on another node.
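>
> One way to see it is a debug print in add_procs just before the check
> quoted above, along these lines (a sketch; the output id and format are
> arbitrary):
>
>     /* Log each peer's jobid match and locality bit; for a child
>        spawned on a *different* node, local still prints as 1. */
>     opal_output(0, "peer %u.%u: jobid match=%d local=%d",
>                 (unsigned) procs[proc]->proc_name.jobid,
>                 (unsigned) procs[proc]->proc_name.vpid,
>                 procs[proc]->proc_name.jobid == my_proc->proc_name.jobid,
>                 !!OPAL_PROC_ON_LOCAL_NODE(procs[proc]->proc_flags));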
>
> The ompi we tested is based on 4.1.7 (plus some of our code), configured
> with pmix-5.0.3 and hwloc=internal, and run with salloc ... mpirun ...
>
> (And the questions)
>
> Is this intended?
> Should I try to reproduce with ompi-5 and open an issue?
>
> Thanks,
>
> Florent GERMAIN
>
> Development Engineer – BDS-R&D
> 2 rue de la Piquetterie – Bruyères le Chatel – France
> eviden.com