Hi folks,

I have a question about PMIx in the 2.x tree. On 1.x I was able to start N individual jobs through mpirun with -np 1 and have them gradually join a single intercommunicator through MPI_Comm_accept, MPI_Comm_connect, MPI_Intercomm_create, and MPI_Intercomm_merge. The port that one of the processes listened on included its IP address, and the others would connect to that.

When I tried porting this code to the 2.x tree, I found that the port is now just an integer. From the changelogs and commit history, I found that PMIx replaced DPM starting with 2.x. From what I have read about PMIx and Open MPI, my understanding is that Open MPI ships with a PMIx server implementation, and that all processes in a job have to be connected to this PMIx server at startup. It looks like MPI_Comm_accept and MPI_Comm_connect communicate through k/v pairs in the PMIx server.
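To make my mental model concrete, here is a toy sketch in plain Python. It is not Open MPI or PMIx code. The class ToyKVServer and the function try_connect are made-up names, and the sketch assumes that each mpirun invocation gets its own private key/value store and that a port is just an integer key into that store. Both assumptions are my reading of the behaviour, not something I have confirmed in the source.

```python
class ToyKVServer:
    """Hypothetical stand-in for the key/value store owned by one mpirun.

    Assumption: every mpirun invocation has its own, separate store.
    """

    def __init__(self, name):
        self.name = name
        self._store = {}      # port (int) -> description of the listener
        self._next_port = 0   # ports are plain integers in this model

    def open_port(self, listener):
        """Register a listener and hand back an integer port."""
        port = self._next_port
        self._next_port += 1
        self._store[port] = listener
        return port

    def lookup(self, port):
        """Return the listener registered under this port, or None."""
        return self._store.get(port)


def try_connect(server, port):
    """A connect attempt succeeds only if the local store knows the port."""
    return server.lookup(port) is not None


# Two separate mpirun invocations, hence two separate stores (assumption).
job_a = ToyKVServer("mpirun #1")
job_b = ToyKVServer("mpirun #2")

# A process in job A opens a port; the result is just an integer.
port = job_a.open_port("rank 0 of job A")

same_job = try_connect(job_a, port)    # lookup goes to the store that has the key
cross_job = try_connect(job_b, port)   # lookup goes to a store that never saw it

print("port:", port)
print("connect from same mpirun succeeds:", same_job)
print("connect from other mpirun succeeds:", cross_job)
```

In this model the integer port carries no address information, so a process started by a different mpirun has nothing it could use to reach the listener.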
If that is right, it is no longer possible to start jobs through multiple mpirun invocations and then join them into a single intercommunicator at runtime. Is my understanding correct?

Thank you,
Pieter