I took a look at the following:

>> A remark to pmix at this point: pmix_bfrops_base_value_load() does
>> silently not handle PMIX_DATA_ARRAY type leading to not working macros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD with that type. I think this is
>> unlucky and took me a while to figure out why it comes to a segfault
>> when pmix tried to process my PMIX_PROC_DATA infos.
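As an aside, the failure mode described in the quoted remark can be
illustrated with a small, self-contained C sketch. It does not use the PMIx
headers and does not reproduce the real pmix_bfrops_base_value_load(); every
demo_* name below is hypothetical and exists only for illustration. The first
function drops an unhandled type without telling the caller, which leaves a
value with an empty payload; the second returns an error for the same input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT PMIx types. */
typedef enum { DEMO_INT, DEMO_STRING, DEMO_DATA_ARRAY } demo_type_t;

enum { DEMO_SUCCESS = 0, DEMO_ERR_NOT_SUPPORTED = -1 };

typedef struct {
    demo_type_t type;
    int loaded;              /* set to 1 once a payload has been stored */
    union {
        int integer;
        const char *string;
    } data;
} demo_value_t;

/* Silent variant: an unhandled type falls through the switch, nothing is
 * stored, and the caller gets no indication that the load did not happen. */
void demo_load_silent(demo_value_t *v, const void *src, demo_type_t type)
{
    memset(v, 0, sizeof(*v));
    v->type = type;
    switch (type) {
    case DEMO_INT:
        v->data.integer = *(const int *)src;
        v->loaded = 1;
        break;
    case DEMO_STRING:
        v->data.string = (const char *)src;
        v->loaded = 1;
        break;
    default:
        /* DEMO_DATA_ARRAY lands here: the payload is dropped silently. */
        break;
    }
}

/* Checked variant: same behaviour for handled types, but an unhandled type
 * is reported to the caller through the return code. */
int demo_load_checked(demo_value_t *v, const void *src, demo_type_t type)
{
    memset(v, 0, sizeof(*v));
    v->type = type;
    switch (type) {
    case DEMO_INT:
        v->data.integer = *(const int *)src;
        break;
    case DEMO_STRING:
        v->data.string = (const char *)src;
        break;
    default:
        return DEMO_ERR_NOT_SUPPORTED;
    }
    v->loaded = 1;
    return DEMO_SUCCESS;
}
```

A caller of demo_load_checked() can test the return code and stop right away,
whereas a caller of demo_load_silent() only finds out later, when some other
code tries to use the empty payload.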

It appears that this was true in the v2.x release series, but has since been
fixed - thus, the v3.x series is okay. I’ll backport the support to the v2.x
series for its next releases. Thanks for pointing it out!

Ralph

> On Oct 12, 2018, at 6:15 AM, Ralph H Castain <r...@open-mpi.org> wrote:
>
> Hi Stephan
>
>
>> On Oct 12, 2018, at 2:25 AM, Stephan Krempel <krem...@par-tec.com> wrote:
>>
>> Hello Ralph,
>>
>>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>>> is --with-ompi-pmix-rte?
>>
>> You were right, this was a typo, with the correct option I now managed
>> to start an MPI helloworld program using OpenMPI and our own process
>> manager with pmix server.
>
> Hooray! If you want me to show support for your PM on our web site, please
> send me a little info about it. You are welcome to send it off-list if you
> prefer.
>
>>
>>> It all looks okay to me for the client, but I wonder if you
>>> remembered to call register_nspace and register_client on your server
>>> prior to starting the client? If not, the connection will be dropped
>>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>>> see the detailed connection handshake.
>>
>> This has been a point that I could finally figure out from the prrte
>> code. To make it working you do not only need to call register_nspace
>> but also pass some specific information to it that OpenMPI considers to
>> be available (e.g. proc info with lrank).
>
> My apologies - we will document this better on the PMIx web site and provide
> some link to it on the OMPI web site. We actually do publish the info OMPI is
> expecting, but it isn’t in an obvious enough place.
>
>>
>> A remark to pmix at this point: pmix_bfrops_base_value_load() does
>> silently not handle PMIX_DATA_ARRAY type leading to not working macros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD with that type.
>> I think this is
>> unlucky and took me a while to figure out why it comes to a segfault
>> when pmix tried to process my PMIX_PROC_DATA infos.
>
> I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely
> just an oversight. Regardless, it should return an error if it isn’t doing it.
>
>>
>> So thank you again for your help so far.
>>
>>
>> One point that remains open and is interesting for me is if I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as if there were the "--with-ompi-pmix-rte"
>> switch from version 4.x?
>
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask
> the relevant release managers if they’d like us to do so.
>
> Ralph
>
>>
>> Regards,
>>
>> Stephan
>>
>>
>>>
>>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com>
>>>> wrote:
>>>>
>>>> Hi Ralph,
>>>>
>>>> After studying prrte a little bit, I tried something new and followed
>>>> the description here using openmpi 4:
>>>> https://pmix.org/code/building-the-pmix-reference-server/
>>>>
>>>> I configured openmpi 4.0.0rc3:
>>>>
>>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>> --with-libevent=/usr --with-ompi-mpix-rte
>>>>
>>>> (I also tried to set --with-orte=no, but it then claims not to have a
>>>> suitable rte and does not finish)
>>>>
>>>> I then started my own PMIx and spawned a client compiled with mpicc of
>>>> the new openmpi installation with this environment:
>>>>
>>>> PMIX_NAMESPACE=namespace_3228_0_0
>>>> PMIX_RANK=0
>>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>>>
>>>> The client is not connecting to my pmix server and its environment
>>>> after MPI_Init looks like that:
>>>>
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_RANK=0
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_NAMESPACE=864157697
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> ORTE_SCHIZO_DETECTION=ORTE
>>>> OMPI_COMMAND=./hello_env
>>>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
>>>> OMPI_MCA_orte_launch=1
>>>> OMPI_APP_CTX_NUM_PROCS=1
>>>> OMPI_MCA_pmix=^s1,s2,cray,isolated
>>>> OMPI_MCA_ess=singleton
>>>> OMPI_MCA_orte_ess_num_procs=1
>>>>
>>>> So something goes wrong but I do not have an idea what I am missing. Do
>>>> you have an idea what I need to change?
>>>> Do I have to set an MCA
>>>> parameter to tell OpenMPI not to start orted, or does it need another
>>>> hint in the client environment beside the stuff coming from the PMIx
>>>> server helper library?
>>>>
>>>>
>>>> Stephan
>>>>
>>>>
>>>> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>>>>> Hi Stephan
>>>>>
>>>>> Thanks for the clarification - that helps a great deal. You are
>>>>> correct that OMPI’s orted daemons do more than just host the PMIx
>>>>> server library. However, they are only active if you launch the OMPI
>>>>> processes using mpirun. This is probably the source of the trouble
>>>>> you are seeing.
>>>>>
>>>>> Since you have a process launcher and have integrated the PMIx server
>>>>> support into your RM’s daemons, you really have no need for mpirun at
>>>>> all. You should just be able to launch the processes directly using
>>>>> your own launcher. The PMIx support will take care of the startup
>>>>> requirements. The application procs will not use the orted in such
>>>>> cases.
>>>>>
>>>>> So if your system is working fine with the PMIx example programs,
>>>>> then just launch the OMPI apps the same way and it should just work.
>>>>>
>>>>> On the Slurm side: I’m surprised that it doesn’t work without the
>>>>> --with-slurm option. An application proc doesn’t care about any of the
>>>>> Slurm-related code if PMIx is available. I might have access to a
>>>>> machine where I can check it…
>>>>>
>>>>> Ralph
>>>>>
>>>>>
>>>>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com>
>>>>>> wrote:
>>>>>>
>>>>>> Ralph, Gilles,
>>>>>>
>>>>>> thanks for your input.
>>>>>>
>>>>>> Before I answer, let me shortly explain what my general intention is.
>>>>>> We do have our own resource manager and process launcher that supports
>>>>>> different MPI implementations in different ways.
>>>>>> I want to adapt it to
>>>>>> PMIx to cleanly support OpenMPI and hopefully other MPI implementations
>>>>>> supporting PMIx in the future, too.
>>>>>>
>>>>>>> It sounds like what you really want to do is replace the orted, and
>>>>>>> have your orted open your PMIx server? In other words, you want to
>>>>>>> use the PMIx reference library to handle all the PMIx stuff, and
>>>>>>> provide your own backend functions to support the PMIx server calls?
>>>>>>
>>>>>> You are right, that was my original plan, and I already did it so far.
>>>>>> In my environment I already can launch processes that successfully call
>>>>>> PMIx client functions like put, get, fence and so on, all handled by my
>>>>>> servers using the PMIx server helper library. As far as I implemented
>>>>>> the server functions now, all the example programs coming with the pmix
>>>>>> library are working fine.
>>>>>>
>>>>>> Then I tried to use that with OpenMPI and stumbled.
>>>>>> My first idea was to simply replace orted but after taking a closer
>>>>>> look into OpenMPI it seems to me, that it uses/needs orted not only for
>>>>>> spawning and exchange of process information, but also for its general
>>>>>> communication and collectives. Am I wrong with that?
>>>>>>
>>>>>> So replacing it completely is perhaps not what I want since I do not
>>>>>> intend to replace OpenMPI's whole communication stuff. But perhaps I do
>>>>>> mix up orte and orted here, not certain about that.
>>>>>>
>>>>>>> If so, then your best bet would be to edit the PRRTE code in
>>>>>>> orte/orted/pmix and replace it with your code.
>>>>>>> You’ll have to deal
>>>>>>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>>>>>>> likely easier than trying to write your own version of “orted” from
>>>>>>> scratch.
>>>>>>
>>>>>> I think one problem here is, that I do not really understand which
>>>>>> purposes orted fulfills overall especially beside implementing the PMIx
>>>>>> server side. Can you please give me a short overview?
>>>>>>
>>>>>>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>>>>>>> implements the server backend functions, and the Slurm daemons “host”
>>>>>>> the plugin. What you would need to do is replace that plugin with
>>>>>>> your own.
>>>>>>
>>>>>> I understand that, but it also seems to need some special support by
>>>>>> the several slurm modules on the OpenMPI side that I do not understand,
>>>>>> yet. At least when I tried OpenMPI without slurm support and
>>>>>> `srun --mpi=pmix_v2` it does not work but generates a message that
>>>>>> slurm support in openmpi is missing.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gilles@rist.or.jp>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Stephan,
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you already checked https://github.com/pmix/prrte ?
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the PMIx Reference RunTime Environment (PRRTE), which was
>>>>>>>> built on top of orted.
>>>>>>>>
>>>>>>>> Long story short, it deploys the PMIx server and then you start
>>>>>>>> your MPI app with prun
>>>>>>>> An example is available at
>>>>>>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>>>>>>> Hello everyone,
>>>>>>>>>
>>>>>>>>> I am currently implementing a PMIx server and I try to use it with
>>>>>>>>> OpenMPI. I do have an own mpiexec which starts my PMIx server and
>>>>>>>>> launches the processes.
>>>>>>>>>
>>>>>>>>> If I launch an executable linked against OpenMPI, during MPI_Init() the
>>>>>>>>> ORTE layer starts another PMIx server and overrides my PMIX_*
>>>>>>>>> environment so this new server is used instead of mine.
>>>>>>>>>
>>>>>>>>> So I am looking for a method to prevent orte(d) from starting a PMIx
>>>>>>>>> server.
>>>>>>>>>
>>>>>>>>> I already tried to understand what the slurm support is doing, since
>>>>>>>>> this is (at least in parts) what I think I need. Somehow when starting
>>>>>>>>> a job with srun --mpi=pmix_v2 the ess module pmi is started, but I was
>>>>>>>>> not able to enforce that manually by setting an MCA parameter (oss
>>>>>>>>> should be the correct one?!?)
>>>>>>>>> And I do not yet have a clue how the slurm support is working.
>>>>>>>>>
>>>>>>>>> So does anyone have a hint for me where I can find documentation or
>>>>>>>>> information concerning that or is there an easy way to achieve what I
>>>>>>>>> am trying to do that I missed?
>>>>>>>>>
>>>>>>>>> Thank you in advance.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel@lists.open-mpi.org
>>>>>>>>> https://lists.open-mpi.org/mailman/listinfo/devel
>> --
>> Stephan Krempel
>> HPC Software Engineer
>>
>> ParTec Cluster Competence Center GmbH
>> Possartstraße 20
>> 81679 München, Germany
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel