I took a look at the following:

>> A remark on pmix at this point: pmix_bfrops_base_value_load() silently
>> does not handle the PMIX_DATA_ARRAY type, so the macros PMIX_VALUE_LOAD
>> and PMIX_INFO_LOAD do not work with that type. I think this is
>> unfortunate, and it took me a while to figure out why a segfault occurred
>> when pmix tried to process my PMIX_PROC_DATA infos.

It appears that this was true in the v2.x release series, but it has since
been fixed, so the v3.x series is okay. I’ll backport the support to the v2.x
series for its next release.
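
To illustrate why the silent case is the dangerous one, here is a minimal
sketch in C. It is a mock, not the PMIx source: every demo_* name below is
made up for this example, and the real pmix_bfrops_base_value_load() has a
different signature and type list. The point is only that a type with no case
in the switch should produce an error code rather than leave the value
untouched:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock types for illustration; these are NOT PMIx definitions. */
typedef enum {
    DEMO_UINT16,
    DEMO_STRING,
    DEMO_DATA_ARRAY,
    DEMO_OPAQUE        /* deliberately has no case in the loader below */
} demo_type_t;

typedef struct {
    demo_type_t type;  /* type of the array elements */
    size_t size;       /* number of elements */
    const void *array; /* borrowed pointer to the elements */
} demo_array_t;

typedef struct {
    demo_type_t type;
    int loaded;        /* 1 once a payload has been stored */
    union {
        uint16_t u16;
        const char *str;
        const demo_array_t *darray;
    } data;
} demo_value_t;

enum {
    DEMO_SUCCESS = 0,
    DEMO_ERR_BAD_PARAM = -1,
    DEMO_ERR_NOT_SUPPORTED = -2
};

/* Store src in v according to type. Unknown types are reported to the
 * caller instead of being skipped silently. */
int demo_value_load(demo_value_t *v, const void *src, demo_type_t type)
{
    if (v == NULL || src == NULL) {
        return DEMO_ERR_BAD_PARAM;
    }
    v->loaded = 0;
    switch (type) {
    case DEMO_UINT16:
        v->data.u16 = *(const uint16_t *)src;
        break;
    case DEMO_STRING:
        v->data.str = (const char *)src;
        break;
    case DEMO_DATA_ARRAY:
        v->data.darray = (const demo_array_t *)src;
        break;
    default:
        /* The important line: fail loudly for unhandled types. */
        return DEMO_ERR_NOT_SUPPORTED;
    }
    v->type = type;
    v->loaded = 1;
    return DEMO_SUCCESS;
}
```

A caller would check the return code and refuse to pass the value on unless
it is DEMO_SUCCESS, so the problem shows up at load time instead of as a
segfault later.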

Thanks for pointing it out!
Ralph

> On Oct 12, 2018, at 6:15 AM, Ralph H Castain <r...@open-mpi.org> wrote:
> 
> Hi Stephan
> 
> 
>> On Oct 12, 2018, at 2:25 AM, Stephan Krempel <krem...@par-tec.com> wrote:
>> 
>> Hello Ralph,
>> 
>>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>>> is --with-ompi-pmix-rte?
>> 
>> You were right, this was a typo. With the correct option I have now
>> managed to start an MPI helloworld program using OpenMPI and our own
>> process manager with a pmix server.
> 
> Hooray! If you want me to show support for your PM on our web site, please 
> send me a little info about it. You are welcome to send it off-list if you 
> prefer.
> 
>> 
>>> It all looks okay to me for the client, but I wonder if you
>>> remembered to call register_nspace and register_client on your server
>>> prior to starting the client? If not, the connection will be dropped
>>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>>> see the detailed connection handshake.
>> 
>> This is a point that I was finally able to figure out from the prrte
>> code. To make it work, you not only need to call register_nspace but
>> must also pass it some specific information that OpenMPI expects to be
>> available (e.g. proc info with lrank).
> 
> My apologies - we will document this better on the PMIx web site and provide 
> some link to it on the OMPI web site. We actually do publish the info OMPI is 
> expecting, but it isn’t in an obvious enough place.
> 
>> 
>> A remark on pmix at this point: pmix_bfrops_base_value_load() silently
>> does not handle the PMIX_DATA_ARRAY type, so the macros PMIX_VALUE_LOAD
>> and PMIX_INFO_LOAD do not work with that type. I think this is
>> unfortunate, and it took me a while to figure out why a segfault occurred
>> when pmix tried to process my PMIX_PROC_DATA infos.
> 
> I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
> just an oversight. Regardless, it should return an error if it isn’t doing it.
> 
>> 
>> So thank you again for your help so far.
>> 
>> 
>> One point that remains open and is interesting for me is whether I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as if it had the "--with-ompi-pmix-rte"
>> switch from version 4.x?
> 
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
> the relevant release managers if they’d like us to do so.
> 
> Ralph
> 
>> 
>> Regards,
>> 
>> Stephan
>> 
>> 
>>> 
>>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com> wrote:
>>>> 
>>>> Hi Ralph,
>>>> 
>>>> After studying prrte a little bit, I tried something new and followed
>>>> the description here using openmpi 4:
>>>> https://pmix.org/code/building-the-pmix-reference-server/
>>>> 
>>>> I configured openmpi 4.0.0rc3:
>>>> 
>>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>>> 
>>>> (I also tried to set --with-orte=no, but it then claims not to have a
>>>> suitable rte and does not finish)
>>>> 
>>>> I then started my own PMIx and spawned a client compiled with mpicc of
>>>> the new openmpi installation with this environment:
>>>> 
>>>> PMIX_NAMESPACE=namespace_3228_0_0
>>>> PMIX_RANK=0
>>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>>> 
>>>> The client is not connecting to my pmix server, and its environment
>>>> after MPI_Init looks like this:
>>>> 
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_RANK=0
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_NAMESPACE=864157697
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> ORTE_SCHIZO_DETECTION=ORTE
>>>> OMPI_COMMAND=./hello_env
>>>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
>>>> OMPI_MCA_orte_launch=1
>>>> OMPI_APP_CTX_NUM_PROCS=1
>>>> OMPI_MCA_pmix=^s1,s2,cray,isolated
>>>> OMPI_MCA_ess=singleton
>>>> OMPI_MCA_orte_ess_num_procs=1
>>>> 
>>>> So something goes wrong, but I have no idea what I am missing. Do you
>>>> have an idea what I need to change? Do I have to set an MCA parameter
>>>> to tell OpenMPI not to start orted, or does it need another hint in the
>>>> client environment besides the stuff coming from the PMIx server helper
>>>> library?
>>>> 
>>>> 
>>>> Stephan
>>>> 
>>>> 
>>>> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>>>>> Hi Stephan
>>>>> 
>>>>> Thanks for the clarification - that helps a great deal. You are
>>>>> correct that OMPI’s orted daemons do more than just host the PMIx
>>>>> server library. However, they are only active if you launch the OMPI
>>>>> processes using mpirun. This is probably the source of the trouble
>>>>> you are seeing.
>>>>> 
>>>>> Since you have a process launcher and have integrated the PMIx server
>>>>> support into your RM’s daemons, you really have no need for mpirun at
>>>>> all. You should just be able to launch the processes directly using
>>>>> your own launcher. The PMIx support will take care of the startup
>>>>> requirements. The application procs will not use the orted in such
>>>>> cases.
>>>>> 
>>>>> So if your system is working fine with the PMIx example programs,
>>>>> then just launch the OMPI apps the same way and it should just work.
>>>>> 
>>>>> On the Slurm side: I’m surprised that it doesn’t work without the
>>>>> --with-slurm option. An application proc doesn’t care about any of the
>>>>> Slurm-related code if PMIx is available. I might have access to a
>>>>> machine where I can check it…
>>>>> 
>>>>> Ralph
>>>>> 
>>>>> 
>>>>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com> wrote:
>>>>>> 
>>>>>> Ralph, Gilles,
>>>>>> 
>>>>>> thanks for your input.
>>>>>> 
>>>>>> Before I answer, let me briefly explain what my general intention is.
>>>>>> We have our own resource manager and process launcher that supports
>>>>>> different MPI implementations in different ways. I want to adapt it to
>>>>>> PMIx to cleanly support OpenMPI and, hopefully, other MPI
>>>>>> implementations supporting PMIx in the future, too.
>>>>>> 
>>>>>>> It sounds like what you really want to do is replace the orted, and
>>>>>>> have your orted open your PMIx server? In other words, you want to
>>>>>>> use the PMIx reference library to handle all the PMIx stuff, and
>>>>>>> provide your own backend functions to support the PMIx server calls?
>>>>>> 
>>>>>> You are right, that was my original plan, and I have already done that.
>>>>>> In my environment I can already launch processes that successfully call
>>>>>> PMIx client functions like put, get, fence and so on, all handled by my
>>>>>> servers using the PMIx server helper library. With the server functions
>>>>>> I have implemented so far, all the example programs that come with the
>>>>>> pmix library work fine.
>>>>>> 
>>>>>> Then I tried to use that with OpenMPI and stumbled.
>>>>>> My first idea was to simply replace orted, but after taking a closer
>>>>>> look at OpenMPI it seems to me that it uses/needs orted not only for
>>>>>> spawning and the exchange of process information, but also for its
>>>>>> general communication and collectives. Am I wrong about that?
>>>>>> 
>>>>>> So replacing it completely is perhaps not what I want, since I do not
>>>>>> intend to replace OpenMPI's whole communication stuff. But perhaps I am
>>>>>> mixing up orte and orted here; I am not certain about that.
>>>>>> 
>>>>>>> If so, then your best bet would be to edit the PRRTE code in
>>>>>>> orte/orted/pmix and replace it with your code. You’ll have to deal
>>>>>>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>>>>>>> likely easier than trying to write your own version of “orted” from
>>>>>>> scratch.
>>>>>> 
>>>>>> I think one problem here is that I do not really understand which
>>>>>> purposes orted fulfills overall, especially besides implementing the
>>>>>> PMIx server side. Can you please give me a short overview?
>>>>>> 
>>>>>>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>>>>>>> implements the server backend functions, and the Slurm daemons “host”
>>>>>>> the plugin. What you would need to do is replace that plugin with
>>>>>>> your own.
>>>>>> 
>>>>>> I understand that, but it also seems to need some special support from
>>>>>> the several slurm modules on the OpenMPI side that I do not understand
>>>>>> yet. At least when I tried OpenMPI without slurm support and
>>>>>> `srun --mpi=pmix_v2`, it did not work but generated a message that
>>>>>> slurm support in openmpi is missing.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Stephan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gilles@rist.or.jp> wrote:
>>>>>>>> 
>>>>>>>> Stephan,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Have you already checked https://github.com/pmix/prrte ?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This is the PMIx Reference RunTime Environment (PRRTE), which was
>>>>>>>> built on top of orted.
>>>>>>>> 
>>>>>>>> Long story short, it deploys the PMIx server and then you start
>>>>>>>> your MPI app with prun.
>>>>>>>> An example is available at
>>>>>>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> 
>>>>>>>> Gilles
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>>>>>>> Hello everyone,
>>>>>>>>> 
>>>>>>>>> I am currently implementing a PMIx server and am trying to use it
>>>>>>>>> with OpenMPI. I have my own mpiexec, which starts my PMIx server
>>>>>>>>> and launches the processes.
>>>>>>>>> 
>>>>>>>>> If I launch an executable linked against OpenMPI, during MPI_Init()
>>>>>>>>> the ORTE layer starts another PMIx server and overrides my PMIX_*
>>>>>>>>> environment, so this new server is used instead of mine.
>>>>>>>>> 
>>>>>>>>> So I am looking for a method to prevent orte(d) from starting a
>>>>>>>>> PMIx server.
>>>>>>>>> 
>>>>>>>>> I already tried to understand what the slurm support is doing,
>>>>>>>>> since this is (at least in part) what I think I need. Somehow,
>>>>>>>>> when starting a job with srun --mpi=pmix_v2, the ess module pmi is
>>>>>>>>> started, but I was not able to enforce that manually by setting an
>>>>>>>>> MCA parameter (oss should be the correct one?!?).
>>>>>>>>> And I do not yet have a clue how the slurm support works.
>>>>>>>>> 
>>>>>>>>> So does anyone have a hint for me where I can find documentation
>>>>>>>>> or information concerning that, or is there an easy way to achieve
>>>>>>>>> what I am trying to do that I missed?
>>>>>>>>> 
>>>>>>>>> Thank you in advance.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> 
>>>>>>>>> Stephan
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel@lists.open-mpi.org
>>>>>>>>> https://lists.open-mpi.org/mailman/listinfo/devel
>> -- 
>> Stephan Krempel
>> HPC Software Engineer
>> 
>> ParTec Cluster Competence Center GmbH
>> Possartstraße 20
>> 81679 München, Germany