Take a look in opal/mca/common/pmi - we already do a bunch of #if PMI2 stuff in 
there. All we are talking about doing here is:

* making those selections at runtime based on an MCA param: compiling PMI2 
support if it is available, but selecting it at runtime

* moving some additional functions into that code area and out of the 
individual components
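As a rough illustration of that first point, the compile-time guard plus runtime MCA selection could look something like this (a hedged sketch only; the `WANT_PMI2_SUPPORT` macro, parameter handling, and function names are hypothetical stand-ins, not the actual OPAL code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the configure-time check (AC_DEFINE'd in a real build):
 * 1 when the PMI2 header and library were found. */
#define WANT_PMI2_SUPPORT 1

static bool use_pmi2 = false;

/* Decide at runtime which PMI to use, based on a (hypothetical) MCA
 * parameter value.  PMI2 is only selectable when it was compiled in. */
static int common_pmi_select(const char *mca_value)
{
#if WANT_PMI2_SUPPORT
    use_pmi2 = (NULL != mca_value && 0 == strcmp(mca_value, "2"));
    return 0;
#else
    if (NULL != mca_value && 0 == strcmp(mca_value, "2")) {
        return -1;  /* PMI2 requested but not available at build time */
    }
    use_pmi2 = false;
    return 0;
#endif
}
```

The point is that the `#if` only controls what gets compiled; the actual choice is a single runtime check driven by one MCA parameter.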


On May 7, 2014, at 5:08 PM, Artem Polyakov <artpo...@gmail.com> wrote:

> I like #2 too. 
> But my question was slightly different. Can we encapsulate the PMI logic that 
> OMPI uses in common/pmi, as #2 suggests, but have two different implementations 
> of this component, say common/pmi and common/pmi2? I am asking because I have 
> concerns that this kind of component is not supposed to be duplicated.
> In this case we could have one common MCA parameter and 2 components as it 
> was suggested by Jeff.
> 
> 
> 2014-05-08 7:01 GMT+07:00 Ralph Castain <r...@open-mpi.org>:
> The desired solution is to have the ability to select pmi-1 vs pmi-2 at 
> runtime. This can be done in two ways:
> 
> 1. you could have separate pmi1 and pmi2 components in each framework. You'd 
> want to define only one common MCA param to direct the selection, however.
> 
> 2. you could have a single pmi component in each framework, calling code in 
> the appropriate common/pmi location. You would then need a runtime MCA param 
> to select whether pmi-1 or pmi-2 was going to be used, and have the common 
> code check before making the desired calls.
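Option 2's runtime check could be sketched roughly like this (the version flag and helper names are illustrative stand-ins, not the real OPAL/PMI API; the stubs just let the dispatch logic be exercised):

```c
#include <assert.h>

/* One component per framework calls into common/pmi; the common code
 * checks a runtime-selected version flag before making the desired call.
 * The do_pmi*_barrier helpers are hypothetical stand-ins for the real
 * PMI-1 barrier / PMI-2 fence calls. */
static int pmi_version = 1;  /* would be set from an MCA param at startup */

static int do_pmi1_barrier(void) { return 1; }  /* stub */
static int do_pmi2_barrier(void) { return 2; }  /* stub */

static int common_pmi_barrier(void)
{
    if (2 == pmi_version) {
        return do_pmi2_barrier();
    }
    return do_pmi1_barrier();
}
```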
> 
> The choice of method is left up to you. They each have their negatives. If it 
> were me, I'd probably try #2 first, assuming the codes are mostly common in 
> the individual frameworks.
> 
> 
> On May 7, 2014, at 4:51 PM, Artem Polyakov <artpo...@gmail.com> wrote:
> 
>> Just reread your suggestions in our out-of-list discussion and found that I 
>> misunderstood them. So no parallel PMI! Take all possible code into 
>> opal/mca/common/pmi.
>> To additionally clarify, which is the preferred way:
>> 1. to create one joint PMI module with switches to decide which 
>> functionality to use, or
>> 2. to have two separate common modules, one for PMI1 and one for PMI2? 
>> And does the latter fit the opal/mca/common/ ideology at all?
>> 
>> 
>> 2014-05-08 6:44 GMT+07:00 Artem Polyakov <artpo...@gmail.com>:
>> 
>> 2014-05-08 5:54 GMT+07:00 Ralph Castain <r...@open-mpi.org>:
>> 
>> Ummm... no, I don't think that's right. I believe we decided instead to 
>> create separate components, default to PMI-2 if available, print a nice 
>> error message if it fails, and otherwise use PMI-1.
>> 
>> I don't want to initialize both PMIs in parallel as most installations won't 
>> support it.
>> 
>> Ok, I agree. Besides the lack of support, there can be a performance hit 
>> caused by PMI1 initialization at scale. This is not the case for SLURM's PMI1, 
>> since it is quite simple and local. But I hadn't considered other 
>> implementations.
>> 
>> On May 7, 2014, at 3:49 PM, Artem Polyakov <artpo...@gmail.com> wrote:
>> 
>>> We discussed Joshua's concerns with Ralph and decided to try automatic PMI2 
>>> correctness detection first, as initially intended. Here is my idea. The 
>>> universal way to decide whether PMI2 is correct is to compare PMI_Init(.., 
>>> &rank, &size, ...) and PMI2_Init(.., &rank, &size, ...). Size and rank 
>>> should be equal. In that case we proceed with PMI2, finalizing PMI1. 
>>> Otherwise we finalize PMI2 and proceed with PMI1.
>>> I need to clarify with the SLURM guys whether parallel initialization of 
>>> both PMIs is legal. If not, we'll do it sequentially. 
>>> In other places we'll just use a flag saying which PMI version to use.
>>> Does that sound reasonable?
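The probe described above could look roughly like this (a sketch only: the probe_* helpers are hypothetical stand-ins for the real PMI_Init/PMI2_Init calls, stubbed here with canned values so the selection logic itself is visible):

```c
#include <assert.h>

/* Compare the rank and size reported by PMI-1 and PMI-2 and keep PMI-2
 * only when they agree.  In a real implementation these stubs would
 * wrap PMI_Init/PMI2_Init and the matching finalize calls. */
static void probe_pmi1_init(int *rank, int *size) { *rank = 3; *size = 8; }
static void probe_pmi2_init(int *rank, int *size) { *rank = 3; *size = 8; }
static void probe_pmi1_finalize(void) {}
static void probe_pmi2_finalize(void) {}

static int select_pmi_version(void)
{
    int rank1, size1, rank2, size2;

    /* Initialize both PMIs (sequentially if running them in parallel
     * turns out to be illegal) and compare what they report. */
    probe_pmi1_init(&rank1, &size1);
    probe_pmi2_init(&rank2, &size2);

    if (rank1 == rank2 && size1 == size2) {
        /* PMI2 reports the same process identity: it looks usable. */
        probe_pmi1_finalize();
        return 2;
    }
    /* PMI2 disagrees (e.g. a broken implementation): fall back. */
    probe_pmi2_finalize();
    return 1;
}
```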
>>> 
>>> 2014-05-07 23:10 GMT+07:00 Artem Polyakov <artpo...@gmail.com>:
>>> That's a good point. There are actually a bunch of modules in ompi, opal, 
>>> and orte that would have to be duplicated.
>>> 
>>> On Wednesday, May 7, 2014, Joshua Ladd wrote:
>>> +1 Sounds like a good idea, but decoupling the two and adding all the 
>>> right selection mojo might be a bit of a pain. There are several places in 
>>> OMPI where the distinction between PMI1 and PMI2 is made, not only in 
>>> grpcomm: the DB and ESS frameworks, off the top of my head.
>>> 
>>> Josh
>>> 
>>> 
>>> On Wed, May 7, 2014 at 11:48 AM, Artem Polyakov <artpo...@gmail.com> wrote:
>>> Good idea :)!
>>> 
>>> On Wednesday, May 7, 2014, Ralph Castain wrote:
>>> 
>>> Jeff actually had a useful suggestion (gasp!). He proposed that we separate 
>>> the PMI-1 and PMI-2 code into separate components so you could select them 
>>> at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs are 
>>> found), default to PMI-1, but users could select to try PMI-2. If the PMI-2 
>>> component failed, we would emit a show_help indicating that they probably 
>>> have a broken PMI-2 version and should try PMI-1.
>>> 
>>> Make sense?
>>> Ralph
>>> 
>>> On May 7, 2014, at 8:00 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> 
>>>> On May 7, 2014, at 7:56 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>>> 
>>>>> Ah, I see. Sorry for the reactionary comment - but this feature falls 
>>>>> squarely within my "jurisdiction", and we've invested a lot in improving 
>>>>> OMPI jobstart under srun. 
>>>>> 
>>>>> That being said (now that I've taken some deep breaths and carefully read 
>>>>> your original email :)), what you're proposing isn't a bad idea. I think 
>>>>> it would be good to maybe add a "--with-pmi2" flag to configure since 
>>>>> "--with-pmi" automagically uses PMI2 if it finds the header and lib. This 
>>>>> way, we could experiment with PMI1/PMI2 without having to rebuild SLURM 
>>>>> or hack the installation. 
>>>> 
>>>> That would be a much simpler solution than what Artem proposed (off-list) 
>>>> where we would try PMI2 and then if it didn't work try to figure out how 
>>>> to fall back to PMI1. I'll add this for now, and if Artem wants to try his 
>>>> more automagic solution and can make it work, then we can reconsider that 
>>>> option.
>>>> 
>>>> Thanks
>>>> Ralph
>>>> 
>>>>> 
>>>>> Josh  
>>>>> 
>>>>> 
>>>>> On Wed, May 7, 2014 at 10:45 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> Okay, then we'll just have to develop a workaround for all those Slurm 
>>>>> releases where PMI-2 is borked :-(
>>>>> 
>>>>> FWIW: I think people misunderstood my statement. I specifically did *not* 
>>>>> propose to *lose* PMI-2 support. I suggested that we change it to 
>>>>> "on-by-request" instead of the current "on-by-default" so we wouldn't 
>>>>> keep getting asked about PMI-2 bugs in Slurm. Once the Slurm 
>>>>> implementation stabilized, then we could reverse that policy.
>>>>> 
>>>>> However, given that both you and Chris appear to prefer to keep it 
>>>>> "on-by-default", we'll see if we can find a way to detect that PMI-2 is 
>>>>> broken and then fall back to PMI-1.
>>>>> 
>>>>> 
>>>>> On May 7, 2014, at 7:39 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>>>> 
>>>>>> Just saw this thread, and I second Chris' observations: at scale we are 
>>>>>> seeing huge gains in jobstart performance with PMI2 over PMI1. We CANNOT 
>>>>>> lose this functionality. For competitive reasons, I cannot provide 
>>>>>> exact numbers, but let's say the difference is in the ballpark of a full 
>>>>>> order-of-magnitude on 20K ranks versus PMI1. PMI1 is completely 
>>>>>> unacceptable/unusable at scale. Certainly PMI2 still has scaling issues, 
>>>>>> but there is no contest between PMI1 and PMI2.  We (MLNX) are actively 
>>>>>> working to resolve some of the scalability issues in PMI2. 
>>>>>> 
>>>>>> Josh
>>>>>> 
>>>>>> Joshua S. Ladd
>>>>>> Staff Engineer, HPC Software
>>>>>> Mellanox Technologies
>>>>>> 
>>>>>> Email: josh...@mellanox.com
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 7, 2014 at 4:00 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> Interesting - how many nodes were involved? As I said, the bad scaling 
>>>>>> becomes more evident at a fairly high node count.
>>>>>> 
>>>>>> On May 7, 2014, at 12:07 AM, Christopher Samuel <sam...@unimelb.edu.au> 
>>>>>> wrote:
>>>>>> 
>>>>>> >
>>>>>> > Hiya Ralph,
>>>>>> >
>>>>>> > On 07/05/14 14:49, Ralph Castain wrote:
>>>>>> >
>>>>>> >> I should have looked closer to see the numbers you posted, Chris -
>>>>>> >> those include time for MPI wireup. So what you are seeing is that
>>>>>> >> mpirun is much more efficient at exchanging the MPI endpoint info
>>>>>> >> than PMI. I suspect that PMI2 is not much better as the primary
>>>>>> >> reason for the difference is that mpirun sends blobs, while PMI
>>>>>> >> requires that everything b
>>> _______________________________________________
>>> 
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/05/14716.php
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> С Уважением, Поляков Артем Юрьевич
>>> Best regards, Artem Y. Polyakov
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
