FWIW: we see varying reports about the scalability of Slurm, especially at 
large cluster sizes. Last I saw/tested, there is a quadratic term that begins 
to dominate above 2k nodes. Others swear it is better <shrug>. Guess I'd be 
cautious and definitely test things before investing in a move - I'm not 
convinced.
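
To make the "quadratic term" point concrete (a toy model, not a measurement): suppose launch time is modeled as T(n) = a*n^2 + b*n + c for n nodes, with positive coefficients a and b. The quadratic term then overtakes the linear one once a*n^2 > b*n, i.e. for n > b/a, so a crossover around 2k nodes would correspond to b/a of roughly 2000. The function names and coefficients below are hypothetical placeholders and are not fitted to any Slurm data.

```python
def crossover_nodes(a: float, b: float) -> float:
    """Node count above which the quadratic term a*n**2 exceeds the
    linear term b*n in the toy model T(n) = a*n**2 + b*n + c."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    # a*n**2 > b*n  <=>  n > b/a  (for n > 0)
    return b / a


def quadratic_to_linear_ratio(n: float, a: float, b: float) -> float:
    """Ratio (a*n**2) / (b*n) = a*n/b; above 1.0 the quadratic term dominates."""
    return a * n / b


if __name__ == "__main__":
    # Hypothetical coefficients in arbitrary units, chosen so that b/a = 2000.
    a, b = 1.0, 2000.0
    print("crossover at", crossover_nodes(a, b), "nodes")
    for n in (500, 1000, 2000, 4000, 8000):
        print(n, quadratic_to_linear_ratio(n, a, b))
```

Past the crossover the ratio keeps growing linearly with n. In this toy model, a term that is a quarter of the linear cost at 500 nodes is four times it at 8000 nodes, which is why testing at the target scale matters.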


On May 6, 2014, at 8:37 PM, Moody, Adam T. <mood...@llnl.gov> wrote:

> Hi Chris,
> I'm interested in SLURM / OpenMPI startup numbers, but I haven't done this 
> testing myself.  We're stuck with an older version of SLURM for various 
> internal reasons, and I'm wondering whether it's worth the effort to 
> backport the PMI2 support.  Can you share how startup times differ at 
> different scales?
> Thanks,
> -Adam
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Christopher Samuel 
> [sam...@unimelb.edu.au]
> Sent: Tuesday, May 06, 2014 8:32 PM
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is 
> specifically requested
> 
> On 07/05/14 12:53, Ralph Castain wrote:
> 
>> We have been seeing a lot of problems with the Slurm PMI-2 support
>> (not in OMPI - it's the code in Slurm that is having problems). At
>> this time, I'm unaware of any advantage in using PMI-2 over PMI-1
>> in Slurm - the scaling is equally poor, and PMI-2 does not support
>> any additional functionality.
>> 
>> I know that Cray PMI-2 has a definite advantage, so I'm proposing
>> that we turn PMI-2 "off" when under Slurm unless the user
>> specifically requests we use it.
> 
> Our local testing has shown that launching jobs with srun using PMI-2
> in OMPI 1.7.x scales massively better than launching with srun under
> OMPI 1.6.x. Now that OMPI 1.8.x is out, we're planning to move to
> PMI-2 with OMPI and srun.
> 
> Using mpirun gives good performance with OMPI 1.6.x, but Slurm then
> gets all its memory stats wrong, and if you run Slurm with
> CR_Core_Memory you have a very high risk of your job being killed
> incorrectly.
> 
> All the best,
> Chris
> --
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14691.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14692.php
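
The selection rule proposed in the RFC quoted above (use PMI-1 under Slurm unless the user explicitly asks for PMI-2, while keeping Cray's PMI-2 because of its reported advantage) can be sketched as a small decision function. This is an illustration only: the enum, the function name and its three boolean inputs are hypothetical and do not correspond to Open MPI's actual component-selection code or MCA parameters. Falling back to PMI-1 when PMI-2 is requested but Slurm does not provide it is also an assumption; a real implementation might report an error instead.

```python
from enum import Enum


class Pmi(Enum):
    """Hypothetical labels for the PMI flavor a launcher would use."""
    PMI1 = "pmi1"
    PMI2 = "pmi2"


def select_pmi(cray_pmi2_available: bool,
               slurm_pmi2_available: bool,
               user_requested_pmi2: bool) -> Pmi:
    """Sketch of the proposed policy: PMI-2 is 'off' under Slurm unless requested."""
    # Cray's PMI-2 is kept by default, since it is reported to have a definite advantage.
    if cray_pmi2_available:
        return Pmi.PMI2
    # Under Slurm, use PMI-2 only when the user explicitly asks for it
    # and Slurm actually provides it.
    if user_requested_pmi2 and slurm_pmi2_available:
        return Pmi.PMI2
    # Default (and, by assumption, fallback): Slurm's PMI-1.
    return Pmi.PMI1


if __name__ == "__main__":
    print(select_pmi(False, True, False))  # Slurm, no explicit request
    print(select_pmi(False, True, True))   # Slurm, PMI-2 explicitly requested
    print(select_pmi(True, False, False))  # Cray PMI-2 present
```

Checking for Cray first reflects the RFC's point that the objection is to Slurm's PMI-2 code, not to PMI-2 as such.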
