-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 23/07/13 19:34, Joshua Ladd wrote:

> Hi, Chris

Hi Joshua,

I've quoted you in full as I don't think your message made it through
to the slurm-dev list (at least I've not received it from there yet).

> Funny you should mention this now. We identified and diagnosed the
> issue some time ago as a combination of SLURM's PMI1
> implementation and some of, what I'll call, OMPI's topology
> requirements (probably not the right word.) Here's what is
> happening, in a nutshell, when you launch with srun:
>
> 1. Each process pushes his endpoint data up to the PMI "cloud" via
>    PMI put (I think it's about five or six puts, bottom line, O(1).)
> 2. Then executes a PMI commit and PMI barrier to ensure all other
>    processes have finished committing their data to the "cloud".
> 3. Subsequent to this, each process executes O(N) (N is the number of
>    procs in the job) PMI gets in order to get all of the endpoint
>    data for every process regardless of whether or not the process
>    communicates with that endpoint.
>
> "We" (MLNX et al.) undertook an in-depth scaling study of this and
> identified several poorly scaling pieces with the worst offenders
> being:
>
> 1. PMI Barrier scales worse than linear.
> 2. At scale, the PMI get phase starts to look quadratic.
>
> The proposed solution that "we" (OMPI + SLURM) have come up with is
> to modify OMPI to support PMI2 and to use SLURM 2.6 which has
> support for PMI2 and is (allegedly) much more scalable than PMI1.
> Several folks in the combined communities are working hard, as we
> speak, trying to get this functional to see if it indeed makes a
> difference. Stay tuned, Chris. Hopefully we will have some data by
> the end of the week.

Wonderful, great to know that what we're seeing is actually real and
not just pilot error on our part! We're happy enough to tell users to
keep on using mpirun, as they will be used to it from our other Intel
systems, and to only use srun if the code requires it (one or two
commercial apps that use Intel MPI).

Can I ask, if the PMI2 ideas work out, is that likely to get
backported to OMPI 1.6.x?

All the best,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlHvEZIACgkQO2KABBYQAh9QogCeMuR/E4oPivdsX3r671+z7EWd
Hv8An1N8csHMby7bouT/gC07i/J2PW+i
=gZsB
-----END PGP SIGNATURE-----
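
For anyone who wants to see why the get phase in Joshua's description
"starts to look quadratic", here is a rough sketch that only counts
key-value operations for the three steps he lists (put, commit plus
barrier, then gets for every peer). It is a toy model under stated
assumptions, not the real PMI API: the "cloud" is a plain Python dict,
the function name simulate_wireup and the constant PUTS_PER_PROC are
made up for illustration, the value 6 is taken from his "about five or
six puts" estimate, and each process is assumed to fetch its own data
as well as everyone else's.

```python
# Toy model of the srun/PMI1 wire-up described in the quoted message.
# NOT the real PMI API: the key-value store is a dict and all names
# here are hypothetical, chosen only for illustration.

PUTS_PER_PROC = 6  # assumption, from "about five or six puts"


def simulate_wireup(nprocs, puts_per_proc=PUTS_PER_PROC):
    """Count key-value operations for one launch of nprocs processes."""
    kvs = {}
    counts = {"put": 0, "get": 0}

    # Step 1: each process publishes its own endpoint data.
    # This is O(1) per process, so O(N) for the whole job.
    for rank in range(nprocs):
        for i in range(puts_per_proc):
            kvs[(rank, i)] = f"endpoint-{rank}-{i}"
            counts["put"] += 1

    # Step 2: commit + barrier. In this sequential model it is a no-op;
    # it only marks the point after which all data is visible.

    # Step 3: each process fetches the endpoint data of every process,
    # whether or not it will ever talk to that peer. This is O(N) gets
    # per process, so O(N^2) gets for the whole job.
    for rank in range(nprocs):
        for peer in range(nprocs):
            for i in range(puts_per_proc):
                _ = kvs[(peer, i)]
                counts["get"] += 1

    return counts


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        c = simulate_wireup(n)
        print(f"N={n:4d}  puts={c['put']:6d}  gets={c['get']:8d}")
```

In this model, doubling the number of processes doubles the total puts
but quadruples the total gets. It says nothing about the barrier cost,
which the scaling study above measured separately.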