Hi,
See thread below.
I've just uploaded 3.1.2-5, which I believe fixes the hangs due to
OpenMPI (non-atomic handling of sending a 64-bit tag, occurring mostly
on archs with 32-bit atomics).
With this, I think it is appropriate to start the ball rolling on making
mpich the default MPI for buster.
Any objections?
Any ideas on how to write the ben tracker script? I think it would work
by looking for packages with binaries linked to openmpi rather than
mpich, but there are a number of packages that would be false positives
(HDF5, open-coarrays, etc.) that build against both.
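To make the selection logic concrete, here is a rough sketch in Python
of the classification I have in mind. It is not ben syntax, only an
illustration of the rule. The library package names (libopenmpi3,
libmpich12), the function name and the sample packages are assumptions
made for the example, and a real tracker would read the Depends fields
from the archive rather than from a hand-written dict.

```python
# Hypothetical sketch of the tracker's classification rule.
# The library package names are assumptions, not taken from this thread.
OPENMPI_LIB = "libopenmpi3"
MPICH_LIB = "libmpich12"


def classify(source_binaries):
    """Classify one source package by the MPI runtimes its binaries use.

    source_binaries maps each binary package name to the list of package
    names in its Depends field (version constraints already stripped).
    """
    uses_openmpi = any(OPENMPI_LIB in deps for deps in source_binaries.values())
    uses_mpich = any(MPICH_LIB in deps for deps in source_binaries.values())
    if uses_openmpi and uses_mpich:
        return "both"        # builds against both MPIs: a false positive
    if uses_openmpi:
        return "bad"         # linked to openmpi only, still to be rebuilt
    if uses_mpich:
        return "good"        # already linked to mpich
    return "unaffected"      # no MPI dependency at all


# Made-up sample data, for illustration only.
sample = {
    "dual-mpi-lib": {
        "libdual-openmpi1": ["libc6", "libopenmpi3"],
        "libdual-mpich1": ["libc6", "libmpich12"],
    },
    "solver": {"solver-bin": ["libc6", "libopenmpi3"]},
}

for name, binaries in sample.items():
    print(name, classify(binaries))
```

The "both" case is what keeps packages like HDF5 out of the "bad" list:
they would only count as done or not done according to their other
dependencies, not according to whether they still ship an openmpi flavour.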
regards
Alastair
On 31/08/2018 11:17, Alastair McKinstry wrote:
On 31/08/2018 11:04, Drew Parsons wrote:
On 2018-08-30 14:18, Alastair McKinstry wrote:
On 30/08/2018 09:39, Drew Parsons wrote:
If you want a break from the openmpi angst then go ahead and drop
mpich 3.3b3 into unstable. It won't make the overall MPI situation
any worse... :)
Drew
Ok, I've pushed 3.3b3 to unstable.
Great!
For me there are two concerns:
(1) The current setup (openmpi default) shakes out issues in openmpi3
that should be fixed. It would be good to get that done.
That's fair. If we're going to "drop" openmpi, it's a good policy to
leave it in as stable a state as possible.
At this stage it appears there is a remaining "hang" / threading issue
that's affecting 32-bit platforms
(see #907267). Once that's fixed, I'm favouring no further updates
before Buster, i.e. ship openmpi 3.1.2 with pmix 3.0.1
(openmpi now has a dependency on libpmix, the Process Management
Interface for exascale, which handles the launching of processes (up to
millions, hierarchically)).
The openmpi/pmix interface has been flaky, I suspect, and not well
tested on non-traditional HPC architectures (e.g. I suspect it's the
source of the 32-bit issue).
mpich _can_ be built with pmix but I'm recommending not doing so for
Buster.
(2) Moving to mpich as the default is a transition and should be pushed
before the deadline - say, setting 30 Sept?
This is probably a good point to confer with the Release Team, so I'm
cc:ing them.
Release Team: we have nearly completed the openmpi3 transition. But
there is a broader question of switching mpi-defaults to mpich
instead of openmpi. mpich is reported to be more stable than openmpi
and is recommended by several upstream authors of the HPC software
libraries. We have some consensus that switching to mpich is
probably a good idea; it's just a question of timing at this point.
Does an MPI / mpich transition overlap with other transitions planned
for Buster - say hwloc, hdf5?
hdf5 already builds against both openmpi and mpich, so it should not
be a particular problem. It has had more build failures on the minor
arches (with the new hdf5 version in experimental), but there's no
reason to blame mpich for that.
I don't know about hwloc, but the builds in experimental look clean.
Drew
--
Alastair McKinstry, <[email protected]>, <[email protected]>,
https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.