The Open MPI community is pleased to announce the Open MPI v4.1.1 release.  
This release contains a number of bug fixes and minor improvements.

Open MPI v4.1.1 can be downloaded from the Open MPI website.

Changes in v4.1.1 compared to v4.1.0:

- Fix a number of datatype issues, including an issue with
  improper handling of partial datatypes that could lead to
  an unexpected application failure.
- Change UCX PML to not warn about MPI_Request leaks during
  MPI_FINALIZE by default.  The old behavior can be restored with
  the mca_pml_ucx_request_leak_check MCA parameter.
- Reverted temporary solution that worked around launch issues in
  SLURM v20.11.{0,1,2}. SchedMD encourages users to avoid these
  versions and to upgrade to v20.11.3 or newer.
- Updated PMIx to v3.2.2.
- Fixed configuration issue on Apple Silicon observed with
  Homebrew. Thanks to François-Xavier Coudert for reporting the issue.
- Disabled gcc built-in atomics by default on aarch64 platforms.
- Disabled UCX PML when UCX v1.8.0 is detected. UCX version 1.8.0 has a bug that
  may cause data corruption when its TCP transport is used in conjunction with
  the shared memory transport. UCX versions prior to v1.8.0 are not affected by
  this issue. Thanks to @ksiazekm for reporting the issue.
- Fixed detection of available UCX transports/devices to better inform PML
- Fixed SLURM support to mark ORTE daemons as non-MPI tasks.
- Improved AVX detection to more accurately detect supported
  platforms.  Also improved the generated AVX code, and switched to
  using word-based MCA params for the op/avx component (vs. numeric
  bit flags).
- Improved OFI compatibility support and fixed memory leaks in error
  handling paths.
- Improved HAN collectives with support for Barrier and Scatter. Thanks
  to @EmmanuelBRELLE for these changes and the relevant bug fixes.
- Fixed MPI debugger support (i.e., the MPIR_Breakpoint() symbol).
  Thanks to @louisespellacy-arm for reporting the issue.
- Fixed ORTE bug that prevented debuggers from reading MPIR_Proctable.
- Removed PML uniformity check from the UCX PML to address performance
- Fixed MPI_Init_thread(3) statement about C++ binding and updated
  references about MPI_THREAD_MULTIPLE.  Thanks to Andreas Lösel for
  bringing the outdated docs to our attention.
- Added fence_nb to Flux PMIx support to address segmentation faults.
- Ensured progress of AIO requests in the POSIX FBTL component to
  prevent exceeding the maximum number of pending requests on macOS.
- Used OPAL's multi-thread support in the orted to leverage atomic
  operations for object refcounting.
- Fixed segv when launching with static TCP ports.
- Fixed --debug-daemons mpirun CLI option.
- Fixed bug where mpirun did not honor --host in a managed job
- Made a managed allocation filter a hostfile/hostlist.
- Fixed bug to mark a generalized request as pending once initiated.
- Fixed external PMIx v4.x check.
- Fixed OSHMEM build with `--enable-mem-debug`.
- Fixed a performance regression observed with older versions of GCC when
  __ATOMIC_SEQ_CST is used. Thanks to @BiplabRaut for reporting the issue.
- Fixed buffer allocation bug in the binomial tree scatter algorithm when
  non-contiguous datatypes are used. Thanks to @sadcat11 for reporting the
  issue.
- Fixed bugs related to the accumulate and atomics functionality in the
  osc/rdma component.
- Fixed race condition in MPI group operations observed with
  MPI_THREAD_MULTIPLE threading level.
- Fixed a deadlock in the TCP BTL's connection matching logic.
- Fixed pml/ob1 compilation error when CUDA support is enabled.
- Fixed a build issue with Lustre caused by unnecessary header includes.
- Fixed a build issue with the IBM LSF workload manager.
- Fixed linker error with UCX SPML.
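
Two of the items above depend on exact third-party versions: SLURM
v20.11.{0,1,2} (upgrade recommended) and UCX v1.8.0 (UCX PML disabled).
The following is a minimal sketch of how a site might flag those versions
in its own tooling. It is not part of Open MPI; the function and constant
names are hypothetical, and the checks mirror only what the notes above
state.

```python
import re

# Versions named in the release notes above. The set names and helper
# functions are hypothetical and exist only for this illustration.
AFFECTED_SLURM = {(20, 11, 0), (20, 11, 1), (20, 11, 2)}
AFFECTED_UCX = {(1, 8, 0)}


def parse_version(text):
    """Return the first x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def slurm_needs_upgrade(version_text):
    """True if the SLURM version is one the notes say to avoid."""
    return parse_version(version_text) in AFFECTED_SLURM


def ucx_pml_disabled(version_text):
    """True if the UCX version is the one for which the UCX PML is disabled."""
    return parse_version(version_text) in AFFECTED_UCX
```

For example, slurm_needs_upgrade("slurm 20.11.2") is true and
slurm_needs_upgrade("20.11.3") is false. How the version string is
obtained (scheduler or UCX command-line tools, package metadata) is left
to the caller.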

Jeff Squyres
