I performed some tests on our Omnipath cluster, and I have a mixed bag of results with 4.0.0rc1:

1. Good news: the problems with the psm2 mtl that I reported in June/July seem to be fixed. However, I still get a warning every time I run a job with 4.0.0, e.g.

compute-1-1.local.4351PSM2 has not been initialized
compute-1-0.local.3826PSM2 has not been initialized

although, based on the performance, it is very clear that psm2 is being used. I double-checked with the 3.0 series, and I do not get the same warnings on the same set of nodes. The unfortunate part about this warning is that applications appear to return an error, although the tests and applications otherwise seem to finish correctly:

--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

  Process name: [[38418,1],1]
  Exit code: 255
----------------------------------------------------------------------------

2. The ofi mtl does not work at all on our Omnipath cluster. If I try to force it using ‘mpirun --mca mtl ofi …’, I get the following error message:

[compute-1-0:03988] *** An error occurred in MPI_Barrier
[compute-1-0:03988] *** reported by process [2712141825,0]
[compute-1-0:03988] *** on communicator MPI_COMM_WORLD
[compute-1-0:03988] *** MPI_ERR_OTHER: known error not in list
[compute-1-0:03988] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[compute-1-0:03988] ***    and potentially your MPI job)
[sabine.cacds.uh.edu:21046] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[sabine.cacds.uh.edu:21046] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I once again double-checked that this works correctly with the 3.0 series (and 3.1, although I did not run that test this time).

3. The openib btl component is always getting in the way with annoying warnings.
It is not really used, but it constantly complains:

[sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
[sabine.cacds.uh.edu:25996] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init

So, bottom line: if I run

mpirun --mca btl ^openib --mca mtl ^ofi ….

my tests finish correctly, although mpirun will still return an error.

Thanks
Edgar

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Geoffrey Paulsen
Sent: Sunday, September 16, 2018 2:31 PM
To: devel@lists.open-mpi.org
Subject: [OMPI devel] Announcing Open MPI v4.0.0rc1

The first release candidate for the Open MPI v4.0.0 release is posted at
https://www.open-mpi.org/software/ompi/v4.0/

Major changes include:

4.0.0 -- September, 2018
------------------------

- OSHMEM updated to the OpenSHMEM 1.4 API.
- Do not build the Open SHMEM layer when there are no SPMLs available. Currently, this means the Open SHMEM layer will only build if an MXM or UCX library is found.
- A UCX BTL was added for enhanced MPI RMA support using UCX.
- With this release, the OpenIB BTL now only supports iWarp and RoCE by default.
- Updated internal HWLOC to 2.0.1.
- Updated internal PMIx to 3.0.1.
- Change the priority for selecting external versus internal HWLOC and PMIx packages to build. Starting with this release, configure by default selects available external HWLOC and PMIx packages over the internal ones.
- Updated internal ROMIO to 3.2.1.
- Removed support for the MXM MTL.
- Improved CUDA support when using UCX.
- Improved support for two-phase MPI I/O operations when using OMPIO.
- Added support for Software-based Performance Counters, see https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
- Various improvements to MPI RMA performance when using RDMA capable interconnects.
- Update memkind component to use the memkind 1.6 public API.
- Fix problems with use of newer map-by mpirun options. Thanks to Tony Reina for reporting.
- Fix rank-by algorithms to properly rank by object and span.
- Allow for running as root if two environment variables are set. Requested by Axel Huebl.
- Fix a problem with building the Java bindings when using Java 10. Thanks to Bryce Glover for reporting.

Our goal is to release 4.0.0 by mid-October, so any testing is appreciated.
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel