Matias, IIRC, the OFI BTL only creates one EP. If you move that to add_procs, you might need to add a check to avoid re-creating the EP over and over. Do you think moving EP creation from component_init to component_open would solve the problem?
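For the add_procs route, what I have in mind is simply guarding the creation so it only happens on the first call. This is just a sketch against the libfabric API; the struct and function names below are made up for illustration and are not the actual btl/ofi code:

    #include <stddef.h>
    #include <rdma/fabric.h>        /* libfabric core types */
    #include <rdma/fi_endpoint.h>   /* fi_endpoint() */

    /* Hypothetical stand-in for the OFI BTL module state; the real module
     * keeps its fabric/domain/EP handles in its own struct. */
    struct ofi_btl_module {
        struct fid_domain *domain;       /* opened earlier, e.g. at component init */
        struct fi_info    *fabric_info;  /* provider info used to open it */
        struct fid_ep     *ep;           /* NULL until the first add_procs call */
    };

    /* Create the EP exactly once, no matter how many times add_procs runs. */
    static int ofi_btl_create_ep_once(struct ofi_btl_module *module)
    {
        if (NULL != module->ep) {
            return 0;                    /* EP already exists, nothing to do */
        }
        /* fi_endpoint() allocates an active endpoint on the open domain */
        return fi_endpoint(module->domain, module->fabric_info,
                           &module->ep, NULL);
    }

If add_procs can be invoked from multiple threads we would also need a lock or an atomic around that check, but the idea is the same.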
Arm

> On Sep 19, 2018, at 1:08 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
>
> Hi Edgar,
>
> I also saw some similar issues, not exactly the same, but they look very
> similar (maybe because of a different version of libpsm2). Items 1 and 2 are
> related to the introduction of the OFI BTL and the fact that it opens an OFI
> EP in its init function. I see that all BTLs call their init function during
> transport selection time. Moreover, this happens even when you explicitly ask
> for a different one (-mca pml cm -mca mtl psm2). Workaround: -mca btl ^ofi.
> My current idea is to update the OFI BTL and move the EP opening to
> add_procs. Feedback?
>
> Number 3 goes beyond me.
>
> Thanks,
>
> _MAC
>
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gabriel, Edgar
> Sent: Wednesday, September 19, 2018 9:25 AM
> To: Open MPI Developers <devel@lists.open-mpi.org>
> Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1
>
> I performed some tests on our Omnipath cluster, and I have a mixed bag of
> results with 4.0.0rc1.
>
> 1. Good news: the problems with the psm2 mtl that I reported in June/July
> seem to be fixed. However, I still get a warning every time I run a job with
> 4.0.0, e.g.
>
> compute-1-1.local.4351PSM2 has not been initialized
> compute-1-0.local.3826PSM2 has not been initialized
>
> although, based on the performance, it is very clear that psm2 is being used.
> I double checked with the 3.0 series; I do not get the same warnings on the
> same set of nodes. The unfortunate part about this error message is that
> applications seem to return an error (although tests and applications seem
> to finish correctly otherwise):
>
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus
> causing the job to be terminated. The first process to do so was:
>
>   Process name: [[38418,1],1]
>   Exit code: 255
> --------------------------------------------------------------------------
>
> 2. The ofi mtl does not work at all on our Omnipath cluster. If I try to
> force it using ‘mpirun --mca mtl ofi …’ I get the following error message:
>
> [compute-1-0:03988] *** An error occurred in MPI_Barrier
> [compute-1-0:03988] *** reported by process [2712141825,0]
> [compute-1-0:03988] *** on communicator MPI_COMM_WORLD
> [compute-1-0:03988] *** MPI_ERR_OTHER: known error not in list
> [compute-1-0:03988] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> [compute-1-0:03988] *** and potentially your MPI job)
> [sabine.cacds.uh.edu:21046] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> [sabine.cacds.uh.edu:21046] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> I once again double checked that this works correctly in the 3.0 series (and
> 3.1, although I did not run that test this time).
>
> 3. The openib btl component is always getting in the way with annoying
> warnings. It is not really used, but constantly complains:
>
> [sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
> [sabine.cacds.uh.edu:25996] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
>
> So, bottom line: if I do
>
> mpirun --mca btl ^openib --mca mtl ^ofi …
>
> my tests finish correctly, although mpirun will still return an error.
>
> Thanks
> Edgar
>
>
> From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Geoffrey Paulsen
> Sent: Sunday, September 16, 2018 2:31 PM
> To: devel@lists.open-mpi.org
> Subject: [OMPI devel] Announcing Open MPI v4.0.0rc1
>
> The first release candidate for the Open MPI v4.0.0 release is posted at
> https://www.open-mpi.org/software/ompi/v4.0/
>
> Major changes include:
>
> 4.0.0 -- September, 2018
> ------------------------
>
> - OSHMEM updated to the OpenSHMEM 1.4 API.
> - Do not build the Open SHMEM layer when there are no SPMLs available.
>   Currently, this means the Open SHMEM layer will only build if
>   an MXM or UCX library is found.
> - A UCX BTL was added for enhanced MPI RMA support using UCX.
> - With this release, the OpenIB BTL only supports iWARP and RoCE by default.
> - Updated internal HWLOC to 2.0.1.
> - Updated internal PMIx to 3.0.1.
> - Changed the priority for selecting external versus internal HWLOC
>   and PMIx packages to build. Starting with this release, configure
>   by default selects available external HWLOC and PMIx packages over
>   the internal ones.
> - Updated internal ROMIO to 3.2.1.
> - Removed support for the MXM MTL.
> - Improved CUDA support when using UCX.
> - Improved support for two-phase MPI I/O operations when using OMPIO.
> - Added support for Software-based Performance Counters, see
>   https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
> - Various improvements to MPI RMA performance when using RDMA-capable
>   interconnects.
> - Update memkind component to use the memkind 1.6 public API.
> - Fix problems with use of newer map-by mpirun options. Thanks to
>   Tony Reina for reporting.
> - Fix rank-by algorithms to properly rank by object and span.
> - Allow for running as root if two environment variables are set.
>   Requested by Axel Huebl.
> - Fix a problem with building the Java bindings when using Java 10.
>   Thanks to Bryce Glover for reporting.
>
> Our goal is to release 4.0.0 by mid-October, so any testing is appreciated.
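As a side note for anyone hitting this in the meantime, the workarounds mentioned above boil down to excluding the offending components with the usual MCA "^" exclusion syntax (the caret is part of the value, separated from the framework name by a space); ./my_app is just a placeholder here:

    # keep the OFI BTL from opening an endpoint during selection
    mpirun --mca btl ^ofi ./my_app

    # Edgar's combination: exclude both the openib BTL and the OFI MTL
    mpirun --mca btl ^openib --mca mtl ^ofi ./my_app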
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel