Ugh, apparently my one-line patch to openmpi-4.1.0/config/ompi_check_ucx.m4 wasn't sufficient on a fresh install... debugging...
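
One way to see what actually got recorded in a built library is to inspect its dynamic section. The following is a minimal sketch, not something tested in this thread: it assumes GNU binutils' readelf is installed, the function name embedded_search_path is made up for illustration, and the library path in the usage comment is a placeholder. On glibc systems a DT_RPATH entry is searched before LD_LIBRARY_PATH, while a DT_RUNPATH entry is searched after it, so the tag that shows up determines whether an environment module can override the recorded directory.

```shell
# Read `readelf -d` output on stdin and print any embedded search path
# as "RPATH <dirs>" or "RUNPATH <dirs>". Prints nothing if neither tag
# is present. (Hypothetical helper name, for illustration only.)
embedded_search_path() {
    sed -n -e 's/.*(\(RUNPATH\)).*\[\(.*\)].*/\1 \2/p' \
           -e 's/.*(\(RPATH\)).*\[\(.*\)].*/\1 \2/p'
}

# Optional: when a library file is given as the first argument,
# report what it has recorded.
if [ -n "${1:-}" ]; then
    readelf -d "$1" | embedded_search_path
fi

# Hypothetical usage; OMPI_PREFIX is a placeholder for the install prefix:
#   readelf -d "$OMPI_PREFIX/lib/libmca_common_ucx.so" | embedded_search_path
```

An empty result means no directory was embedded, in which case the loader falls back to LD_LIBRARY_PATH and the system defaults.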
On Tue, Jan 26, 2021 at 10:16 AM Tim Mattox <tmat...@gmail.com> wrote:
>
> My environment modules were already setting LD_LIBRARY_PATH to point
> to my UCX lib directory.
>
> The real problem was that OMPI's config/ompi_check_ucx.m4 was
> recording the full path to the UCX library if it wasn't found in a
> standard system location (e.g. /lib, /lib64, /usr/lib, etc.).
> That is normally a good thing to do, since the average mpirun user is
> less likely to have set up their LD_LIBRARY_PATH correctly than the
> software installer is (I hope).
> In my case, however, I'm setting up the environment modules to enforce
> that LD_LIBRARY_PATH has an ABI-compatible UCX available.
>
> I found two ways to override ompi_check_ucx.m4 to just let the linker
> find the UCX libraries on its own:
> 1) Modify UCX's lib/pkgconfig/ucx.pc to have "prefix = /usr" so that
> ompi_check_ucx.m4 would think UCX was in a standard system location
> (which is a lie).
> 2) Patch the OMPI configure system to allow me to force it to not
> hard-code the path to UCX into the OMPI .so files such as
> libmca_common_ucx.so
>
> I chose the latter, since there might be other users of my UCX
> installs, and I didn't want to possibly break them by having
> pkg-config lie.
> And, I was already having to patch an OMPI config/*.m4 file for a
> different reason, and was thus already having to run "./autogen.pl -f"
> anyway.
> So, with the patch below, I can now do an OMPI configure line with
> "... --with-ucx=from_runtime_env ..." and the resulting build will use
> whatever UCX is found at link time.
> Normally, that would be scary, since who knows what ancient version of
> UCX might be sitting around from the Linux distro, but my environment
> modules are enforcing the dependencies so that my UCX libraries will
> be found, even if a user does a "module swap". Hey, a cool benefit of
> Lmod or modules v4+, but I digress.
>
> Anyway, if there is some other OMPI-sanctioned way to do this, please
> let me know, or suggest a better way so I can upstream a patch. I
> have vague memories of being able to force this kind of behavior by
> doing something like "configure ... --with-ucx=/usr ...", but I
> couldn't find any documentation for that, and code inspection of the
> m4 files revealed that such a feature (if it was an intended feature)
> had bitrotted.
>
> --- openmpi-4.1.0/config/ompi_check_ucx.m4.orig 2021-01-25 18:23:17.112499399 -0600
> +++ openmpi-4.1.0/config/ompi_check_ucx.m4 2021-01-25 20:25:15.919338784 -0600
> @@ -41,7 +41,7 @@
>              [ompi_check_ucx_dir=])],
>          [true])])
>      ompi_check_ucx_happy="no"
> -    AS_IF([test -z "$ompi_check_ucx_dir"],
> +    AS_IF([test -z "$ompi_check_ucx_dir" || test "$ompi_check_ucx_dir" = "from_runtime_env"],
>           [OPAL_CHECK_PACKAGE([ompi_check_ucx],
>               [ucp/api/ucp.h],
>               [ucp],
>
> Oddly, the Open MPI configure script already has overrides for
> ucx_CFLAGS, ucx_LIBS, and ucx_STATIC_LIBS, but nothing for something
> like "ucx_LDFLAGS=''".
> I didn't see a simple way to add support for such an override without
> some more extensive changes to multiple m4 files.
>
> On Sun, Jan 24, 2021 at 7:08 PM Gilles Gouaillardet via devel
> <devel@lists.open-mpi.org> wrote:
> >
> > Tim,
> >
> > Have you tried using LD_LIBRARY_PATH?
> > I guess "hardcoding the full path" means "link with -rpath", and IIRC,
> > LD_LIBRARY_PATH overrides this setting.
> >
> >
> > If this does not work, here is something you can try (disclaimer: I
> > did not)
> >
> > export LD_LIBRARY_PATH=/same/install/prefix/ucx/1.9.0/lib
> > configure ... --with-ucx \
> >     CPPFLAGS=-I/same/install/prefix/ucx/1.9.0/include \
> >     LDFLAGS=-L/same/install/prefix/ucx/1.9.0/lib
> >
> > I expect the UCX components would then use libuct.so instead of
> > /same/install/prefix/ucx/1.9.0/lib/libuct.so.
> > If your users want the debug version, then you can simply change your
> > LD_LIBRARY_PATH
> > (module swap ucx should do the trick)
> >
> > Three caveats you should keep in mind:
> > - it is your responsibility to ensure the debug and prod versions of
> >   UCX are ABI compatible
> > - it will be mandatory to load a ucx module (otherwise Open MPI won't
> >   find UCX libraries)
> > - this is a guess and I did not test this.
> >
> >
> > Another option (I did not try) would be to install UCX on your build
> > machine in /usr
> > (since I expect /usr/lib/libuct.so is not hardcoded) and then use
> > LD_LIBRARY_PATH
> > (I assume your ucx module sets it) to point to the UCX flavor of your
> > choice.
> >
> > Cheers,
> >
> > Gilles
> >
> > On Mon, Jan 25, 2021 at 7:43 AM Tim Mattox via devel
> > <devel@lists.open-mpi.org> wrote:
> > >
> > > I specifically want my users to be able to load a "debug" vs.
> > > "tuned" UCX module, without me having to make two different Open MPI
> > > installs... the combinatorics get bad after a few versions.... (I
> > > already have multiple versions of Open MPI to handle the differences
> > > in Fortran mpi mod files for various compilers.)
> > > Here are the differences in the configure options between the two UCX
> > > modules:
> > > debug version: --enable-logging --enable-debug --enable-assertions
> > > --enable-params-check --prefix=/same/install/prefix/ucx/1.9.0/debug
> > > tuned version: --disable-logging --disable-debug --disable-assertions
> > > --disable-params-check --prefix=/same/install/prefix/ucx/1.9.0/tuned
> > >
> > > We noticed that the --enable-debug option for UCX has a pretty
> > > dramatic performance hit for one application (so far).
> > > I've already tested that everything works fine if I replace UCX's .so
> > > files manually in the filesystem, and the "new/changed" ones get
> > > loaded, but a user can't make that kind of swap.
> > > My hope is a user could type "module swap ucx/1.9.0/tuned
> > > ucx/1.9.0/debug" when they want to enable debugging at the UCX layer.
> > >
> > > On Sun, Jan 24, 2021 at 4:43 PM Yossi Itigin <yos...@nvidia.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > One option is to use LD_PRELOAD to load all ucx libraries from a
> > > > specific location
> > > > For example: mpirun -x
> > > > LD_PRELOAD=<path-to-libucp.so>:<path-to-libuct.so>:<path-to-libucs.so>:<path-to-libucm.so>
> > > > ... <exe> <args>
> > > >
> > > > BTW, what is different about the other UCX configuration? Maybe this is
> > > > something which can be resolved another way.
> > > >
> > > > --Yossi
> > > >
> > > > -----Original Message-----
> > > > From: devel <devel-boun...@lists.open-mpi.org> On Behalf Of Tim Mattox
> > > > via devel
> > > > Sent: Sunday, 24 January 2021 23:18
> > > > To: devel@lists.open-mpi.org
> > > > Cc: Tim Mattox <tmat...@gmail.com>
> > > > Subject: [OMPI devel] How to build Open MPI so the UCX used can be
> > > > changed at runtime?
> > > >
> > > > Hello,
> > > > I've run into an application that has its performance dramatically
> > > > affected by some configuration options to the underlying UCX library.
> > > > Is there a way to configure/build Open MPI so that which UCX library is
> > > > used is determined at runtime (e.g. by an environment module), rather
> > > > than having to configure/build different instances of Open MPI?
> > > >
> > > > When I configure Open MPI 4.1.0 with "--with-ucx" it is hardcoding the
> > > > full path to the UCX .so library files to the UCX version it found at
> > > > configure time.
> > > > --
> > > > Tim Mattox, Ph.D. - tmat...@gmail.com
> > >
> > >
> > >
> > > --
> > > Tim Mattox, Ph.D. - tmat...@gmail.com
> >
>
> --
> Tim Mattox, Ph.D. - tmat...@gmail.com
--
Tim Mattox, Ph.D. - tmat...@gmail.com
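
The LD_PRELOAD suggestion quoted above can be sketched as a small helper that builds the colon-separated value from a single UCX lib directory. This is only an illustration under stated assumptions: the function name ucx_preload_list and the UCX_LIBDIR variable are hypothetical, the four library names are the ones listed in the thread, and nothing here was run against a real Open MPI install.

```shell
# Print a colon-separated LD_PRELOAD value for the four UCX libraries
# named in the thread, all taken from one directory.
# (Hypothetical helper name, for illustration only.)
ucx_preload_list() {
    dir=$1
    printf '%s/libucp.so:%s/libuct.so:%s/libucs.so:%s/libucm.so\n' \
        "$dir" "$dir" "$dir" "$dir"
}

# Hypothetical usage; UCX_LIBDIR and ./exe are placeholders:
#   mpirun -x LD_PRELOAD="$(ucx_preload_list "$UCX_LIBDIR")" ./exe
```

Compared with relying on LD_LIBRARY_PATH, this forces the chosen libraries to load regardless of any path recorded at build time, at the cost of every user having to pass the variable on each mpirun line.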