Ugh, apparently my one-line patch to
openmpi-4.1.0/config/ompi_check_ucx.m4 wasn't sufficient on a fresh
install... debugging...
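
A rough sketch of one way to check what got baked into a built component while debugging this. It assumes the hard-coded UCX path shows up as an RPATH or RUNPATH entry in the ELF dynamic section, which this thread does not confirm is the mechanism. The helper name ucx_link_info and the example file path are made up for illustration. The helper filters `readelf -d` output down to the rpath/runpath entries plus the NEEDED entries for the UCX libraries.

```shell
# Filter `readelf -d` output (read from stdin) down to the lines that
# matter here: any RPATH/RUNPATH entry, and NEEDED entries for the UCX
# libraries (libucp, libuct, libucs, libucm).
# ucx_link_info is a hypothetical helper name, not part of Open MPI or UCX.
ucx_link_info() {
    grep -E '\((RPATH|RUNPATH)\)|\(NEEDED\).*\[libuc[pstm]\.so'
}

# Example against a real file (the path is an assumption):
#   readelf -d "$OMPI_PREFIX/lib/libmca_common_ucx.so" | ucx_link_info
```

Which entry type appears matters: per the ld.so man page, DT_RPATH is searched before LD_LIBRARY_PATH (when no DT_RUNPATH is present), while DT_RUNPATH is searched after it.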

On Tue, Jan 26, 2021 at 10:16 AM Tim Mattox <tmat...@gmail.com> wrote:
>
> My environment modules were already setting LD_LIBRARY_PATH to point
> to my UCX lib directory.
>
> The real problem was that OMPI's config/ompi_check_ucx.m4 was
> recording the full path to the UCX library if it wasn't found in a
> standard system location (e.g. /lib, /lib64, /usr/lib, etc.).
> That is normally a good thing to do, since the average mpirun user is
> less likely to have set up their LD_LIBRARY_PATH correctly than the
> person who installed the software (I hope).
> In my case however, I'm setting up the environment modules to enforce
> the LD_LIBRARY_PATH to have an ABI compatible UCX available.
>
> I found two ways to override ompi_check_ucx.m4 to just let the linker
> find the UCX libraries on its own:
> 1) Modify UCX's lib/pkgconfig/ucx.pc to have "prefix = /usr" so that
> ompi_check_ucx.m4 would think UCX was in a standard system location
> (which is a lie).
> 2) Patch the OMPI configure system to allow me to force it to not
> hard-code the path to UCX into the OMPI .so files such as
> libmca_common_ucx.so
>
> I chose the latter, since there might be other users of my UCX
> installs, and I didn't want to possibly break them by having
> pkg-config lie.
> And, I was already having to patch an OMPI config/*.m4 file for a
> different reason, and was thus already having to run "./autogen.pl -f"
> anyway.
> So, with the patch below, I can now do an OMPI configure line with
> "... --with-ucx=from_runtime_env ..." and the resulting build will use
> whatever UCX is found at link time.
> Normally, that would be scary, since who knows what ancient version of
> UCX might be sitting around from the Linux distro, but my environment
> modules are enforcing the dependencies so that my UCX libraries will
> be found, even if a user does a "module swap".  Hey, a cool benefit of
> lmod or modules v4+, but I digress.
>
> Anyway, if there is some other OMPI sanctioned way to do this, please
> let me know, or suggest a better way so I can upstream a patch.  I
> have vague memories of being able to force this kind of behavior by
> doing something like "configure ... --with-ucx=/usr ...", but I
> couldn't find any documentation for that, and code inspection of the
> m4 files revealed that such a feature (if it was an intended feature)
> had bitrotted.
>
> --- openmpi-4.1.0/config/ompi_check_ucx.m4.orig 2021-01-25 18:23:17.112499399 -0600
> +++ openmpi-4.1.0/config/ompi_check_ucx.m4 2021-01-25 20:25:15.919338784 -0600
> @@ -41,7 +41,7 @@
>                                                      [ompi_check_ucx_dir=])],
>                                               [true])])
>                    ompi_check_ucx_happy="no"
> -                  AS_IF([test -z "$ompi_check_ucx_dir"],
> +                  AS_IF([test -z "$ompi_check_ucx_dir" || test "$ompi_check_ucx_dir" = "from_runtime_env"],
>                          [OPAL_CHECK_PACKAGE([ompi_check_ucx],
>                                     [ucp/api/ucp.h],
>                                     [ucp],
>
> Oddly, the Open MPI configure script already has overrides for
> ucx_CFLAGS, ucx_LIBS, and ucx_STATIC_LIBS, but nothing for something
> like "ucx_LDFLAGS=''".
> I didn't see a simple way to add support for such an override without
> some more extensive changes to multiple m4 files.
>
> On Sun, Jan 24, 2021 at 7:08 PM Gilles Gouaillardet via devel
> <devel@lists.open-mpi.org> wrote:
> >
> > Tim,
> >
> > Have you tried using LD_LIBRARY_PATH?
> > I guess "hardcoding the full path" means "link with -rpath", and IIRC,
> > LD_LIBRARY_PATH
> > overrides this setting.
> >
> >
> > If this does not work, here is something you can try (disclaimer: I did not)
> >
> > export LD_LIBRARY_PATH=/same/install/prefix/ucx/1.9.0/lib
> > configure ... --with-ucx \
> >     CPPFLAGS=-I/same/install/prefix/ucx/1.9.0/include \
> >     LDFLAGS=-L/same/install/prefix/ucx/1.9.0/lib
> >
> > I expect the UCX components use libuct.so instead of
> > /same/install/prefix/ucx/1.9.0/lib/libuct.so.
> > If your users want the debug version, then you can simply change your
> > LD_LIBRARY_PATH
> > (module swap ucx should do the trick)
> >
> > Three caveats you should keep in mind:
> >  - it is your responsibility to ensure the debug and prod versions of
> > UCX are ABI compatible
> >  - it will be mandatory to load a ucx module (otherwise Open MPI won't
> > find UCX libraries)
> >  - this is a guess and I did not test this.
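
A small sketch for the second caveat: one way to see which UCX libraries the loader would actually pick up with the current environment. It reads `ldd` output on stdin and reports where each UCX library resolves, or flags it when no ucx module is loaded and nothing is found. The helper name ucx_resolved and the example file path are assumptions for illustration only.

```shell
# Read `ldd` output on stdin and print "<library> <resolved path>" for
# each UCX library, or "<library> NOT FOUND" when the loader cannot
# resolve it. ucx_resolved is a hypothetical helper name.
ucx_resolved() {
    awk '$1 ~ /^libuc[pstm]\.so/ {
        if ($3 == "not") print $1, "NOT FOUND"
        else             print $1, $3
    }'
}

# Example against a real file (the path is an assumption):
#   ldd "$OMPI_PREFIX/lib/libmca_common_ucx.so" | ucx_resolved
```

Running it before and after a "module swap" should show the resolved paths change if the swap takes effect.
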
> >
> >
> > Another option (I did not try) would be to install UCX on your build
> > machine in /usr
> > (since I expect /usr/lib/libuct.so is not hardcoded) and then use
> > LD_LIBRARY_PATH
> > (I assume your ucx module sets it) to point to the UCX flavor of your
> > choice.
> >
> > Cheers,
> >
> > Gilles
> >
> > On Mon, Jan 25, 2021 at 7:43 AM Tim Mattox via devel
> > <devel@lists.open-mpi.org> wrote:
> > >
> > > I'm specifically wanting my users to be able to load a "debug" vs.
> > > "tuned" UCX module, without me having to make two different Open MPI
> > > installs... the combinatorics get bad after a few versions.... (I'm
> > > already having multiple versions of Open MPI to handle the differences
> > > in Fortran mpi mod files for various compilers.)
> > > Here are the differences in the configure options between the two UCX 
> > > modules:
> > > debug version: --enable-logging --enable-debug --enable-assertions
> > > --enable-params-check --prefix=/same/install/prefix/ucx/1.9.0/debug
> > > tuned version: --disable-logging --disable-debug --disable-assertions
> > > --disable-params-check --prefix=/same/install/prefix/ucx/1.9.0/tuned
> > >
> > > We noticed that the --enable-debug option for UCX has a pretty
> > > dramatic performance hit for one application (so far).
> > > I've already tested that everything works fine if I replace UCX's .so
> > > files manually in the filesystem, and the "new/changed" ones get
> > > loaded, but a user can't make that kind of swap.
> > > My hope is a user could type "module swap ucx/1.9.0/tuned
> > > ucx/1.9.0/debug" when they want to enable debugging at the UCX layer.
> > >
> > > On Sun, Jan 24, 2021 at 4:43 PM Yossi Itigin <yos...@nvidia.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > One option is to use LD_PRELOAD to load all ucx libraries from a
> > > > > specific location.
> > > > > For example:
> > > > > mpirun -x LD_PRELOAD=<path-to-libucp.so>:<path-to-libuct.so>:<path-to-libucs.so>:<path-to-libucm.so> ... <exe> <args>
> > > >
> > > > BTW, what is different about the other UCX configuration? Maybe this is 
> > > > something which can be resolved another way.
> > > >
> > > > --Yossi
> > > >
> > > > -----Original Message-----
> > > > From: devel <devel-boun...@lists.open-mpi.org> On Behalf Of Tim Mattox 
> > > > via devel
> > > > Sent: Sunday, 24 January 2021 23:18
> > > > To: devel@lists.open-mpi.org
> > > > Cc: Tim Mattox <tmat...@gmail.com>
> > > > Subject: [OMPI devel] How to build Open MPI so the UCX used can be 
> > > > changed at runtime?
> > > >
> > > > Hello,
> > > > I've run into an application that has its performance dramatically 
> > > > affected by some configuration options to the underlying UCX library.
> > > > Is there a way to configure/build Open MPI so that which UCX library is 
> > > > used is determined at runtime (e.g. by an environment module), rather 
> > > > than having to configure/build different instances of Open MPI?
> > > >
> > > > When I configure Open MPI 4.1.0 with "--with-ucx" it is hardcoding the 
> > > > full path to the UCX .so library files to the UCX version it found at 
> > > > configure time.
> > > > --
> > > > Tim Mattox, Ph.D. - tmat...@gmail.com
> > >
> > >
> > >
> > > --
> > > Tim Mattox, Ph.D. - tmat...@gmail.com
>
>
>
> --
> Tim Mattox, Ph.D. - tmat...@gmail.com



-- 
Tim Mattox, Ph.D. - tmat...@gmail.com
