Package: libopenmpi3
Version: 4.1.0-2
Severity: normal

The new openmpi version seems to have introduced a missing symbol.
Open MPI ignores the failure to load the affected component, but the
process then fails to proceed at runtime.  It is not clear to me whether
this should be considered an error or a warning that needs to be worked
around.

The issue appears in the dolfinx python demos (e.g. demo_poisson.py).
stderr gets the message:

    E               subprocess.CalledProcessError: Command '['/usr/bin/python3', 'demo_poisson.py']' died with <Signals.SIGABRT: 6>.
    
    /usr/lib/python3.9/subprocess.py:524: CalledProcessError
    ----------------------------- Captured stderr call -----------------------------
    [monte:518022] mca_base_component_repository_open: unable to open mca_op_avx: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_op_avx.so: undefined symbol: ompi_op_base_module_t_class (ignored)
    A process has executed an operation involving a call
    to the fork() system call to create a child process.
    
    As a result, the libfabric EFA provider is operating in
    a condition that could result in memory corruption or
    other system errors.
    
    For the libfabric EFA provider to work safely when fork()
    is called, you will need to set the following environment
    variable:
              RDMAV_FORK_SAFE
    
    However, setting this environment variable can result in
    signficant performance impact to your application due to
    increased cost of memory registration.
    
    You may want to check with your application vendor to see
    if an application-level alternative (of not using fork)
    exists.
    
    Your job will now abort.




If I set the environment variable RDMAV_FORK_SAFE as instructed, the
process runs successfully.  But is this the best workaround, or
should some other action be taken?
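
For reference, the workaround can be sketched in Python roughly as
follows.  This is only an illustration, not a recommended fix: the
helper names fork_safe_env and run_demo are hypothetical, and the value
"1" is an assumption (the message above names only the variable, and as
far as I know libibverbs only checks that it is set).

```python
import os
import subprocess
import sys


def fork_safe_env(base=None):
    """Return a copy of the environment with RDMAV_FORK_SAFE set.

    The value "1" is an assumption; the error message only asks for the
    variable to be set.
    """
    env = dict(os.environ if base is None else base)
    env["RDMAV_FORK_SAFE"] = "1"
    return env


def run_demo(script="demo_poisson.py"):
    """Run a demo script in a child process with the workaround applied."""
    return subprocess.run(
        [sys.executable, script],
        env=fork_safe_env(),
        check=True,  # raise CalledProcessError if the demo still aborts
    )
```

Calling run_demo() from the demo directory is equivalent to running
"RDMAV_FORK_SAFE=1 python3 demo_poisson.py" from a shell.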

It does seem odd that mca_op_avx.so references an undefined symbol,
ompi_op_base_module_t_class.
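
One way to look into this is to list the dynamic symbols of the plugin
with nm -D and see how the symbol is recorded.  The sketch below is a
minimal, unverified helper: parse_nm and dynamic_symbols are
hypothetical names, and it assumes binutils nm is installed and prints
its default "address type name" layout.

```python
import subprocess


def parse_nm(nm_output):
    """Map symbol name -> nm type letter from `nm -D` output.

    Undefined symbols have no address column, so the type and name are
    always taken from the last two fields of each line.
    """
    symbols = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        kind, name = parts[-2], parts[-1]
        # Drop a symbol-version suffix such as "@GLIBC_2.14" if present.
        symbols[name.split("@", 1)[0]] = kind
    return symbols


def dynamic_symbols(path):
    """Run `nm -D` on a shared object and parse the result."""
    out = subprocess.run(
        ["nm", "-D", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_nm(out)
```

For example, dynamic_symbols() on the mca_op_avx.so path from the log
should report ompi_op_base_module_t_class with type "U" (undefined).
An undefined symbol in a plugin is normal as long as some library
loaded alongside it defines the symbol, so the same call on the Open
MPI shared library (its path is not given in this report) would show
whether the symbol is exported there at all.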



-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.9.0-5-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_WARN
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libopenmpi3 depends on:
ii  libc6                    2.31-6
ii  libevent-core-2.1-7      2.1.12-stable-1
ii  libevent-pthreads-2.1-7  2.1.12-stable-1
ii  libfabric1               1.11.0-2
ii  libgcc-s1                10.2.1-3
ii  libhwloc-plugins         2.4.0+dfsg-2
ii  libhwloc15               2.4.0+dfsg-2
ii  libibverbs1              32.0-1+b1
ii  libnl-3-200              3.4.0-1+b1
ii  libpmix2                 4.0.0~rc1-2
ii  libpsm-infinipath1       3.3+20.604758e7-6+b1
ii  libpsm2-2                11.2.185-1
ii  libstdc++6               10.2.1-3
ii  libucx0                  1.9.0~rc1-2
ii  zlib1g                   1:1.2.11.dfsg-2

libopenmpi3 recommends no packages.

libopenmpi3 suggests no packages.

-- no debconf information
