Your message dated Thu, 13 May 2021 14:33:30 +0000
with message-id <[email protected]>
and subject line Bug#984956: fixed in openmpi 4.1.0-9
has caused the Debian Bug report #984956,
regarding openmpi-bin: with mpirun --host <remote>: orte crashes with
FORCE-TERMINATE [...] plm_base_launch_support.c
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
984956: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984956
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: openmpi-bin
Version: 4.1.0-7
Severity: normal
X-Debbugs-Cc: [email protected]
Dear Maintainer,
mpirun crashes when trying to schedule a task on a foreign host:
$ mpirun --host bob hostname
[alice:705956] [[31919,0],0] ORTE_ERROR_LOG: Error in file
../../../../../orte/mca/odls/base/odls_base_default_fns.c at line 226
[alice:705956] [[31919,0],0] ORTE_ERROR_LOG: Error in file
../../../../../orte/mca/plm/base/plm_base_launch_support.c at line 552
--------------------------------------------------------------------------
An internal error has occurred in ORTE:
[[31919,0],0] FORCE-TERMINATE AT (null):1 - error
../../../../../orte/mca/plm/base/plm_base_launch_support.c(553)
This is something that should be reported to the developers.
--------------------------------------------------------------------------
Here, the mpirun command was issued on the machine "alice", and "bob" is a
foreign host reachable via ssh.
Steps to reproduce:
===================
I originally encountered this issue on a small cluster that I am currently
setting up, but I was able to reproduce it locally with two LXC containers.
Thus, the following should be enough to reproduce the issue:
- use two Debian machines with a local user that can ssh (via pubkey) from
one machine to the other
- make sure that no firewall drops packets between the two
- install openmpi-bin and run
# mpirun --host <remote host ip> hostname
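The steps above can be sketched as a small shell helper. This is only a
sketch under assumptions: the function names can_ssh and reproduce are made
up for illustration, and using the ssh options BatchMode and ConnectTimeout
to test for password-less login is an assumption, not part of the original
setup.

```shell
#!/bin/sh
# Sketch of the reproduction steps. The remote machine (hostname or IP)
# is passed as the first argument; "bob" in the report above.

# Return 0 if key-based ssh login to host $1 works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Check the ssh precondition, then run the command from the report.
# Returns 2 if ssh is not usable, otherwise the exit status of mpirun.
reproduce() {
    if ! can_ssh "$1"; then
        echo "cannot ssh to $1 without a password; fix that first" >&2
        return 2
    fi
    mpirun --host "$1" hostname
}
```

After sourcing this file on "alice", calling `reproduce bob` runs the same
command as above. With openmpi-bin 4.1.0-7 I would expect it to end in the
ORTE error shown earlier rather than printing the remote hostname.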
What was the outcome of this action?
====================================
An internal error in ORTE terminated mpirun.
What outcome did you expect instead?
====================================
mpirun --host bob should print "bob" and succeed (or complain loudly that I am
using it wrongly).
-- System Information:
Debian Release: bullseye/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 5.10.0-3-amd64 (SMP w/64 CPU threads)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE,
TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages openmpi-bin depends on:
ii libc6 2.31-9
ii libevent-core-2.1-7 2.1.12-stable-1
ii libopenmpi3 4.1.0-7
ii openmpi-common 4.1.0-7
ii openssh-client [ssh-client] 1:8.4p1-4
openmpi-bin recommends no packages.
Versions of packages openmpi-bin suggests:
ii gfortran [fortran-compiler] 4:10.2.1-1
-- no debconf information
--- End Message ---
--- Begin Message ---
Source: openmpi
Source-Version: 4.1.0-9
Done: Alastair McKinstry <[email protected]>
We believe that the bug you reported is fixed in the latest version of
openmpi, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
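To check whether a given machine already has the fixed version, something
like the following may help. It is a sketch, not part of the upload: the
names FIXED_VERSION, has_fix and installed_version are hypothetical, and it
assumes that the openmpi-bin binary package carries the same version string
as the openmpi source package.

```shell
#!/bin/sh
# Version that closes the bug, taken from Source-Version above.
FIXED_VERSION=4.1.0-9

# Return 0 if the Debian version string $1 is at least FIXED_VERSION.
# dpkg --compare-versions applies Debian version ordering rules.
has_fix() {
    dpkg --compare-versions "$1" ge "$FIXED_VERSION"
}

# Print the installed openmpi-bin version (no output if not installed).
installed_version() {
    dpkg-query -W -f='${Version}' openmpi-bin 2>/dev/null
}
```

For example, `has_fix "$(installed_version)" && echo "fix installed"`
prints the message only when the installed version is 4.1.0-9 or later.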
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Alastair McKinstry <[email protected]> (supplier of updated openmpi package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Format: 1.8
Date: Thu, 13 May 2021 14:01:44 +0100
Source: openmpi
Architecture: source
Version: 4.1.0-9
Distribution: unstable
Urgency: medium
Maintainer: Alastair McKinstry <[email protected]>
Changed-By: Alastair McKinstry <[email protected]>
Closes: 984956 987261
Changes:
openmpi (4.1.0-9) unstable; urgency=medium
.
* Pull fix for Memory leak in MPI_Allreduce when using a repeatedly created
and freed MPI_Datatype. Closes: #987261
* Workaround for mpirun crashes when trying to schedule a task on
a foreign host. Closes: #984956
Checksums-Sha1:
008a33fc5547e74b63100954b776997757a11e48 2670 openmpi_4.1.0-9.dsc
17f5e7488a4acb004e0ac0e56ec26407af13cc4e 69304 openmpi_4.1.0-9.debian.tar.xz
Checksums-Sha256:
93db2dc295b1dbd09b6d795aad6c4fad6ce82de996e104885c354578c0bf9695 2670 openmpi_4.1.0-9.dsc
80e34f5b19dc86c717918bb801ed3e8df7bbcc59324eecb552521d4f12ab96e7 69304 openmpi_4.1.0-9.debian.tar.xz
Files:
3d7e6a57c1a31d051949fb29a2f0b0e0 2670 net optional openmpi_4.1.0-9.dsc
ed508068c6a475acdf88358ed4ea002e 69304 net optional openmpi_4.1.0-9.debian.tar.xz
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEEgjg86RZbNHx4cIGiy+a7Tl2a06UFAmCdM5gACgkQy+a7Tl2a
06UZ2A/+IzUCzKN48qgm5M6CRpvHcaM3G5qseTiw+3h/VTuXoxid/BGGTH9txX2e
l+cOfrXNIjsbzc78CRIZC6DpKjkPjYPWqriNzv47PtSwe2wT0GxackaELkAsAlsf
SPFfHX6u7tS8wfFtFsSC+korRbRFFnC/6Fz7EjkNFVAcIMZWVYbmYx5SwnGzPpxs
UslBDq9bIrHIYgP6V1zpwjG93qU1g0u6YEHrWDWelOPJuK+PFZ3phRqNzvxlIrXl
dytq6s7tjkYdga8DsaZOaBHQKsm0xPh1r10l+nCMrdwz0WvVY6ZJmPIVQQys0zDq
4+uudGit5iTyQn/ozDVOTf8B++wq0XMaGd3yD3XMtlgnW+poG+X8sIYtE3jdMjy9
RFz8XAmPGYO+7kpY1TeaGeaMzgvso9SwaOL5xUDoaqHmt3LNfNORNs6DufMb0/zS
APrQ4QzK37sL2TkozJcHqC4IvTLpayVHSEehsnNstOltgGYJqXCKUF94QOXaHfDQ
NOIVWn4Be36We5bo79G42crnINMVGRWav2VNCw5d1wcL0Bw4Z44Rxi9AZ/QHdwzO
KRfIqWcaUWzspcuaVNZT8aVEcZw8T5G8y/aj6nNkD27vAswdVpvwADi96bWASXbN
zAKoEqrC/Q3bXg8Uznu1c2hvB1cDRdx6ItmdrriGwHpoXmKrlVU=
=6FIj
-----END PGP SIGNATURE-----
--- End Message ---