Source: openmpi
Version: openmpi-2.0.2~git.20161225-8
Severity: normal

Dear Maintainer,

In the Debian automated build of mpgrafic-0.3.7.6-2 (Debian
downstream version) on the s390x architecture, `make check' calls
`regression-test-0.3.7.sh', which runs mpgrafic on a standard input
file and expects a standard output file. Instead, the run aborts with
this fatal error:

* lines 878-884 of the html source of
https://buildd.debian.org/status/fetch.php?pkg=mpgrafic&arch=s390x&ver=0.3.7.6-2&stamp=1484854191&raw=0

   878  This looks like a debian openmpi system.
   879  [zandonai:4650] *** An error occurred in MPI_Comm_dup
   880  [zandonai:4650] *** reported by process [4180410369,0]
   881  [zandonai:4650] *** on communicator MPI_COMM_WORLD
   882  [zandonai:4650] *** MPI_ERR_COMM: invalid communicator
   883  [zandonai:4650] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
   884  [zandonai:4650] ***    and potentially your MPI job)

The ppc64 and sparc64 builds give similar messages, though the ppc64
build gives a longer traceback:

[ookuninushi:9401] *** An error occurred in MPI_Comm_dup
[ookuninushi:9401] *** reported by process [1293680641,0]
[ookuninushi:9401] *** on communicator MPI_COMM_WORLD
[ookuninushi:9401] *** MPI_ERR_COMM: invalid communicator
[ookuninushi:9401] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ookuninushi:9401] ***    and potentially your MPI job)
[ookuninushi:09397] *** Process received signal ***
[ookuninushi:09397] Signal: Segmentation fault (11)
[ookuninushi:09397] Signal code: Address not mapped (1)
[ookuninushi:09397] Failing at address: 0x30
[ookuninushi:09397] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fffb1250478]
[ookuninushi:09397] [ 1] /usr/lib/powerpc64-linux-gnu/libpmix.so.0(+0x29e54)[0x3fffad789e54]
[ookuninushi:09397] [ 2] /usr/lib/powerpc64-linux-gnu/libpmix.so.0(+0x29c8c)[0x3fffad789c8c]
[ookuninushi:09397] [ 3] /usr/lib/powerpc64-linux-gnu/libpmix.so.0(+0x2a16c)[0x3fffad78a16c]
[ookuninushi:09397] [ 4] /usr/lib/powerpc64-linux-gnu/libpmix.so.0(pmix_rte_finalize-0x3bfb4)[0x3fffad7f1cac]
[ookuninushi:09397] [ 5] /usr/lib/powerpc64-linux-gnu/libpmix.so.0(OPAL_MCA_PMIX3X_PMIx_server_finalize-0x70d00)[0x3fffad7bb298]
[ookuninushi:09397] [ 6] /usr/lib/powerpc64-linux-gnu/openmpi/lib/openmpi/mca_pmix_pmix3x.so(pmix3x_server_finalize-0x1bf9c)[0x3fffad853a1c]
[ookuninushi:09397] [ 7] /usr/lib/powerpc64-linux-gnu/libopen-rte.so.20(pmix_server_finalize-0x7fa00)[0x3fffb11ab2c0]
[ookuninushi:09397] [ 8] /usr/lib/powerpc64-linux-gnu/openmpi/lib/openmpi/mca_ess_hnp.so(+0x4064)[0x3fffb09d4064]
[ookuninushi:09397] [ 9] /usr/lib/powerpc64-linux-gnu/libopen-rte.so.20(orte_finalize-0xbf7d8)[0x3fffb11697f0]
[ookuninushi:09397] [10] mpirun[0x10001730]
[ookuninushi:09397] [11] mpirun[0x10000f38]
[ookuninushi:09397] [12] /lib/powerpc64-linux-gnu/libc.so.6(+0x46388)[0x3fffb0c86388]
[ookuninushi:09397] [13] /lib/powerpc64-linux-gnu/libc.so.6(__libc_start_main-0x187a18)[0x3fffb0c865d8]
[ookuninushi:09397] *** End of error message ***
Segmentation fault


I suspect that this comes from openmpi or fftw2, since MPI_Comm_dup is
not called directly from mpgrafic.

https://anonscm.debian.org/git/debian-science/packages/fftw.git/tree/mpi/transpose_mpi.c

* lines 102-104 of that file:

      /* create a new "clone" communicator so that transpose
         communications do not interfere with caller communications. */
      MPI_Comm_dup(transpose_comm, &comm);

Open MPI's configure script already tests for byte order, using the
autoconf macro AC_C_BIGENDIAN, which defines WORDS_BIGENDIAN on
big-endian hosts:

line 930 of openmpi-2.0.2~git.20161225/configure.ac is:
AC_C_BIGENDIAN

My hypothesis is that an #ifdef WORDS_BIGENDIAN is needed somewhere in
ompi_comm_dup_with_info, or at least somewhere in relation to lines
967-1025 of openmpi-2.0.2~git.20161225/ompi/communicator/comm.c.
The three failing architectures named above (s390x, ppc64, sparc64)
are all big-endian.

Thanks to James Clarke for help in the attempted bug trace above!

Cheers
Boud

-- System Information:
Debian Release: sid
Architecture: s390x
