On Thu, Aug 02, 2007 at 10:51:13AM +0200, Adrian Knoth wrote:

> > We (as in the Debian maintainer for Open MPI) got this bug report from
> > Uwe who sees mpi apps segfault on Debian systems with the FreeBSD
> > kernel.
> > Any input would be greatly appreciated!
> I'll follow the QEMU instructions on your website and investigate on
> my own ;)

I was able to get OMPI running on kfreebsd-amd64. I used a nightly
snapshot from the trunk, so the problem is "more or less fixed by
upstream" ;)

    adi@debian:~$ ./ompi/bin/mpirun -np 2 ring
    0: sending message (0) to 1
    0: sent message
    1: waiting for message
    1: got message (1) from 0, sending to 0
    0: got message (1) from 1

    adi@debian:~$ ./ompi/bin/ompi_info
                    Open MPI: 1.3a1r15820
       Open MPI SVN revision: r15820
                    Open RTE: 1.3a1r15820
       Open RTE SVN revision: r15820
                        OPAL: 1.3a1r15820
           OPAL SVN revision: r15820
                      Prefix: /home/adi/ompi
     Configured architecture: x86_64-unknown-kfreebsd6.2-gnu

I'll now compile the 1.2.3 release tarball and see if I can reproduce
the segfaults. On the other hand, I guess nobody is using OMPI on
GNU/kFreeBSD, so upgrading the openmpi package to a Subversion snapshot
would also fix the problem (think of "fixed in experimental").

JFTR: It's currently not possible to compile OMPI 1.2.3 on amd64 out of
the box. It compiles on i386:

    http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-i386&stamp=1187000200&file=log&as=raw

but it fails on amd64:

    http://experimental.debian.net/fetch.php?&pkg=openmpi&ver=1.2.3-3&arch=kfreebsd-amd64&stamp=1186969782&file=log&as=raw

    stacktrace.c: In function 'opal_show_stackframe':
    stacktrace.c:145: error: 'FPE_FLTDIV' undeclared (first use in this function)
    stacktrace.c:145: error: (Each undeclared identifier is reported only once
    stacktrace.c:145: error: for each function it appears in.)
    stacktrace.c:146: error: 'FPE_FLTOVF' undeclared (first use in this function)
    stacktrace.c:147: error: 'FPE_FLTUND' undeclared (first use in this function)
    make[4]: *** [stacktrace.lo] Error 1
    make[4]: Leaving directory `/build/buildd/openmpi-1.2.3/opal/util'

This is caused by libc0.1-dev: in /usr/include/bits/sigcontext.h, the
relevant #defines are placed inside an #ifdef __i386__ condition, so
they are not visible on amd64. After extending the condition to cover
__x86_64__ as well, everything works fine.

Should I file a bug report against libc0.1-dev or will you take care of
it?

I'll keep you posted...

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de