Martin,

I have been using Open MPI 4.0.2 on my system and found a bug that is triggered by running a job (a Go program interfaced to the Clang MPI package) on multiple machines connected by Ethernet. The program crashes with the following output:
----------------------------------------------------------------------------------------------
plover:~/src/models/goconv$ mpirun -np 2 -hostfile hlist /home/raymond/bin/goconv032 plist
orted:/usr/local/lib/pmix/mca_gds_ds21.so: undefined symbol 'pthread_mutexattr_setpshared'
ld.so: orted: lazy binding failed!
Killed
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[62306,0],0] on node plover
  Remote daemon: [[62306,0],1] on node gryphon

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
plover:~/src/models/goconv$
-------------------------------------------------------------------------------------

I traced this to the fact that OpenBSD's version of pthreads doesn't have "pthread_mutexattr_setpshared". It turns out that the configuration file undefines a flag if this is so, but the actual code doesn't pay any attention to this.

I fixed the problem by putting appropriate ifdefs around the code generating the error, which itself is simple error checking code. This seems to work.

I have attached two patches for the 4.0.2 source. I'm not sure that the diffs are done quite right, but they do fix the problem using

    patch < mypatch

in the main directory. (I do my patches after yours, but I don't think that this is important as the two act on different directories.)

Dave

PS -- The program runs at about the same speed as it does on Arch Linux.

-- 
David J. Raymond
david.raym...@nmt.edu
http://physics.nmt.edu/~raymond
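For readers without the attachments, the kind of ifdef guard described above might look roughly like the sketch below. It is not the content of the attached patches. The macro EXAMPLE_HAVE_SETPSHARED and the function example_mutex_init are hypothetical names made up for illustration, and the guard is derived here from the POSIX feature-test macro _POSIX_THREAD_PROCESS_SHARED rather than from whatever flag the Open MPI/PMIx configure script actually sets.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for the configure-time flag mentioned above.
 * Assumption: we derive it from the POSIX option macro, which is only
 * positive where process-shared mutexes are supported. */
#if defined(_POSIX_THREAD_PROCESS_SHARED) && _POSIX_THREAD_PROCESS_SHARED > 0
#define EXAMPLE_HAVE_SETPSHARED 1
#endif

/* Initialize a mutex, requesting the process-shared attribute only where
 * the platform provides pthread_mutexattr_setpshared().
 * Returns 0 on success or a pthread error code. */
static int example_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }

#ifdef EXAMPLE_HAVE_SETPSHARED
    /* Compiled out where the symbol does not exist, so the dynamic
     * linker never has to resolve it at run time. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc != 0) {
        pthread_mutexattr_destroy(&attr);
        return rc;
    }
#endif

    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With a guard like this, the call is simply absent from the object code on platforms lacking the function, which avoids the lazy-binding failure at load time. Whether a mutex that is not process-shared is adequate for the component in question is a separate matter that this sketch does not address.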
patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_gds12
Description: Binary data
patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_gds21
Description: Binary data