Martin,

I have been using Open MPI 4.0.2 on my system and found a bug that is
triggered by running a job (a Go program interfaced to the Clang MPI
package) on multiple machines connected by Ethernet.  The program
crashes with the following output:

----------------------------------------------------------------------------------------------
plover:~/src/models/goconv$ mpirun -np 2 -hostfile hlist /home/raymond/bin/goconv032 plist

orted:/usr/local/lib/pmix/mca_gds_ds21.so: undefined symbol 'pthread_mutexattr_setpshared'
ld.so: orted: lazy binding failed!
Killed
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[62306,0],0] on node plover
  Remote daemon: [[62306,0],1] on node gryphon

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
plover:~/src/models/goconv$
-------------------------------------------------------------------------------------

I traced this to the fact that OpenBSD's version of pthreads doesn't
provide "pthread_mutexattr_setpshared".  It turns out that the
configuration undefines a flag when this is the case, but the actual
code never checks that flag.  I fixed the problem by putting
appropriate ifdefs around the code that triggers the error, which is
itself just simple error-checking code.  This seems to work.  I have
attached two patches for the 4.0.2 source.
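
For illustration only, the kind of guard described above might look
like the following C sketch.  It is not taken from the attached
patches or from the PMIx source.  The macro name
HAVE_PTHREAD_MUTEXATTR_SETPSHARED and the function
init_mutex_maybe_shared are assumptions made up for this example; the
real flag set by the configuration may be named differently.

```c
#include <pthread.h>

/*
 * HAVE_PTHREAD_MUTEXATTR_SETPSHARED is a hypothetical feature macro,
 * assumed here to be defined by the build configuration only when the
 * platform's pthreads library provides pthread_mutexattr_setpshared.
 */

/*
 * Initialize *m. Process-shared semantics are requested only when the
 * feature macro is defined; otherwise the call is compiled out and the
 * mutex keeps the default (process-private) attribute.
 * Returns 0 on success or a pthread error code.
 */
int init_mutex_maybe_shared(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }

#ifdef HAVE_PTHREAD_MUTEXATTR_SETPSHARED
    /* Guarded: never referenced on platforms lacking the symbol. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc != 0) {
        pthread_mutexattr_destroy(&attr);
        return rc;
    }
#endif

    rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With the macro undefined, the object file contains no reference to the
missing symbol, so the lazy-binding failure shown above should not
occur.  The trade-off is that the mutex is then not process-shared.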

I'm not sure that the diffs are formatted quite right, but they do fix
the problem when applied with patch < mypatch in the main directory.
(I apply my patches after yours, but I don't think the order matters,
as the two sets act on different directories.)

Dave

PS -- The program runs at about the same speed as it does on Arch Linux.
-- 
David J. Raymond
david.raym...@nmt.edu
http://physics.nmt.edu/~raymond

Attachment: patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_gds12
Description: Binary data

Attachment: patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_gds21
Description: Binary data
