Gilles,

I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
compilations. It appears to be used for building libevent and vt, but
nothing else. The output from configure contains
    checking if more special flags are required for pthreads... -D_REENTRANT
only in the libevent and vt sub-configure portions.

When configured for gcc on Solaris-11 I see the following in configure
    checking for C optimization flags... -m64 -D_REENTRANT -g -finline-functions -fno-strict-aliasing
but with CC=cc the equivalent line is
    checking for C optimization flags... -m64 -g

In both cases the "-m64" is from the CFLAGS I have passed to configure.

However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
I see

[pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    pcp-j-20
  Remote host:   172.18.0.120
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------

which at least appears to have a non-zero errno.
A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.

With the oob.patch you provided the failed accept goes away, BUT the
connection still fails:

------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    pcp-j-20
  Remote host:   172.18.0.120
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------

Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
this.

-Paul

On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Gilles,
>
> I am NOT seeing the problem with gcc.
> It is only occurring with the Studio compilers.
>
> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
> both LDFLAGS and --with-wrapper-ldflags.
>
> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>
>      -mt  Compile and link for multithreaded code.
>
>           This option passes -D_REENTRANT to the preprocessor and
>           passes -lthread in the correct order to ld.
>
>           The -mt option is required if the application or
>           libraries are multithreaded.
>
>           To ensure proper library linking order, you must use
>           this option, rather than -lthread, to link with
>           libthread.
>
>           If you are using POSIX threads, you must link with the
>           options -mt -lpthread. The -mt option is necessary
>           because libC and libCrun need libthread for a
>           multithreaded application.
>
>           If you compile and link in separate steps and you
>           compile with -mt, you might get unexpected results. If you
>           compile one translation unit with -mt, compile all
>           units of the program with -mt.
>
> I cannot connect to my Solaris-11 system right now, but I recall the text
> to be quite similar.
>
> -Paul
>
> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Paul,
>>
>> did you manually set -mt ?
>>
>> if i remember correctly, solaris 11 (at least with gcc compilers) does not
>> need any flags
>> (except the -D_REENTRANT that is added automatically)
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/12/16 12:10, Paul Hargrove wrote:
>>
>> Gilles,
>>
>> I will try the patch when I can.
>> However, our network is undergoing maintenance right now, leaving
>> me unable to reach the necessary hosts.
>>
>> As for -D_REENTRANT, I had already reported having verified in the "make"
>> output that it had been added automatically.
>>
>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
>> preprocessor.
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet
>> <gilles.gouaillar...@iferc.org> wrote:
>>
>>
>> Paul,
>>
>> could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
>> /* otherwise, errno is a global variable instead of a per thread variable,
>> which can explain some weird behaviour. note this should have been
>> already fixed */
>>
>> assuming -D_REENTRANT is set, could you please give the attached patch a
>> try ?
>>
>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
>> error message
>> e.g. failed: Error 0 (0)
>>
>> FWIW, master is also affected.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/12/16 10:47, Paul Hargrove wrote:
>>
>> I have tried with an oob_tcp_if_include setting so that there is now only 1
>> interface.
>> Even with just one interface and -mt=yes in both LDFLAGS and
>> wrapper-ldflags I am *still* getting messages like
>>
>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
>> ------------------------------------------------------------
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>>   Local host:    pcp-j-20
>>   Remote host:   172.16.0.120
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>>
>>
>> I am getting less certain that my speculation about thread-safe libs is
>> correct.
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> A little more reading finds that...
>>
>> Docs say that one needs "-mt" without the "=yes".
>> That will work for both old and new compilers, where "-mt=yes" chokes
>> older ones.
>>
>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>
>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>> link. Need someone to investigate.
>>
>>
>> The lack of multi-thread libraries is my SPECULATION.
>>
>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>> not prove anything.
>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>> wrapper-ldflags instead.
>> My time this week is quite limited, but I can "fire and forget" tests of
>> any tarballs you provide.
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16608.php
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16610.php
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/12/16611.php
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
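
For reference, here is a minimal sketch of the errno-ordering issue that Gilles suspects behind the "Error 0 (0)" message. It is NOT the Open MPI source and not the contents of oob.patch. The function names open_nonblocking_listener and accept_errno_saved_before_close are hypothetical, and the sketch assumes a non-blocking listening socket on the loopback interface with no pending connection, so that accept() fails with EAGAIN (or EWOULDBLOCK). The only point is that errno is copied immediately after the failing call, before any cleanup that might overwrite it.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): create a non-blocking TCP
 * listener on 127.0.0.1 with a kernel-chosen port.
 * Returns the descriptor, or -1 if any setup step fails. */
int open_nonblocking_listener(void)
{
    struct sockaddr_in addr;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel choose a free port */

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical demonstration: call accept() once on a listener that has
 * no pending connection, and return the errno value captured right after
 * the failure. Returns 0 if accept() unexpectedly succeeds and -1 if the
 * listener could not be set up. */
int accept_errno_saved_before_close(void)
{
    int saved_errno = 0;
    int sd;
    int listener = open_nonblocking_listener();

    if (listener < 0) {
        return -1;
    }

    sd = accept(listener, NULL, NULL);
    if (sd < 0) {
        saved_errno = errno; /* capture before any other call runs */
    } else {
        close(sd);
    }

    /* Cleanup calls are allowed to modify errno, which is why the value
     * was copied above instead of being read after this point. */
    close(listener);

    return saved_errno;
}
```

Under these assumptions, accept_errno_saved_before_close() should return EAGAIN or EWOULDBLOCK, and an error message built from the saved value would report it rather than 0. Whether errno is per-thread in the first place is the separate -D_REENTRANT / -mt question raised earlier in the thread.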