Paul -- The __sun macro check is now in the OMPI 1.8 tree, and is in the latest nightly tarball.
If I'm following this thread right -- and I might not be! -- I think Gilles is saying that now that the __sun check is in, it should fix this -mt/-D_REENTRANT/whatever problem. Can you confirm?

On Dec 16, 2014, at 1:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Gilles,
>
> I am running mpirun on a host that ALSO will run one of the application
> processes.
> Requested ifconfig and netstat outputs appear below.
>
> -Paul
>
> [phargrov@pcp-j-20 ~]$ ifconfig -a
> lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>         inet 127.0.0.1 netmask ff000000
> bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
>         inet 172.16.0.120 netmask ffff0000 broadcast 172.16.255.255
> pFFFF.ibp0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 2044 index 3
>         inet 172.18.0.120 netmask ffff0000 broadcast 172.18.255.255
> lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
>         inet6 ::1/128
> bge0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
>         inet6 fe80::250:45ff:fe5c:2b0/10
> [phargrov@pcp-j-20 ~]$ netstat -nr
>
> Routing Table: IPv4
>   Destination           Gateway           Flags  Ref     Use     Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.16.254.1         UG        2     158463 bge0
> 127.0.0.1            127.0.0.1            UH        5     398913 lo0
> 172.16.0.0           172.16.0.120         U         4  135241319 bge0
> 172.18.0.0           172.18.0.120         U         3         26 pFFFF.ibp0
>
> Routing Table: IPv6
>   Destination/Mask            Gateway                   Flags Ref   Use    If
> --------------------------- --------------------------- ----- --- ------- -----
> ::1                         ::1                         UH      2       0 lo0
> fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
>
> On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet
> <gilles.gouaillar...@iferc.org> wrote:
> Paul,
>
> could you please send the output of
> ifconfig -a
> netstat -nr
>
> on the three hosts you are using
> (i assume you are still invoking mpirun from one node, and tasks are running
> on two other nodes)
>
> Cheers,
>
> Gilles
>
>
> On 2014/12/16 16:00, Paul Hargrove wrote:
>> Gilles,
>>
>> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
>> compilations.
>> It appears to be used for building libevent and vt, but nothing else.
>> The output from configure contains
>>
>> checking if more special flags are required for pthreads... -D_REENTRANT
>>
>> only in the libevent and vt sub-configure portions.
>>
>> When configured for gcc on Solaris-11 I see the following in configure
>>
>> checking for C optimization flags... -m64 -D_REENTRANT -g
>> -finline-functions -fno-strict-aliasing
>>
>> but with CC=cc the equivalent line is
>>
>> checking for C optimization flags... -m64 -g
>>
>> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>>
>> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
>> I see
>>
>> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>> ------------------------------------------------------------
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>>   Local host:    pcp-j-20
>>   Remote host:   172.18.0.120
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>>
>> which at least appears to have a non-zero errno.
>> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
>>
>> With the oob.patch you provided the failed accept goes away, BUT the
>> connection still fails:
>>
>> ------------------------------------------------------------
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>>   Local host:    pcp-j-20
>>   Remote host:   172.18.0.120
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>>
>>
>> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
>> this.
>>
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>> Gilles,
>>>
>>> I am NOT seeing the problem with gcc.
>>> It is only occurring with the Studio compilers.
>>>
>>> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
>>> both LDFLAGS and --with-wrapper-ldflags.
>>>
>>> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>>>
>>>      -mt  Compile and link for multithreaded code.
>>>
>>>           This option passes -D_REENTRANT to the preprocessor and
>>>           passes -lthread in the correct order to ld.
>>>
>>>           The -mt option is required if the application or
>>>           libraries are multithreaded.
>>>
>>>           To ensure proper library linking order, you must use
>>>           this option, rather than -lthread, to link with
>>>           libthread.
>>>
>>>           If you are using POSIX threads, you must link with the
>>>           options -mt -lpthread. The -mt option is necessary
>>>           because libC and libCrun need libthread for a
>>>           multithreaded application.
>>>
>>>           If you compile and link in separate steps and you
>>>           compile with -mt, you might get unexpected results. If you
>>>           compile one translation unit with -mt, compile all
>>>           units of the program with -mt.
>>>
>>> I cannot connect to my Solaris-11 system right now, but I recall the text
>>> to be quite similar.
>>>
>>> -Paul
>>>
>>> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>> Paul,
>>>>
>>>> did you manually set -mt ?
>>>>
>>>> if i remember correctly, solaris 11 (at least with gcc compilers) do not
>>>> need any flags
>>>> (except the -D_REENTRANT that is added automatically)
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On 2014/12/16 12:10, Paul Hargrove wrote:
>>>>
>>>> Gilles,
>>>>
>>>> I will try the patch when I can.
>>>> However, our network is undergoing network maintenance right now, leaving
>>>> me unable to reach the necessary hosts.
>>>>
>>>> As for -D_REENTRANT, I had already reported having verified in the "make"
>>>> output that it had been added automatically.
>>>>
>>>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
>>>> preprocessor.
>>>>
>>>> -Paul
>>>>
>>>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet
>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>
>>>>
>>>> Paul,
>>>>
>>>> could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
>>>> /* otherwise, errno is a global variable instead of a per thread variable,
>>>> which can explain some weird behaviour. note this should have been already
>>>> fixed */
>>>>
>>>> assuming -D_REENTRANT is set, could you please give the attached patch a
>>>> try ?
>>>>
>>>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
>>>> error message
>>>> e.g. failed: Error 0 (0)
>>>>
>>>> FWIW, master is also affected.
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On 2014/12/16 10:47, Paul Hargrove wrote:
>>>>
>>>> I have tried with a oob_tcp_if_include setting so that there is now only 1
>>>> interface.
>>>> Even with just one interface and -mt=yes in both LDFLAGS and
>>>> wrapper-ldflags I am *still* getting messages like
>>>>
>>>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
>>>> ------------------------------------------------------------
>>>> A process or daemon was unable to complete a TCP connection
>>>> to another process:
>>>>   Local host:    pcp-j-20
>>>>   Remote host:   172.16.0.120
>>>> This is usually caused by a firewall on the remote host. Please
>>>> check that any firewall (e.g., iptables) has been disabled and
>>>> try again.
>>>> ------------------------------------------------------------
>>>>
>>>>
>>>> I am getting less certain that my speculation about thread-safe libs is
>>>> correct.
>>>>
>>>> -Paul
>>>>
>>>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>
>>>> A little more reading finds that...
>>>>
>>>> Docs say that one needs "-mt" without the "=yes".
>>>> That will work for both old and new compilers, where "-mt=yes" chokes
>>>> older ones.
>>>>
>>>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
>>>>
>>>> -Paul
>>>>
>>>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>
>>>>
>>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>> 7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>>>> link. Need someone to investigate.
>>>>
>>>>
>>>> The lack of multi-thread libraries is my SPECULATION.
>>>>
>>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>>>> not prove anything.
>>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>>>> wrapper-ldflags instead.
>>>> My time this week is quite limited, but I can "fire and forget" tests of
>>>> any tarballs you provide.
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>>
>>>>
>>>> --
>>>> Paul H.
Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16608.php
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16610.php
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16611.php
>>>>
>>>
>>> --
>>> Paul H.
Hargrove                          phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department               Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16613.php
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16615.php
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16617.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/