Paul,

the root cause of -D_REENTRANT not being set automatically is that we test the
__sun__ macro, and the 12.4 compiler defines only __sun and sun.

I will make a fix for that ...
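
For illustration only (this is a sketch of the detection problem, not the
actual fix): a check on __sun__ alone misses a compiler that defines only
__sun and sun, so the test has to accept any of the three spellings. The
function names below are hypothetical and do not exist in Open MPI.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when any of the Solaris predefined macros
 * is visible. Testing only __sun__ would miss a compiler that defines just
 * __sun and sun. */
int built_for_solaris(void)
{
#if defined(__sun__) || defined(__sun) || defined(sun)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 when _REENTRANT was defined for this
 * translation unit (e.g. via -D_REENTRANT or -mt). */
int reentrant_defined(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}

/* Prints what the compiler actually saw, which is handy when comparing a
 * gcc build against a Studio build. */
void report_build_macros(void)
{
    printf("solaris macros: %d, _REENTRANT: %d\n",
           built_for_solaris(), reentrant_defined());
}
```

Calling report_build_macros() from a small test program built with each
compiler should show whether the two builds really see the same macros.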

Cheers,

Gilles

On 2014/12/16 16:00, Paul Hargrove wrote:
> Gilles,
>
> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
> compilations.
> It appears to be used for building libevent and vt, but nothing else.
> The output from configure contains
>
> checking if more special flags are required for pthreads... -D_REENTRANT
>
> only in the libevent and vt sub-configure portions.
>
> When configured for gcc on Solaris-11 I see the following in configure
>
> checking for C optimization flags... -m64 -D_REENTRANT -g
> -finline-functions -fno-strict-aliasing
>
> but with CC=cc the equivalent line is
>
> checking for C optimization flags... -m64 -g
>
> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>
> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
> I see
>
> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
> ------------------------------------------------------------
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    pcp-j-20
>   Remote host:   172.18.0.120
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
>
> which at least appears to have a non-zero errno.
> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
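
A note on EAGAIN from accept(): assuming the listening socket is
non-blocking (the thread does not confirm this), EAGAIN only means that no
connection was pending at that instant, so it is normally treated as "retry
later" rather than as a hard failure. A minimal sketch, with hypothetical
function names that are not part of the oob/tcp code:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if an accept() errno only means "nothing to accept right now". */
int accept_should_retry(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK || err == EINTR;
}

/* Opens a non-blocking listener on an ephemeral loopback port and calls
 * accept() while no client is connecting. Returns the errno from accept(),
 * 0 if accept() unexpectedly succeeded, or -1 if the setup itself failed. */
int probe_idle_accept_errno(void)
{
    struct sockaddr_in addr;
    int flags, fd, err;
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    if (listener < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    flags = fcntl(listener, F_GETFL, 0);
    if (flags < 0 ||
        fcntl(listener, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0) {
        close(listener);
        return -1;
    }

    fd = accept(listener, NULL, NULL);
    err = (fd < 0) ? errno : 0; /* capture errno before any other call */
    if (fd >= 0) {
        close(fd);
    }
    close(listener);
    return err;
}
```

If the real accept path sees EAGAIN for this reason, reporting it as a
connection failure would be misleading; whether that is what happens here
still needs to be checked.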
>
> With the oob.patch you provided the failed accept goes away, BUT the
> connection still fails:
>
> ------------------------------------------------------------
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    pcp-j-20
>   Remote host:   172.18.0.120
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
>
>
> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
> this.
>
>
> -Paul
>
> On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> Gilles,
>>
>> I am NOT seeing the problem with gcc.
>> It is only occurring with the Studio compilers.
>>
>> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
>> both LDFLAGS and --with-wrapper-ldflags.
>>
>> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>>
>>      -mt  Compile and link for multithreaded code.
>>
>>           This option passes -D_REENTRANT to the preprocessor and
>>           passes -lthread in the correct order to ld.
>>
>>           The -mt option is required if the application or
>>           libraries are multithreaded.
>>
>>           To ensure proper library linking order, you must use
>>           this option, rather than -lthread, to link with lib-
>>           thread.
>>
>>           If you are using POSIX threads, you must link with the
>>           options -mt -lpthread.  The -mt option is necessary
>>           because libC and libCrun need libthread for a mul-
>>           tithreaded application.
>>
>>           If you compile and link in separate steps and you com-
>>           pile with -mt, you might get unexpected results. If you
>>           compile one translation unit with -mt, compile all
>>           units of the program with -mt.
>>
>> I cannot connect to my Solaris-11 system right now, but I recall the text
>> to be quite similar.
>>
>> -Paul
>>
>> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> wrote:
>>
>>>  Paul,
>>>
>>> did you manually set -mt ?
>>>
>>> if i remember correctly, solaris 11 (at least with gcc compilers) does not
>>> need any flags
>>> (except the -D_REENTRANT that is added automatically)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 2014/12/16 12:10, Paul Hargrove wrote:
>>>
>>> Gilles,
>>>
>>> I will try the patch when I can.
>>> However, our network is undergoing network maintenance right now, leaving
>>> me unable to reach the necessary hosts.
>>>
>>> As for -D_REENTRANT, I had already reported having verified in the "make"
>>> output that it had been added automatically.
>>>
>>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
>>> preprocessor.
>>>
>>> -Paul
>>>
>>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>
>>>  Paul,
>>>
>>> could you please make sure configure added  "-D_REENTRANT" to the CFLAGS ?
>>> /* otherwise, errno is a global variable instead of a per thread variable,
>>> which can explain some weird behaviour. note this should have been already
>>> fixed */
>>>
>>> assuming -D_REENTRANT is set, could you please give the attached patch a
>>> try ?
>>>
>>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
>>> error message
>>> e.g. failed: Error 0 (0)
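
To make the suspicion concrete: any cleanup call that runs between the
failing accept() and the error message can overwrite errno, so the message
ends up printing the wrong value. A minimal sketch of the usual remedy
follows. close_preserving_errno and errno_after_cleanup are hypothetical
names, and this is neither the CLOSE_THE_SOCKET macro nor the attached
patch.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close a descriptor without letting close()
 * overwrite the errno value the caller still wants to report. */
void close_preserving_errno(int fd)
{
    int saved = errno;
    (void)close(fd);
    errno = saved;
}

/* Simulates a failure path: errno holds the original failure code, cleanup
 * runs, and the value that would be printed afterwards is returned. */
int errno_after_cleanup(int fd, int failure_errno)
{
    errno = failure_errno;
    close_preserving_errno(fd);
    return errno;
}
```

With a plain close() on an invalid descriptor, errno would become EBADF and
the original code would be lost; with the helper, the caller still sees the
original value. Saving errno into a local variable right after the failing
call and printing that local works just as well.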
>>>
>>> FWIW, master is also affected.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 2014/12/16 10:47, Paul Hargrove wrote:
>>>
>>> I have tried with a oob_tcp_if_include setting so that there is now only 1
>>> interface.
>>> Even with just one interface and -mt=yes in both LDFLAGS and
>>> wrapper-ldflags I am *still* getting messages like
>>>
>>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
>>> ------------------------------------------------------------
>>> A process or daemon was unable to complete a TCP connection
>>> to another process:
>>>   Local host:    pcp-j-20
>>>   Remote host:   172.16.0.120
>>> This is usually caused by a firewall on the remote host. Please
>>> check that any firewall (e.g., iptables) has been disabled and
>>> try again.
>>> ------------------------------------------------------------
>>>
>>>
>>> I am getting less certain that my speculation about thread-safe libs is
>>> correct.
>>>
>>> -Paul
>>>
>>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>  A little more reading finds that...
>>>
>>> The docs say that one needs "-mt" without the "=yes".
>>> That will work for both old and new compilers, whereas "-mt=yes" chokes
>>> older ones.
>>>
>>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
>>>
>>> -Paul
>>>
>>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>
>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>  7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>>> link. Need someone to investigate.
>>>
>>>
>>> The lack of multi-thread libraries is my SPECULATION.
>>>
>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>>> not prove anything.
>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>>> wrapper-ldflags instead.
>>> My time this week is quite limited, but I can "fire and forget" tests of
>>> any tarballs you provide.
>>>
>>> -Paul
>>>
>>> --
>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department               Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
>>>
>>>
>>>
>>>
>>
>>
>
>
>
