Thanks Paul! Sorry I was out all day - stuck in meetings, I fear.

On Wed, Dec 17, 2014 at 7:17 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> Short version:
>
> v1.8 nightly (v1.8.3-313-g54c80c2) PASSED my testing.
>
> In full:
>
> I gave openmpi-v1.8.3-313-g54c80c2 a try.
> In this test I did not add -D_REENTRANT or -mt to any flags at configure
> time.
> In addition to --prefix, I passed the following:
>
> --enable-debug --with-verbs \
> CC=cc CXX=CC FC=f90 \
> CFLAGS=-m64 --with-wrapper-cflags=-m64 \
> FCFLAGS=-m64 --with-wrapper-fcflags=-m64 \
> CXXFLAGS='-m64 -library=stlport4' --with-wrapper-cxxflags='-m64 -library=stlport4'
>
>
> So, this was essentially an "out of the box" build with the configure
> options needed for the compilers and ABI I desire.
> They are the same options I have used successfully with 1.8.3.
> So, I believe the regression I had observed relative to 1.8.3 has been
> resolved.
>
> I am going to run the nightly on other configs on both my
> Solaris-11/x86-64 and Solaris-10/SPARC systems.
> I just want to be sure some other compiler/ABI/arch combination didn't get
> broken by accident.
> I will post my results to the list (probably Thu lunch time in California).
>
> -Paul
>
> On Wed, Dec 17, 2014 at 2:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>
>> Paul --
>>
>> The __sun macro check is now in the OMPI 1.8 tree, and is in the latest
>> nightly tarball.
>>
>> If I'm following this thread right -- and I might not be! -- I think
>> Gilles is saying that now that the __sun check is in, it should fix this
>> -mt/-D_REENTRANT/whatever problem.
>>
>> Can you confirm?
>>
>>
>> On Dec 16, 2014, at 1:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> > Gilles,
>> >
>> > I am running mpirun on a host that ALSO will run one of the application processes.
>> > Requested ifconfig and netstat outputs appear below.
>> >
>> > -Paul
>> >
>> > [phargrov@pcp-j-20 ~]$ ifconfig -a
>> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
>> >         inet 127.0.0.1 netmask ff000000
>> > bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
>> >         inet 172.16.0.120 netmask ffff0000 broadcast 172.16.255.255
>> > pFFFF.ibp0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 2044 index 3
>> >         inet 172.18.0.120 netmask ffff0000 broadcast 172.18.255.255
>> > lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
>> >         inet6 ::1/128
>> > bge0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
>> >         inet6 fe80::250:45ff:fe5c:2b0/10
>> > [phargrov@pcp-j-20 ~]$ netstat -nr
>> >
>> > Routing Table: IPv4
>> >   Destination           Gateway           Flags  Ref     Use     Interface
>> > -------------------- -------------------- ----- ----- ---------- ---------
>> > default              172.16.254.1         UG        2     158463 bge0
>> > 127.0.0.1            127.0.0.1            UH        5     398913 lo0
>> > 172.16.0.0           172.16.0.120         U         4  135241319 bge0
>> > 172.18.0.0           172.18.0.120         U         3         26 pFFFF.ibp0
>> >
>> > Routing Table: IPv6
>> >   Destination/Mask            Gateway                   Flags Ref   Use    If
>> > --------------------------- --------------------------- ----- --- ------- -----
>> > ::1                         ::1                         UH      2       0 lo0
>> > fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
>> >
>> > On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>> > Paul,
>> >
>> > could you please send the output of
>> > ifconfig -a
>> > netstat -nr
>> >
>> > on the three hosts you are using
>> > (i assume you are still invoking mpirun from one node, and tasks are running on two other nodes)
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> >
>> > On 2014/12/16 16:00, Paul Hargrove wrote:
>> >> Gilles,
>> >>
>> >> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
>> >> compilations.
>> >> It appears to be used for building libevent and vt, but nothing else.
>> >> The output from configure contains
>> >>
>> >> checking if more special flags are required for pthreads... -D_REENTRANT
>> >>
>> >> only in the libevent and vt sub-configure portions.
>> >>
>> >> When configured for gcc on Solaris-11 I see the following in configure
>> >>
>> >> checking for C optimization flags... -m64 -D_REENTRANT -g
>> >> -finline-functions -fno-strict-aliasing
>> >>
>> >> but with CC=cc the equivalent line is
>> >>
>> >> checking for C optimization flags... -m64 -g
>> >>
>> >> In both cases the "-m64" is from the CFLAGS I have passed to configure.
>> >>
>> >> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
>> >> I see
>> >>
>> >> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
>> >> ------------------------------------------------------------
>> >> A process or daemon was unable to complete a TCP connection
>> >> to another process:
>> >>   Local host:    pcp-j-20
>> >>   Remote host:   172.18.0.120
>> >> This is usually caused by a firewall on the remote host. Please
>> >> check that any firewall (e.g., iptables) has been disabled and
>> >> try again.
>> >> ------------------------------------------------------------
>> >>
>> >> which at least appears to have a non-zero errno.
>> >> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
>> >>
>> >> With the oob.patch you provided the failed accept goes away, BUT the
>> >> connection still fails:
>> >>
>> >> ------------------------------------------------------------
>> >> A process or daemon was unable to complete a TCP connection
>> >> to another process:
>> >>   Local host:    pcp-j-20
>> >>   Remote host:   172.18.0.120
>> >> This is usually caused by a firewall on the remote host. Please
>> >> check that any firewall (e.g., iptables) has been disabled and
>> >> try again.
>> >> ------------------------------------------------------------
>> >>
>> >>
>> >> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
>> >> this.
>> >>
>> >>
>> >> -Paul
>> >>
>> >> On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> >>
>> >>> Gilles,
>> >>>
>> >>> I am NOT seeing the problem with gcc.
>> >>> It is only occurring with the Studio compilers.
>> >>>
>> >>> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
>> >>> both LDFLAGS and --with-wrapper-ldflags.
>> >>>
>> >>> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
>> >>>
>> >>>      -mt  Compile and link for multithreaded code.
>> >>>
>> >>>           This option passes -D_REENTRANT to the preprocessor and
>> >>>           passes -lthread in the correct order to ld.
>> >>>
>> >>>           The -mt option is required if the application or
>> >>>           libraries are multithreaded.
>> >>>
>> >>>           To ensure proper library linking order, you must use
>> >>>           this option, rather than -lthread, to link with
>> >>>           libthread.
>> >>>
>> >>>           If you are using POSIX threads, you must link with the
>> >>>           options -mt -lpthread.  The -mt option is necessary
>> >>>           because libC and libCrun need libthread for a
>> >>>           multithreaded application.
>> >>>
>> >>>           If you compile and link in separate steps and you
>> >>>           compile with -mt, you might get unexpected results. If you
>> >>>           compile one translation unit with -mt, compile all
>> >>>           units of the program with -mt.
>> >>>
>> >>> I cannot connect to my Solaris-11 system right now, but I recall the text
>> >>> to be quite similar.
>> >>>
>> >>> -Paul
>> >>>
>> >>> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>> >>>
>> >>>
>> >>>>  Paul,
>> >>>>
>> >>>> did you manually set -mt ?
>> >>>>
>> >>>> if i remember correctly, solaris 11 (at least with gcc compilers) does not
>> >>>> need any flags
>> >>>> (except the -D_REENTRANT that is added automatically)
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> Gilles
>> >>>>
>> >>>>
>> >>>> On 2014/12/16 12:10, Paul Hargrove wrote:
>> >>>>
>> >>>> Gilles,
>> >>>>
>> >>>> I will try the patch when I can.
>> >>>> However, our network is undergoing maintenance right now, leaving
>> >>>> me unable to reach the necessary hosts.
>> >>>>
>> >>>> As for -D_REENTRANT, I had already reported having verified in the "make"
>> >>>> output that it had been added automatically.
>> >>>>
>> >>>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
>> >>>> preprocessor.
>> >>>>
>> >>>> -Paul
>> >>>>
>> >>>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>> >>>>
>> >>>>
>> >>>>  Paul,
>> >>>>
>> >>>> could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
>> >>>> /* otherwise, errno is a global variable instead of a per-thread variable,
>> >>>> which can explain some weird behaviour. note this should have been already
>> >>>> fixed */
>> >>>>
>> >>>> assuming -D_REENTRANT is set, could you please give the attached patch a
>> >>>> try ?
>> >>>>
>> >>>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
>> >>>> error message
>> >>>> e.g. failed: Error 0 (0)
>> >>>>
>> >>>> FWIW, master is also affected.
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> Gilles
>> >>>>
>> >>>>
>> >>>> On 2014/12/16 10:47, Paul Hargrove wrote:
>> >>>>
>> >>>> I have tried with an oob_tcp_if_include setting so that there is now only 1
>> >>>> interface.
>> >>>> Even with just one interface and -mt=yes in both LDFLAGS and
>> >>>> wrapper-ldflags I am *still* getting messages like
>> >>>>
>> >>>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
>> >>>> ------------------------------------------------------------
>> >>>> A process or daemon was unable to complete a TCP connection
>> >>>> to another process:
>> >>>>   Local host:    pcp-j-20
>> >>>>   Remote host:   172.16.0.120
>> >>>> This is usually caused by a firewall on the remote host. Please
>> >>>> check that any firewall (e.g., iptables) has been disabled and
>> >>>> try again.
>> >>>> ------------------------------------------------------------
>> >>>>
>> >>>>
>> >>>> I am getting less certain that my speculation about thread-safe libs is
>> >>>> correct.
>> >>>>
>> >>>> -Paul
>> >>>>
>> >>>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> >>>>
>> >>>> A little more reading finds that...
>> >>>>
>> >>>> The docs say that one needs "-mt" without the "=yes".
>> >>>> That will work for both old and new compilers, where "-mt=yes" chokes
>> >>>> older ones.
>> >>>>
>> >>>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
>> >>>>
>> >>>> -Paul
>> >>>>
>> >>>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> >>>>
>> >>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>>
>> >>>>  7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
>> >>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
>> >>>> link. Need someone to investigate.
>> >>>>
>> >>>>
>> >>>> The lack of multi-thread libraries is my SPECULATION.
>> >>>>
>> >>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
>> >>>> not prove anything.
>> >>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
>> >>>> wrapper-ldflags instead.
>> >>>> My time this week is quite limited, but I can "fire and forget" tests of
>> >>>> any tarballs you provide.
>> >>>>
>> >>>> -Paul
>> >>>>
>> >>>> --
>> >>>> Paul H. Hargrove                          phhargr...@lbl.gov
>> >>>> Computer Languages & Systems Software (CLaSS) Group
>> >>>> Computer Science Department               Tel: +1-510-495-2352
>> >>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>> >>>>
>> >>>> _______________________________________________
>> >>>> devel mailing list
>> >>>> de...@open-mpi.org
>> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
>> >>>
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>
>
>
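
For anyone reading this thread later: the "Error 0 (0)" message quoted above is what one would expect if a cleanup call overwrites errno between the failed accept() and the error report, as Gilles suspected of the CLOSE_THE_SOCKET macro. Below is a minimal sketch of the usual defensive pattern. It is not the actual oob.patch, and the names close_preserving_errno and report_failure are hypothetical, chosen only for illustration. Per the discussion above, the Studio compilers on Solaris additionally need -D_REENTRANT (or -mt) for errno to be per-thread, which this sketch does not address.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI function): close a descriptor
 * without disturbing the errno value left by an earlier failed call. */
void close_preserving_errno(int fd)
{
    int saved = errno;   /* remember why the earlier call failed */
    (void)close(fd);     /* may itself set errno, e.g. to EBADF */
    errno = saved;       /* restore it for whoever reports the error */
}

/* Hypothetical reporter: capture errno once, before any cleanup, and
 * use that single copy for both the message text and the number. */
int report_failure(const char *what, int fd_to_close)
{
    int err = errno;     /* take the copy first */
    assert(what != NULL);
    if (fd_to_close >= 0) {
        (void)close(fd_to_close);
    }
    fprintf(stderr, "%s failed: %s (%d)\n", what, strerror(err), err);
    return err;
}
```

A caller would invoke report_failure("accept()", sd) immediately after accept() returns -1, so the text and the number in the message always come from the same errno value.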
