Short version:

v1.8 nightly (v1.8.3-313-g54c80c2) PASSED my testing.

In full:

I gave openmpi-v1.8.3-313-g54c80c2 a try.
In this test I did not add -D_REENTRANT or -mt to any flags at configure
time.
In addition to --prefix, I passed the following:

--enable-debug --with-verbs \
CC=cc CXX=CC FC=f90 \
CFLAGS=-m64 --with-wrapper-cflags=-m64 \
FCFLAGS=-m64 --with-wrapper-fcflags=-m64 \
CXXFLAGS='-m64 -library=stlport4' --with-wrapper-cxxflags='-m64 -library=stlport4'


So, this was essentially an "out of the box" build with the configure
options needed for the compilers and ABI I desire.
They are the same options I have used successfully with 1.8.3.
So, I believe the regression I had observed relative to 1.8.3 has been
resolved.
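
For anyone repeating this on another compiler/ABI combination, a tiny check
like the sketch below can show which of the macros discussed in this thread
are in effect for a given set of flags. It is only an illustration: the
function names are made up for this example, it is not part of Open MPI, and
it says nothing about what the actual __sun check in the 1.8 tree does with
the result.

```c
#include <stdio.h>

/* Returns 1 if the compiler predefines __sun (expected on Solaris), else 0. */
int have_sun_macro(void)
{
#if defined(__sun)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if _REENTRANT is defined for this translation unit, else 0. */
int have_reentrant_macro(void)
{
#if defined(_REENTRANT)
    return 1;
#else
    return 0;
#endif
}

/* Prints one line per macro so builds with different flags can be compared. */
void report_thread_macros(FILE *out)
{
    fprintf(out, "__sun:      %s\n",
            have_sun_macro() ? "defined" : "not defined");
    fprintf(out, "_REENTRANT: %s\n",
            have_reentrant_macro() ? "defined" : "not defined");
}
```

Calling report_thread_macros(stdout) from main(), in a file built once with
the plain flags and once with -mt added, should make it easy to compare the
two builds.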

I am going to run the nightly on other configs on both my Solaris-11/x86-64
and Solaris-10/SPARC systems.
I just want to be sure some other compile/abi/arch combination didn't get
broken by accident.
I will post my results to the list (probably Thu lunch time in California).

-Paul

On Wed, Dec 17, 2014 at 2:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> Paul --
>
> The __sun macro check is now in the OMPI 1.8 tree, and is in the latest
> nightly tarball.
>
> If I'm following this thread right -- and I might not be! -- I think
> Gilles is saying that now that the __sun check is in, it should fix this
> -mt/-D_REENTRANT/whatever problem.
>
> Can you confirm?
>
>
> On Dec 16, 2014, at 1:55 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
> > Gilles,
> >
> > I am running mpirun on a host that ALSO will run one of the application
> > processes.
> > Requested ifconfig and netstat outputs appear below.
> >
> > -Paul
> >
> > [phargrov@pcp-j-20 ~]$ ifconfig -a
> > lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
> >         inet 127.0.0.1 netmask ff000000
> > bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
> >         inet 172.16.0.120 netmask ffff0000 broadcast 172.16.255.255
> > pFFFF.ibp0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 2044 index 3
> >         inet 172.18.0.120 netmask ffff0000 broadcast 172.18.255.255
> > lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
> >         inet6 ::1/128
> > bge0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2
> >         inet6 fe80::250:45ff:fe5c:2b0/10
> > [phargrov@pcp-j-20 ~]$ netstat -nr
> >
> > Routing Table: IPv4
> >   Destination           Gateway           Flags  Ref     Use     Interface
> > -------------------- -------------------- ----- ----- ---------- ---------
> > default              172.16.254.1         UG        2     158463 bge0
> > 127.0.0.1            127.0.0.1            UH        5     398913 lo0
> > 172.16.0.0           172.16.0.120         U         4  135241319 bge0
> > 172.18.0.0           172.18.0.120         U         3         26 pFFFF.ibp0
> >
> > Routing Table: IPv6
> >   Destination/Mask            Gateway                   Flags Ref   Use    If
> > --------------------------- --------------------------- ----- --- ------- -----
> > ::1                         ::1                         UH      2       0 lo0
> > fe80::/10                   fe80::250:45ff:fe5c:2b0     U       2       0 bge0
> >
> > On Tue, Dec 16, 2014 at 2:55 AM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> > Paul,
> >
> > could you please send the output of
> > ifconfig -a
> > netstat -nr
> >
> > on the three hosts you are using
> > (i assume you are still invoking mpirun from one node, and tasks are
> > running on two other nodes)
> >
> > Cheers,
> >
> > Gilles
> >
> >
> > On 2014/12/16 16:00, Paul Hargrove wrote:
> >> Gilles,
> >>
> >> I looked again carefully and I am *NOT* finding -D_REENTRANT passed to most
> >> compilations.
> >> It appears to be used for building libevent and vt, but nothing else.
> >> The output from configure contains
> >>
> >> checking if more special flags are required for pthreads... -D_REENTRANT
> >>
> >> only in the libevent and vt sub-configure portions.
> >>
> >> When configured for gcc on Solaris-11 I see the following in configure
> >>
> >> checking for C optimization flags... -m64 -D_REENTRANT -g
> >> -finline-functions -fno-strict-aliasing
> >>
> >> but with CC=cc the equivalent line is
> >>
> >> checking for C optimization flags... -m64 -g
> >>
> >> In both cases the "-m64" is from the CFLAGS I have passed to configure.
> >>
> >> However, when I use CFLAGS="-m64 -D_REENTRANT" the problem DOES NOT go away.
> >> I see
> >>
> >> [pcp-j-20:24740] mca_oob_tcp_accept: accept() failed: Error 0 (11).
> >> ------------------------------------------------------------
> >> A process or daemon was unable to complete a TCP connection
> >> to another process:
> >>   Local host:    pcp-j-20
> >>   Remote host:   172.18.0.120
> >> This is usually caused by a firewall on the remote host. Please
> >> check that any firewall (e.g., iptables) has been disabled and
> >> try again.
> >> ------------------------------------------------------------
> >>
> >> which at least appears to have a non-zero errno.
> >> A quick grep through /usr/include/sys/errno shows 11 is EAGAIN.
> >>
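
As background on that code: for a non-blocking listening socket, POSIX
documents EAGAIN (or EWOULDBLOCK) from accept() as meaning that no connection
is waiting, which is normally a reason to retry rather than a hard failure.
A minimal sketch of that classification follows. The helper name is
hypothetical and the code is not taken from Open MPI's oob/tcp component;
whether the listener there should treat the error this way is not something
this thread establishes.

```c
#include <errno.h>

/*
 * Returns 1 when an accept() failure code only means "nothing to accept
 * right now" or "interrupted by a signal", so the caller may retry later;
 * returns 0 for everything else.
 */
int accept_errno_is_transient(int err)
{
    if (err == EAGAIN || err == EINTR) {
        return 1;
    }
#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN)
    /* Some systems give EWOULDBLOCK a value distinct from EAGAIN. */
    if (err == EWOULDBLOCK) {
        return 1;
    }
#endif
    return 0;
}
```
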
> >> With the oob.patch you provided the failed accept goes away, BUT the
> >> connection still fails:
> >>
> >> ------------------------------------------------------------
> >> A process or daemon was unable to complete a TCP connection
> >> to another process:
> >>   Local host:    pcp-j-20
> >>   Remote host:   172.18.0.120
> >> This is usually caused by a firewall on the remote host. Please
> >> check that any firewall (e.g., iptables) has been disabled and
> >> try again.
> >> ------------------------------------------------------------
> >>
> >>
> >> Use of "-mca oob_tcp_if_include bge0" to use a single interface did not fix
> >> this.
> >>
> >>
> >> -Paul
> >>
> >> On Mon, Dec 15, 2014 at 7:18 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>
> >>> Gilles,
> >>>
> >>> I am NOT seeing the problem with gcc.
> >>> It is only occurring with the Studio compilers.
> >>>
> >>> As I've already reported, I have tried adding either "-mt" or "-mt=yes" to
> >>> both LDFLAGS and --with-wrapper-ldflags.
> >>>
> >>> The "cc" manpage (on the Solaris-10 system I can get to right now) says:
> >>>
> >>>      -mt  Compile and link for multithreaded code.
> >>>
> >>>           This option passes -D_REENTRANT to the preprocessor and
> >>>           passes -lthread in the correct order to ld.
> >>>
> >>>           The -mt option is required if the application or
> >>>           libraries are multithreaded.
> >>>
> >>>           To ensure proper library linking order, you must use
> >>>           this option, rather than -lthread, to link with lib-
> >>>           thread.
> >>>
> >>>           If you are using POSIX threads, you must link with the
> >>>           options -mt -lpthread.  The -mt option is necessary
> >>>           because libC and libCrun need libthread for a mul-
> >>>           tithreaded application.
> >>>
> >>>           If you compile and link in separate steps and you com-
> >>>           pile with -mt, you might get unexpected results. If you
> >>>           compile one translation unit with -mt, compile all
> >>>           units of the program with -mt.
> >>>
> >>> I cannot connect to my Solaris-11 system right now, but I recall the text
> >>> to be quite similar.
> >>>
> >>> -Paul
> >>>
> >>> On Mon, Dec 15, 2014 at 7:12 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> >>>
> >>>
> >>>>  Paul,
> >>>>
> >>>> did you manually set -mt ?
> >>>>
> >>>> if i remember correctly, solaris 11 (at least with gcc compilers) does not
> >>>> need any flags
> >>>> (except the -D_REENTRANT that is added automatically)
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>>
> >>>> On 2014/12/16 12:10, Paul Hargrove wrote:
> >>>>
> >>>> Gilles,
> >>>>
> >>>> I will try the patch when I can.
> >>>> However, our network is undergoing maintenance right now, leaving
> >>>> me unable to reach the necessary hosts.
> >>>>
> >>>> As for -D_REENTRANT, I had already reported having verified in the "make"
> >>>> output that it had been added automatically.
> >>>>
> >>>> Additionally, the docs say that "-mt" *also* passes -D_REENTRANT to the
> >>>> preprocessor.
> >>>>
> >>>> -Paul
> >>>>
> >>>> On Mon, Dec 15, 2014 at 6:07 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
> >>>>
> >>>>
> >>>>  Paul,
> >>>>
> >>>> could you please make sure configure added "-D_REENTRANT" to the CFLAGS ?
> >>>> /* otherwise, errno is a global variable instead of a per thread variable,
> >>>> which can explain some weird behaviour. note this should have been already
> >>>> fixed */
> >>>>
> >>>> assuming -D_REENTRANT is set, could you please give the attached patch a
> >>>> try ?
> >>>>
> >>>> i suspect the CLOSE_THE_SOCKET macro resets errno, and hence the confusing
> >>>> error message
> >>>> e.g. failed: Error 0 (0)
> >>>>
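
The usual way to avoid that kind of clobbering is to copy errno into a local
variable immediately after the failing call and to report the copy once
cleanup is done. The sketch below illustrates the idea with hypothetical
helper names. It is not the attached patch and not the CLOSE_THE_SOCKET macro
itself, just a standalone illustration of the pattern.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Open MPI code): close fd after a failed call
 * and return the errno that the failed call set, not whatever the
 * cleanup leaves behind.
 */
int close_preserving_errno(int fd)
{
    int saved = errno;      /* capture before cleanup can overwrite it */

    if (fd >= 0) {
        (void) close(fd);   /* may itself modify errno */
    }
    errno = saved;          /* restore for callers that read errno directly */
    return saved;
}

/* Formats a message in the "text (number)" shape seen in the log output. */
int format_accept_error(char *buf, size_t len, int err)
{
    return snprintf(buf, len, "accept() failed: %s (%d)", strerror(err), err);
}
```

Passing the saved value to both strerror() and the numeric field keeps the
text and the number in the message consistent with each other.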
> >>>> FWIW, master is also affected.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>>
> >>>> On 2014/12/16 10:47, Paul Hargrove wrote:
> >>>>
> >>>> I have tried with an oob_tcp_if_include setting so that there is now only 1
> >>>> interface.
> >>>> Even with just one interface and -mt=yes in both LDFLAGS and
> >>>> wrapper-ldflags I am *still* getting messages like
> >>>>
> >>>> [pcp-j-20:11470] mca_oob_tcp_accept: accept() failed: Error 0 (0).
> >>>> ------------------------------------------------------------
> >>>> A process or daemon was unable to complete a TCP connection
> >>>> to another process:
> >>>>   Local host:    pcp-j-20
> >>>>   Remote host:   172.16.0.120
> >>>> This is usually caused by a firewall on the remote host. Please
> >>>> check that any firewall (e.g., iptables) has been disabled and
> >>>> try again.
> >>>> ------------------------------------------------------------
> >>>>
> >>>>
> >>>> I am getting less certain that my speculation about thread-safe libs is
> >>>> correct.
> >>>>
> >>>> -Paul
> >>>>
> >>>> On Mon, Dec 15, 2014 at 1:24 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>>>
> >>>>  A little more reading finds that...
> >>>>
> >>>> Docs say that one needs "-mt" without the "=yes".
> >>>> That will work for both old and new compilers, where "-mt=yes" chokes
> >>>> older ones.
> >>>>
> >>>> Also, man pages say "-mt" must come before "-lpthread" in the link command.
> >>>>
> >>>> -Paul
> >>>>
> >>>> On Mon, Dec 15, 2014 at 12:52 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> >>>>
> >>>>
> >>>> On Mon, Dec 15, 2014 at 5:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
> >>>>
> >>>>  7. Linkage issue on Solaris-11 reported by Paul Hargrove. Missing the
> >>>> multi-threaded C libraries, apparently need "-mt=yes" in both compile and
> >>>> link. Need someone to investigate.
> >>>>
> >>>>
> >>>> The lack of multi-thread libraries is my SPECULATION.
> >>>>
> >>>> The fact that configuring with LDFLAGS=-mt=yes did not help may or may
> >>>> not prove anything.
> >>>> I didn't see them in "mpicc -show" and so maybe they needed to be in
> >>>> wrapper-ldflags instead.
> >>>> My time this week is quite limited, but I can "fire and forget" tests of
> >>>> any tarballs you provide.
> >>>>
> >>>> -Paul
> >>>>
> >>>> --
> >>>> Paul H. Hargrove                          phhargr...@lbl.gov
> >>>> Computer Languages & Systems Software (CLaSS) Group
> >>>> Computer Science Department               Tel: +1-510-495-2352
> >>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>> Link to this post:
> >>>> http://www.open-mpi.org/community/lists/devel/2014/12/16607.php
> >>>>
> >>>>
> >>>
> >>> --
> >>> Paul H. Hargrove                          phhargr...@lbl.gov
> >>> Computer Languages & Systems Software (CLaSS) Group
> >>> Computer Science Department               Tel: +1-510-495-2352
> >>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> >>>
> >>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/12/16613.php
> >
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/12/16615.php
> >
> >
> > --
> > Paul H. Hargrove                          phhargr...@lbl.gov
> > Computer Languages & Systems Software (CLaSS) Group
> > Computer Science Department               Tel: +1-510-495-2352
> > Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/12/16617.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/12/16660.php
>


-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
