Thanks, Paul!

Yes, this snapshot does include the patch I posted earlier.

By the way, the issue was a runtime error, not a build error.


Cheers,


Gilles


On 8/25/2016 12:00 PM, Paul Hargrove wrote:
Gilles,

I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly tarball) on Solaris 11.3 with both the Gnu and Studio compilers. Based on Ralph's previous email, I assume that included the patch you had directed me to (though I did not attempt to verify that myself).

-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

    Ralph,

    That will allow me to test much sooner.

    -Paul

    On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org wrote:

        When you do, note that the PR has already been committed, so you
        can just pull the next nightly 2.x tarball and test from there.

        On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

        I am afraid it might take a day or two before I can get to
        testing that patch.

        -Paul

        On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

            Paul,


            you can download a patch at

            https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch

            (note that you need recent autotools in order to use it)


            Cheers,


            Gilles


            On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
            It looks like Solaris has a “getpeerucred” - can you take a
            look at it, Gilles? We’d have to add that to our
            AC_CHECK_FUNCS and update the native sec component.
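[For reference, the configure-side change would presumably look something like the fragment below. This is a sketch, not the actual configure.ac change; the real AC_CHECK_FUNCS invocation likely carries other function names as well.]

```m4
# Sketch: have configure define HAVE_GETPEERUCRED (and HAVE_UCRED_H)
# when the Solaris peer-credential API is available.
AC_CHECK_HEADERS([ucred.h])
AC_CHECK_FUNCS([getpeerucred])
```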


            On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:

            I took a quick glance at this one, and the only way I
            can see to get that error is from this block of code:

            #if defined(HAVE_STRUCT_UCRED_UID)
                euid = ucred.uid;
                gid = ucred.gid;
            #else
                euid = ucred.cr_uid;
                gid = ucred.cr_gid;
            #endif

            #elif defined(HAVE_GETPEEREID)
                pmix_output_verbose(2, pmix_globals.debug_output,
                                    "sec:native checking getpeereid for peer credentials");
                if (0 != getpeereid(peer->sd, &euid, &gid)) {
                    pmix_output_verbose(2, pmix_globals.debug_output,
                                        "sec: getsockopt getpeereid failed: %s",
                                        strerror(pmix_socket_errno));
                    return PMIX_ERR_INVALID_CRED;
                }
            #else
                return PMIX_ERR_NOT_SUPPORTED;
            #endif


            I can only surmise, therefore, that Solaris doesn’t
            pass either of the two #if define’d tests. Is there a
            Solaris alternative?


            On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:

            Thanks Gilles!

            On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet
            <gilles.gouaillar...@gmail.com> wrote:

            Thanks Paul,

            At first glance, something is going wrong in the sec
            module under Solaris. I will keep digging tomorrow.

            Cheers,

            Gilles

            On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:

                On Solaris 11.3 on x86-64:

                $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c
                [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at line 529
                [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
                [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
                --------------------------------------------------------------------------
                It looks like MPI_INIT failed for some reason; your parallel process is
                likely to abort. There are many reasons that a parallel process can
                fail during MPI_INIT; some of which are due to configuration or environment
                problems. This failure appears to be an internal failure; here's some
                additional information (which may only be relevant to an Open MPI
                developer):

                ompi_mpi_init: ompi_rte_init failed
                --> Returned "(null)" (-43) instead of "Success" (0)
                --------------------------------------------------------------------------
                *** An error occurred in MPI_Init
                *** on a NULL communicator
                *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
                ***    and potentially your MPI job)
                [pcp-d-4:25078] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
                -------------------------------------------------------
                Primary job  terminated normally, but 1 process returned
                a non-zero exit code.. Per user-direction, the job has been aborted.
                -------------------------------------------------------
                --------------------------------------------------------------------------
                mpirun detected that one or more processes exited with non-zero status,
                thus causing the job to be terminated. The first process to do so was:

                Process name: [[25599,1],1]
                Exit code:  1
                --------------------------------------------------------------------------

                -Paul

                --
                Paul H. Hargrove  phhargr...@lbl.gov
                Computer Languages & Systems Software (CLaSS) Group
                Computer Science Department            Tel: +1-510-495-2352
                Lawrence Berkeley National Laboratory  Fax: +1-510-486-6900

            _______________________________________________
            devel mailing list
            devel@lists.open-mpi.org
            https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
