Paul,

You can download a patch at https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch

(Note that you need recent autotools in order to use it.)


Cheers,


Gilles


On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
Looks like Solaris has a “getpeerucred” - can you take a look at it, Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native sec component.
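
A rough sketch of what that branch might look like - not a tested patch: HAVE_GETPEERUCRED is a hypothetical macro that an AC_CHECK_FUNCS([getpeerucred]) entry in configure would define, and getpeerucred() plus the ucred_t accessors come from Solaris <ucred.h>. It would slot into the credential-check chain quoted further down in this thread, before the final #else:

#elif defined(HAVE_GETPEERUCRED)
    {
        /* Solaris: ask the kernel for the connected peer's credentials */
        ucred_t *uc = NULL;
        if (0 != getpeerucred(peer->sd, &uc)) {
            pmix_output_verbose(2, pmix_globals.debug_output,
                                "sec: getpeerucred failed: %s",
                                strerror(pmix_socket_errno));
            return PMIX_ERR_INVALID_CRED;
        }
        euid = ucred_geteuid(uc);
        gid = ucred_getegid(uc);
        ucred_free(uc);
    }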


On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:

I took a quick glance at this one, and the only way I can see to get that error is from this block of code:

#if defined(HAVE_STRUCT_UCRED_UID)
    euid = ucred.uid;
    gid = ucred.gid;
#else
    euid = ucred.cr_uid;
    gid = ucred.cr_gid;
#endif

#elif defined(HAVE_GETPEEREID)
    pmix_output_verbose(2, pmix_globals.debug_output,
                        "sec:native checking getpeereid for peer credentials");
    if (0 != getpeereid(peer->sd, &euid, &gid)) {
        pmix_output_verbose(2, pmix_globals.debug_output,
                            "sec: getsockopt getpeereid failed: %s",
                            strerror (pmix_socket_errno));
        return PMIX_ERR_INVALID_CRED;
    }
#else
    return PMIX_ERR_NOT_SUPPORTED;
#endif


I can only surmise, therefore, that Solaris doesn’t pass either of the two #if define’d tests. Is there a Solaris alternative?
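
Solaris does provide getpeerucred(3C). Below is a minimal standalone check - assuming a Solaris/illumos box with <ucred.h> - that could be compiled and run to see whether the call yields usable credentials on an AF_UNIX socket:

#include <stdio.h>
#include <sys/socket.h>
#include <ucred.h>              /* getpeerucred(), ucred_t accessors (Solaris) */

int main(void)
{
    int sv[2];
    ucred_t *uc = NULL;

    /* A connected AF_UNIX socket pair stands in for the PMIx listener socket. */
    if (0 != socketpair(AF_UNIX, SOCK_STREAM, 0, sv)) {
        perror("socketpair");
        return 1;
    }
    if (0 != getpeerucred(sv[0], &uc)) {
        perror("getpeerucred");
        return 1;
    }
    printf("peer euid=%d egid=%d\n",
           (int)ucred_geteuid(uc), (int)ucred_getegid(uc));
    ucred_free(uc);
    return 0;
}

If it prints the caller's own effective uid/gid, the call should be usable for the native sec component's credential check.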


On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:

Thanks Gilles!

On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Thanks Paul,

At first glance, something is going wrong in the sec module under Solaris.
I will keep digging tomorrow.

Cheers,

Gilles

On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:

    On Solaris 11.3 on x86-64:

    $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c
    [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at line 529
    [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
    [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during MPI_INIT; some of which are due to configuration or environment
    problems.  This failure appears to be an internal failure; here's some
    additional information (which may only be relevant to an Open MPI
    developer):

      ompi_mpi_init: ompi_rte_init failed
      --> Returned "(null)" (-43) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    ***    and potentially your MPI job)
    [pcp-d-4:25078] Local abort before MPI_INIT completed completed successfully,
    but am not able to aggregate error messages, and not able to guarantee that
    all other processes were killed!
    -------------------------------------------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code.. Per user-direction, the job has been aborted.
    -------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status,
    thus causing the job to be terminated. The first process to do so was:

      Process name: [[25599,1],1]
      Exit code:    1
    --------------------------------------------------------------------------

    -Paul

    --
    Paul H. Hargrove  phhargr...@lbl.gov
    Computer Languages & Systems Software (CLaSS) Group
    Computer Science Department             Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900


_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
