Gilles,

I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly
tarball) on Solaris 11.3 with both the GNU and Studio compilers.  Based on
Ralph's previous email, I assume that included the patch you directed me
to (though I did not attempt to verify that myself).

-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
>
> That will allow me to test much sooner.
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> When you do, note that the PR has already been committed, so you can just
>> pull the next nightly 2.x tarball and test from there.
>>
>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> I am afraid it might take a day or two before I can get to testing that
>> patch.
>>
>> -Paul
>>
>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>> wrote:
>>
>>> Paul,
>>>
>>>
>>> you can download a patch at
>>> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>>
>>> (note you need recent autotools in order to use it)
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>>
>>> Looks like Solaris has a “getpeerucred” - can you take a look at it,
>>> Gilles? We’d have to add that to our AC_CHECK_FUNCS and update the native
>>> sec component.
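>>>
>>> For reference, a minimal sketch of what such a Solaris branch could look
>>> like, assuming the configure check adds getpeerucred to AC_CHECK_FUNCS
>>> (defining HAVE_GETPEERUCRED); the helper name and surrounding details
>>> below are illustrative, not the actual PMIx code:
>>>
>>> #include <sys/types.h>   /* uid_t, gid_t */
>>> #include <ucred.h>       /* Solaris getpeerucred(3C) and ucred_t accessors */
>>>
>>> /* Illustrative helper: fetch the effective uid/gid of the peer
>>>  * connected on socket sd.  Returns 0 on success, -1 on failure. */
>>> static int solaris_peer_cred(int sd, uid_t *euid, gid_t *egid)
>>> {
>>>     ucred_t *uc = NULL;
>>>
>>>     /* getpeerucred allocates the ucred_t; caller must free it */
>>>     if (0 != getpeerucred(sd, &uc)) {
>>>         return -1;
>>>     }
>>>     *euid = ucred_geteuid(uc);
>>>     *egid = ucred_getegid(uc);
>>>     ucred_free(uc);
>>>     return 0;
>>> }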
>>>
>>>
>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>>
>>> I took a quick glance at this one, and the only way I can see to get
>>> that error is from this block of code:
>>>
>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>>     euid = ucred.uid;
>>>     gid = ucred.gid;
>>> #else
>>>     euid = ucred.cr_uid;
>>>     gid = ucred.cr_gid;
>>> #endif
>>>
>>> #elif defined(HAVE_GETPEEREID)
>>>     pmix_output_verbose(2, pmix_globals.debug_output,
>>>                         "sec:native checking getpeereid for peer credentials");
>>>     if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>>         pmix_output_verbose(2, pmix_globals.debug_output,
>>>                             "sec: getsockopt getpeereid failed: %s",
>>>                             strerror (pmix_socket_errno));
>>>         return PMIX_ERR_INVALID_CRED;
>>>     }
>>> #else
>>>     return PMIX_ERR_NOT_SUPPORTED;
>>> #endif
>>>
>>>
>>> I can only surmise, therefore, that Solaris doesn’t pass either of the
>>> two #if define’d tests. Is there a Solaris alternative?
>>>
>>>
>>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>>
>>> Thanks Gilles!
>>>
>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Thanks Paul,
>>>
>>> At first glance, something is going wrong in the sec module under
>>> Solaris.
>>> I will keep digging tomorrow.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> On Solaris 11.3 on x86-64:
>>>>
>>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c'
>>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at line 529
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems.  This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>>   ompi_mpi_init: ompi_rte_init failed
>>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> ***    and potentially your MPI job)
>>>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
>>>> -------------------------------------------------------
>>>> Primary job  terminated normally, but 1 process returned
>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> mpirun detected that one or more processes exited with non-zero status, thus causing
>>>> the job to be terminated. The first process to do so was:
>>>>
>>>>   Process name: [[25599,1],1]
>>>>   Exit code:    1
>>>> --------------------------------------------------------------------------
>>>>
>>>> -Paul
>>>>
>>>> --
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>
>>
>>
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
