Gilles, I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's nightly tarball) on Solaris 11.3 with both the GNU and Studio compilers. Based on Ralph's previous email, I assume that tarball included the patch you had directed me to (though I did not attempt to verify that myself).
-Paul

On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Ralph,
>
> That will allow me to test much sooner.
>
> -Paul
>
> On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> When you do: that PR has already been committed, so you can just pull the
>> next nightly 2.x tarball and test from there.
>>
>> On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> I am afraid it might take a day or two before I can get to testing that
>> patch.
>>
>> -Paul
>>
>> On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>>> Paul,
>>>
>>> You can download a patch at
>>> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>>> (note that you need recent autotools in order to use it).
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>>>
>>> Looks like Solaris has a "getpeerucred" - can you take a look at it,
>>> Gilles? We'd have to add that to our AC_CHECK_FUNCS and update the native
>>> sec component.
>>>
>>> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>>>
>>> I took a quick glance at this one, and the only way I can see to get
>>> that error is from this block of code:
>>>
>>> #if defined(HAVE_STRUCT_UCRED_UID)
>>>     euid = ucred.uid;
>>>     gid = ucred.gid;
>>> #else
>>>     euid = ucred.cr_uid;
>>>     gid = ucred.cr_gid;
>>> #endif
>>>
>>> #elif defined(HAVE_GETPEEREID)
>>>     pmix_output_verbose(2, pmix_globals.debug_output,
>>>                         "sec:native checking getpeereid for peer credentials");
>>>     if (0 != getpeereid(peer->sd, &euid, &gid)) {
>>>         pmix_output_verbose(2, pmix_globals.debug_output,
>>>                             "sec: getsockopt getpeereid failed: %s",
>>>                             strerror(pmix_socket_errno));
>>>         return PMIX_ERR_INVALID_CRED;
>>>     }
>>> #else
>>>     return PMIX_ERR_NOT_SUPPORTED;
>>> #endif
>>>
>>> I can only surmise, therefore, that Solaris doesn't pass either of the
>>> two #if define'd tests.
>>> Is there a Solaris alternative?
>>>
>>> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>>>
>>> Thanks Gilles!
>>>
>>> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Thanks Paul,
>>>
>>> At first glance, something is going wrong in the sec module under
>>> Solaris. I will keep digging tomorrow.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> On Solaris 11.3 on x86-64:
>>>>
>>>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c
>>>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>>>> at line 529
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>>> at line 983
>>>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>>>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>>>> at line 199
>>>> --------------------------------------------------------------------------
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems.
>>>> This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>>   ompi_mpi_init: ompi_rte_init failed
>>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> *** and potentially your MPI job)
>>>> [pcp-d-4:25078] Local abort before MPI_INIT completed successfully, but am
>>>> not able to aggregate error messages, and not able to guarantee that all
>>>> other processes were killed!
>>>> -------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> -------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> mpirun detected that one or more processes exited with non-zero status,
>>>> thus causing the job to be terminated. The first process to do so was:
>>>>
>>>>   Process name: [[25599,1],1]
>>>>   Exit code:    1
>>>> --------------------------------------------------------------------------
>>>>
>>>> -Paul

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
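For completeness, the Solaris alternative mentioned above is getpeerucred(3C). A rough, untested sketch of what an additional branch in the sec component might look like, assuming a HAVE_GETPEERUCRED configure check were added via AC_CHECK_FUNCS (the guard name and integration point are assumptions on my part, not the actual contents of the PR 1336 patch):

```c
/* Sketch (untested, Solaris-only): requires <ucred.h> at the top of the
 * file, and a HAVE_GETPEERUCRED result from an AC_CHECK_FUNCS test. */
#elif defined(HAVE_GETPEERUCRED)
    {
        ucred_t *uc = NULL;                 /* allocated by getpeerucred() */
        if (0 != getpeerucred(peer->sd, &uc)) {
            return PMIX_ERR_INVALID_CRED;
        }
        euid = ucred_geteuid(uc);           /* returns (uid_t)-1 on failure */
        gid  = ucred_getegid(uc);
        ucred_free(uc);
    }
```

This cannot be compiled or verified outside Solaris; consult the actual patch at the pull/1336 URL quoted above for the implementation that was merged.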