I am afraid it might take a day or two before I can get to testing that patch.
-Paul

On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Paul,
>
> you can download a patch at
> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
>
> (note you need recent autotools in order to use it)
>
> Cheers,
>
> Gilles
>
> On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
>
> Looks like Solaris has a "getpeerucred" - can you take a look at it,
> Gilles? We'd have to add that to our AC_CHECK_FUNCS and update the
> native sec component.
>
> On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
>
> I took a quick glance at this one, and the only way I can see to get
> that error is from this block of code:
>
>     #if defined(HAVE_STRUCT_UCRED_UID)
>         euid = ucred.uid;
>         gid = ucred.gid;
>     #else
>         euid = ucred.cr_uid;
>         gid = ucred.cr_gid;
>     #endif
>
>     #elif defined(HAVE_GETPEEREID)
>         pmix_output_verbose(2, pmix_globals.debug_output,
>                             "sec:native checking getpeereid for peer credentials");
>         if (0 != getpeereid(peer->sd, &euid, &gid)) {
>             pmix_output_verbose(2, pmix_globals.debug_output,
>                                 "sec: getsockopt getpeereid failed: %s",
>                                 strerror (pmix_socket_errno));
>             return PMIX_ERR_INVALID_CRED;
>         }
>     #else
>         return PMIX_ERR_NOT_SUPPORTED;
>     #endif
>
> I can only surmise, therefore, that Solaris doesn't pass either of the
> two #if define'd tests. Is there a Solaris alternative?
>
> On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
>
> Thanks Gilles!
>
> On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
> Thanks Paul,
>
> at first glance, something is going wrong in the sec module under
> Solaris. I will keep digging tomorrow.
>
> Cheers,
>
> Gilles
>
> On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> On Solaris 11.3 on x86-64:
>>
>> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c'
>> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c
>> at line 529
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>> at line 983
>> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
>> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c
>> at line 199
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
>> successfully, but am not able to aggregate error messages, and not able to
>> guarantee that all other processes were killed!
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing the job to be terminated. The first process to do so was:
>>
>>   Process name: [[25599,1],1]
>>   Exit code:    1
>> --------------------------------------------------------------------------
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Computer Languages & Systems Software (CLaSS) Group
>> Computer Science Department               Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel