Thanks Paul!
Yes, this snapshot does include the patch I posted earlier.
BTW, the issue was a runtime error, not a build error.
Cheers,
Gilles
On 8/25/2016 12:00 PM, Paul Hargrove wrote:
Gilles,
I have successfully built openmpi-v2.0.0-227-g917d293 (tonight's
nightly tarball) on Solaris 11.3 with both the GNU and Studio
compilers. Based on Ralph's previous email, I assume that included
the patch you had directed me to (though I did not attempt to verify
that myself).
-Paul
On Wed, Aug 24, 2016 at 10:44 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
Ralph,
That will allow me to test much sooner.
-Paul
On Wed, Aug 24, 2016 at 10:41 AM, r...@open-mpi.org wrote:
When you do, note that the PR has already been committed, so you can
just pull the next nightly 2.x tarball and test from there.
On Aug 24, 2016, at 10:39 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
I am afraid it might take a day or two before I can get to
testing that patch.
-Paul
On Tue, Aug 23, 2016 at 10:16 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Paul,
You can download a patch at
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1336.patch
(note you need recent autotools in order to use it).
Cheers,
Gilles
On 8/23/2016 10:40 PM, r...@open-mpi.org wrote:
Looks like Solaris has a “getpeerucred” - can you take a
look at it, Gilles? We’d have to add that to our
AC_CHECK_FUNCS and update the native sec component.
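For concreteness, a hedged sketch of what that configure-side check might look like; the placement and surrounding context are assumptions, not the actual PMIx configury:

```m4
# configure.ac fragment (sketch only, not the real PMIx configury):
# probe for the Solaris peer-credential call alongside the existing checks.
AC_CHECK_FUNCS([getpeerucred])
# If found, autoconf defines HAVE_GETPEERUCRED for use in the source's #if chain.
```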
On Aug 23, 2016, at 6:32 AM, r...@open-mpi.org wrote:
I took a quick glance at this one, and the only way I
can see to get that error is from this block of code:
#if defined(HAVE_STRUCT_UCRED_UID)
    euid = ucred.uid;
    gid = ucred.gid;
#else
    euid = ucred.cr_uid;
    gid = ucred.cr_gid;
#endif
#elif defined(HAVE_GETPEEREID)
    pmix_output_verbose(2, pmix_globals.debug_output,
                        "sec:native checking getpeereid for peer credentials");
    if (0 != getpeereid(peer->sd, &euid, &gid)) {
        pmix_output_verbose(2, pmix_globals.debug_output,
                            "sec: getsockopt getpeereid failed: %s",
                            strerror(pmix_socket_errno));
        return PMIX_ERR_INVALID_CRED;
    }
#else
    return PMIX_ERR_NOT_SUPPORTED;
#endif
I can only surmise, therefore, that Solaris doesn’t
pass either of the two #if define’d tests. Is there a
Solaris alternative?
On Aug 23, 2016, at 5:55 AM, r...@open-mpi.org wrote:
Thanks Gilles!
On Aug 23, 2016, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
Thanks Paul,
At first glance, something is going wrong in the sec
module under Solaris.
I will keep digging tomorrow.
Cheers,
Gilles
On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:
On Solaris 11.3 on x86-64:
$ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4 examples/ring_c'
[pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at line 529
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
[pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[pcp-d-4:25078] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[25599,1],1]
  Exit code:    1
--------------------------------------------------------------------------
-Paul
--
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department  Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory  Fax: +1-510-486-6900
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel