Thanks Paul,

At first glance, something is going wrong in the sec module under Solaris.
I will keep digging tomorrow.

Cheers,

Gilles

On Tuesday, August 23, 2016, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On Solaris 11.3 on x86-64:
>
> $ mpirun -mca btl sm,self,openib -np 2 -host pcp-d-3,pcp-d-4
> examples/ring_c
> [pcp-d-4:25075] PMIX ERROR: NOT-SUPPORTED in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/server/pmix_server_listener.c at
> line 529
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 983
> [pcp-d-4:25078] PMIX ERROR: UNREACHABLE in file
> /shared/OMPI/openmpi-2.0.1rc1-solaris11-x86-ib-gcc/openmpi-
> 2.0.1rc1/opal/mca/pmix/pmix112/pmix/src/client/pmix_client.c at line 199
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "(null)" (-43) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [pcp-d-4:25078] Local abort before MPI_INIT completed completed
> successfully, but am not able to aggregate error messages, and not able to
> guarantee that all other processes were killed!
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
>   Process name: [[25599,1],1]
>   Exit code:    1
> --------------------------------------------------------------------------
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel