Paul,

Can you please double-check that pmix1xx is compiled with -D_REENTRANT?
We ran into similar issues in the past, and they only occurred on Solaris.

Cheers,

Gilles

On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Ralph,
> The output from the requested run is attached.
> -Paul
>
> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ah, okay, that makes more sense. I’ll have to let Brice see if he can
>> figure out how to silence the hwloc error message, as I can’t find where it
>> came from. The other errors are real and are the reason why the job was
>> terminated.
>>
>> The problem is that we are trying to establish communication between
>> the app and the daemon via a unix domain socket, and we failed to do so. The
>> error tells me that we were able to create and connect to the socket, but
>> the daemon failed when it tried to do a blocking send to the app.
>>
>> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the
>> value of the error number that was returned.
>>
>> Thanks
>> Ralph
>>
>>
>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Ralph,
>>
>> No, it did not run.
>> The complete output (which I really should have included in the first
>> place) is below.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> Error opening /devices/pci@0,0:reg: Permission denied
>> [pcp-d-3:26054] PMIX ERROR: ERROR in file
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>> at line 181
>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>> at line 463
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or
>> environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed
>> successfully, but am not able to aggregate error messages, and not able to
>> guarantee that all other processes were killed!
>> -------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing
>> the job to be terminated. The first process to do so was:
>>
>>   Process name: [[11371,1],0]
>>   Exit code:    1
>> --------------------------------------------------------------------------
>>
>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Paul, can you clarify something for me? The error in this case indicates
>>> that the client wasn’t able to reach the daemon - this should have resulted
>>> in termination of the job. Did the job actually run?
>>>
>>>
>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>> I'm on travel right now, but it should be an easy fix when I return.
>>> Sorry for the annoyance.
>>>
>>>
>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> Any suggestion how I (as a non-root user) can avoid seeing this hwloc
>>>> error message on every run?
>>>>
>>>> -Paul
>>>>
>>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>
>>>>> Paul,
>>>>>
>>>>> IIRC, the "Permission denied" message comes from hwloc, which cannot
>>>>> collect all the info it would like.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>>>
>>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
>>>>> Studio compilers (default ILP32 output) and saw the following result:
>>>>>
>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>> at line 181
>>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>> at line 463
>>>>>
>>>>> I don't know if the Permission denied error is related to the
>>>>> subsequent PMIX errors, but any message that says "UNREACHABLE" is clearly
>>>>> worth reporting.
>>>>>
>>>>> -Paul
>>>>>
>>>>> --
>>>>> Paul H. Hargrove
>>>>> phhargr...@lbl.gov
>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>> Computer Science Department               Tel: +1-510-495-2352
>>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18075.php
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18076.php
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18078.php
>>>
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18080.php
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/09/18081.php
>>
>
>
>
>
