Yes, it is definitely at 10.
Another attempt is attached.
-Paul

On Sun, Sep 20, 2015 at 8:19 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Paul - can you please confirm that you gave mpirun a level of 10 for the
> pmix_base_verbose param? This output isn’t what I would have expected from
> that level - it looks more like the verbosity was set to 5, and so the
> error number isn’t printed.
>
> Thanks
> Ralph
>
>
> On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Paul,
>
> I do not remember it like that ...
>
> at that time, the issue in ompi was that the global errno was used instead
> of the per-thread errno.
> Though the man page says -mt should be used for multithreaded apps, you
> tried -D_REENTRANT on all your platforms, and it was enough to get the
> expected result.
>
> I just wanted to check that the pmix1xx (sub)configure correctly passes the
> -D_REENTRANT flag, and it does. So this is very likely a new and unrelated
> error.
>
> Cheers,
>
> Gilles
>
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> Gilles,
>>
>> Yes every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
>> However, they don't include "-mt".
>> I believe we concluded (when we had problems previously) that "-mt" was
>> the proper flag (at compile and link) for multi-threaded with the Studio
>> compilers.
>>
>> -Paul
>>
>> On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>>
>>> Paul,
>>>
>>> Can you please double check pmix1xx is compiled with -D_REENTRANT ?
>>> We ran into similar issues in the past, and they only occurred with
>>> Solaris
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> Ralph,
>>>> The output from the requested run is attached.
>>>> -Paul
>>>>
>>>> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org>
>>>> wrote:
>>>>
>>>>> Ah, okay - that makes more sense. I’ll have to let Brice see if he can
>>>>> figure out how to silence the hwloc error message as I can’t find where it
>>>>> came from. The other errors are real and are the reason why the job was
>>>>> terminated.
>>>>>
>>>>> The problem is that we are trying to establish a communication between
>>>>> the app and the daemon via unix domain socket, and we failed to do so. The
>>>>> error tells me that we were able to create and connect to the socket, but
>>>>> failed when the daemon tried to do a blocking send to the app.
>>>>>
>>>>> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the
>>>>> value of the error number that was returned
>>>>>
>>>>> Thanks
>>>>> Ralph
>>>>>
>>>>>
>>>>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>>
>>>>> Ralph,
>>>>>
>>>>> No it did not run.
>>>>> The complete output (which I really should have included in the first
>>>>> place) is below.
>>>>>
>>>>> -Paul
>>>>>
>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>> [pcp-d-3:26054] PMIX ERROR: ERROR in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>> at line 181
>>>>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file
>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>> at line 463
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>>   ompi_mpi_init: ompi_rte_init failed
>>>>>   --> Returned "(null)" (-43) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
>>>>> abort,
>>>>> ***    and potentially your MPI job)
>>>>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed
>>>>> successfully, but am not able to aggregate error messages, and not able to
>>>>> guarantee that all other processes were killed!
>>>>> -------------------------------------------------------
>>>>> Primary job  terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun detected that one or more processes exited with non-zero
>>>>> status, thus causing
>>>>> the job to be terminated. The first process to do so was:
>>>>>
>>>>>   Process name: [[11371,1],0]
>>>>>   Exit code:    1
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org>
>>>>> wrote:
>>>>>
>>>>>> Paul, can you clarify something for me? The error in this case
>>>>>> indicates that the client wasn’t able to reach the daemon - this should
>>>>>> have resulted in termination of the job. Did the job actually run?
>>>>>>
>>>>>>
>>>>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>> I'm on travel right now, but it should be an easy fix when I return.
>>>>>> Sorry for the annoyance
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Any suggestion how I (as a non-root user) can avoid seeing this
>>>>>>> hwloc error message on every run?
>>>>>>>
>>>>>>> -Paul
>>>>>>>
>>>>>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <
>>>>>>> gil...@rist.or.jp> wrote:
>>>>>>>
>>>>>>>> Paul,
>>>>>>>>
>>>>>>>> IIRC, the "Permission denied" is coming from hwloc that cannot
>>>>>>>> collect all the info it would like.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Gilles
>>>>>>>>
>>>>>>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>>>>>>
>>>>>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the
>>>>>>>> Studio Compilers (default ILP32 output) and saw the following result:
>>>>>>>>
>>>>>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>>>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>>>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file
>>>>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>>>>> at line 181
>>>>>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file
>>>>>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>>>>> at line 463
>>>>>>>>
>>>>>>>> I don't know if the Permission denied error is related to the
>>>>>>>> subsequent PMIX errors, but any message that says "UNREACHABLE" is
>>>>>>>> clearly worth reporting.
>>>>>>>>
>>>>>>>> -Paul
>>>>>>>>
>>>>>>>> --
>>>>>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>>>>>> Computer Languages & Systems Software (CLaSS) Group
>>>>>>>> Computer Science Department               Tel: +1-510-495-2352
>>>>>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> devel mailing list
>>>>>>>> de...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18075.php
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> de...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18076.php
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18078.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18080.php
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/devel/2015/09/18081.php
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/09/18083.php
>>>
>>
>>
>>
>>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18085.php
>
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/09/18086.php
>

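For anyone following along, here is a rough sketch of the failure mode Ralph describes in the quoted text: two ends of a connected unix domain socket, a blocking send from the "daemon" end, and an error number that is only reported at a high enough verbosity. This is not the PMIx code. The function name blocking_send, the message text, and the threshold of 10 are assumptions made purely for illustration.

```python
import os
import socket


def blocking_send(sock, payload, verbose=0):
    """Send all of payload on a blocking socket.

    Returns (ok, message). The verbosity threshold of 10 is a hypothetical
    stand-in for the pmix_base_verbose behaviour discussed in the thread,
    not a copy of what PMIx actually does.
    """
    try:
        sock.sendall(payload)
        return True, "sent %d bytes" % len(payload)
    except OSError as exc:
        msg = "ERROR: UNREACHABLE"
        if verbose >= 10:
            # Only at high verbosity is the error number included.
            msg += " (errno %d: %s)" % (exc.errno, os.strerror(exc.errno))
        return False, msg


# "Daemon" and "app" ends of a connected unix domain socket pair.
daemon_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

ok, msg = blocking_send(daemon_end, b"hello", verbose=10)
received = app_end.recv(16)

# Force a failure: close the app end, then send from the daemon end.
app_end.close()
failed_quiet = blocking_send(daemon_end, b"x", verbose=5)
failed_loud = blocking_send(daemon_end, b"x", verbose=10)
daemon_end.close()

print(msg)
print(failed_quiet[1])
print(failed_loud[1])
```

With the lower verbosity the failure message carries no error number, which would be consistent with the output Ralph thought he was looking at. The real run on Solaris may of course fail for a different reason than a closed peer.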


-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Attachment: typescript
Description: Binary data
