Paul, can you please confirm that you gave mpirun a level of 10 for the 
pmix_base_verbose param? This output isn’t what I would have expected at that 
level; it looks more like the verbosity was set to 5, so the error number 
isn’t printed.

Thanks
Ralph


> On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Paul,
> 
> I do not remember it like that ...
> 
> At that time, the issue in ompi was that the global errno was used instead of 
> the per-thread errno.
> Though the man page says -mt should be used for multithreaded apps, you 
> tried -D_REENTRANT on all your platforms, and it was enough to get the 
> expected result.
> 
> I just wanted to check that the pmix1xx (sub)configure correctly passes the 
> -D_REENTRANT flag, and it does, so this is very likely a new and unrelated 
> error.
> 
> Cheers,
> 
> Gilles
> 
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Gilles,
> 
> Yes every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT".
> However, they don't include "-mt".
> I believe we concluded (when we had problems previously) that "-mt" was the 
> proper flag (at compile and link) for multi-threaded with the Studio 
> compilers.
> 
> -Paul
> 
> On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> Paul,
> 
> Can you please double-check that pmix1xx is compiled with -D_REENTRANT?
> We ran into similar issues in the past, and they only occurred with Solaris.
> 
> Cheers,
> 
> Gilles
> 
> 
> On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Ralph,
> The output from the requested run is attached.
> -Paul
> 
> On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Ah, okay, that makes more sense. I’ll have to let Brice see if he can figure 
> out how to silence the hwloc error message, as I can’t find where it came 
> from. The other errors are real and are the reason the job was terminated.
> 
> The problem is that we are trying to establish communication between the 
> app and the daemon via a Unix domain socket, and we failed to do so. The error 
> tells me that we were able to create and connect to the socket, but failed 
> when the daemon tried to do a blocking send to the app.
> 
> Can you rerun it with -mca pmix_base_verbose 10? It will tell us the value of 
> the error number that was returned.
> 
> Thanks
> Ralph
> 
> 
>> On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>> Ralph,
>> 
>> No, it did not run.
>> The complete output (which I really should have included in the first place) 
>> is below.
>> 
>> -Paul
>> 
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>> Error opening /devices/pci@0,0:reg: Permission denied
>> [pcp-d-3:26054] PMIX ERROR: ERROR in file 
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>  at line 181
>> [pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file 
>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>  at line 463
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems.  This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>> 
>>   ompi_mpi_init: ompi_rte_init failed
>>   --> Returned "(null)" (-43) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** An error occurred in MPI_Init
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***    and potentially your MPI job)
>> [pcp-d-3:26054] Local abort before MPI_INIT completed completed 
>> successfully, but am not able to aggregate error messages, and not able to 
>> guarantee that all other processes were killed!
>> -------------------------------------------------------
>> Primary job  terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status, thus causing
>> the job to be terminated. The first process to do so was:
>> 
>>   Process name: [[11371,1],0]
>>   Exit code:    1
>> --------------------------------------------------------------------------
>> 
>> On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> Paul, can you clarify something for me? The error in this case indicates 
>> that the client wasn’t able to reach the daemon - this should have resulted 
>> in termination of the job. Did the job actually run?
>> 
>> 
>>> On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> I'm traveling right now, but it should be an easy fix when I return. Sorry 
>>> for the annoyance.
>>> 
>>> 
>>> On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>> Any suggestion how I (as a non-root user) can avoid seeing this hwloc error 
>>> message on every run?
>>> 
>>> -Paul
>>> 
>>> On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>> Paul,
>>> 
>>> IIRC, the "Permission denied" message comes from hwloc, which cannot collect 
>>> all the info it would like.
>>> 
>>> Cheers,
>>> 
>>> Gilles 
>>> 
>>> On 9/18/2015 2:34 PM, Paul Hargrove wrote:
>>>> Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio 
>>>> compilers (default ILP32 output) and saw the following result:
>>>> 
>>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>>> Error opening /devices/pci@0,0:reg: Permission denied
>>>> [pcp-d-4:00492] PMIX ERROR: ERROR in file 
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c
>>>>  at line 181
>>>> [pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file 
>>>> /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c
>>>>  at line 463
>>>> 
>>>> I don't know if the Permission denied error is related to the subsequent 
>>>> PMIX errors, but any message that says "UNREACHABLE" is clearly worth 
>>>> reporting.
>>>> 
>>>> -Paul
>>>> 
>>>> -- 
>>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>>> Computer Languages & Systems Software (CLaSS) Group
>>>> Computer Science Department               Tel: +1-510-495-2352
>>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18074.php
>>> 
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18075.php
>>> 
>>> 
>>> 
>>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18076.php
>>> 
>> 
>> 
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18078.php
>> 
>> 
>> 
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18080.php
> 
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18081.php
> 
> 
> 
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18083.php
> 
> 
> 
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18085.php
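
Gilles's summary in the thread is that the earlier Solaris failure came from code seeing one global errno instead of a per-thread errno, and that -D_REENTRANT was enough to fix it. The following C sketch is one way to check what a given compile actually got. The helper name errno_is_per_thread() is hypothetical and is not part of Open MPI or PMIx. It compares the address of errno in the calling thread with its address in a second thread: a thread-local errno gives two different addresses, while a single global errno gives the same one.

```c
/* Hypothetical helper (not Open MPI/PMIx code): reports whether errno is
 * per-thread in this build. Assumed build flags: -mt (or at least
 * -D_REENTRANT) with the Studio compilers, -pthread with gcc/clang. */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Written by the helper thread; read only after pthread_join(). */
static int errno_differs = -1;

static void *compare_errno_address(void *main_errno_addr)
{
    /* A thread-local errno lives at a different address in this thread
     * than in the creating thread; a global errno has the same address. */
    errno_differs = ((void *)&errno != main_errno_addr);
    return NULL;
}

/* Returns 1 if errno is per-thread, 0 if it is one shared variable,
 * and -1 if the helper thread could not be created. */
int errno_is_per_thread(void)
{
    pthread_t tid;

    errno_differs = -1;
    /* The calling thread stays alive until the join, so the address of
     * its errno remains valid while the helper thread compares against it. */
    if (pthread_create(&tid, NULL, compare_errno_address, (void *)&errno) != 0)
        return -1;
    pthread_join(tid, NULL);
    return errno_differs;
}
```

If this is called from a main() compiled with the same flags as pmix1xx, a return value of 1 means each thread has its own errno. A return value of 0 would point to the shared-errno problem Gilles describes.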

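
Ralph's diagnosis in the thread is that the daemon's blocking send to the app over the Unix domain socket failed, and that pmix_base_verbose 10 should print the error number behind that failure. As an illustration only, this C sketch shows how such a number arises. The helper send_to_closed_peer_errno() is hypothetical and is not PMIx code. It does a blocking send() on one end of a socketpair(AF_UNIX, SOCK_STREAM) after the other end has been closed, then captures errno immediately. A closed peer is only one assumed cause; the thread does not establish what actually happened on pcp-d-3.

```c
/* Hypothetical helper (not PMIx code): performs a blocking send() on a
 * Unix domain socket whose peer has already been closed and returns the
 * resulting errno, or 0 if the send unexpectedly succeeded. */
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int send_to_closed_peer_errno(void)
{
    int sv[2];
    const char msg[] = "ping";
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return errno;

    close(sv[1]);                /* stand-in for the receiving side going away */

    /* Without this, the send would raise SIGPIPE and kill the process
     * instead of returning -1 with errno set. Note that this changes the
     * SIGPIPE disposition for the whole process. */
    signal(SIGPIPE, SIG_IGN);

    n = send(sv[0], msg, sizeof msg, 0);  /* blocking socket, no flags */
    err = (n < 0) ? errno : 0;            /* read errno right after the call */

    close(sv[0]);
    return err;
}
```

Passing the result to strerror() turns it into readable text. On Linux, this particular case yields EPIPE ("Broken pipe"). Reading errno immediately after the failing call matters even more if errno were a shared global, because another thread could overwrite it before it is reported.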