On Sep 20, 2015, at 3:42 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Paul,

I do not remember it like that ...

At that time, the issue in OMPI was that the global errno was used instead of the per-thread errno. Though the man pages say -mt should be used for multithreaded apps, you tried -D_REENTRANT on all your platforms, and it was enough to get the expected result.

I just wanted to check that the pmix1xx (sub)configure correctly passed the -D_REENTRANT flag, and it does. So this is very likely a new and unrelated error.

Cheers,
Gilles
On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

Gilles,

Yes, every $CC invocation in opal/mca/pmix/pmix1xx includes "-D_REENTRANT". However, they don't include "-mt". I believe we concluded (when we had problems previously) that "-mt" was the proper flag (at compile and link) for multi-threaded with the Studio compilers.

-Paul
On Sat, Sep 19, 2015 at 11:29 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Paul,

Can you please double check that pmix1xx is compiled with -D_REENTRANT? We ran into similar issues in the past, and they only occurred with Solaris.

Cheers,
Gilles

On Sunday, September 20, 2015, Paul Hargrove <phhargr...@lbl.gov> wrote:

Ralph,

The output from the requested run is attached.

-Paul
On Sat, Sep 19, 2015 at 9:46 PM, Ralph Castain <r...@open-mpi.org> wrote:

Ah, okay - that makes more sense. I’ll have to let Brice see if he can figure out how to silence the hwloc error message, as I can’t find where it came from. The other errors are real and are the reason why the job was terminated.

The problem is that we are trying to establish communication between the app and the daemon via a Unix domain socket, and we failed to do so. The error tells me that we were able to create and connect to the socket, but failed when the daemon tried to do a blocking send to the app.

Can you rerun it with -mca pmix_base_verbose 10? It will tell us the value of the error number that was returned.

Thanks
Ralph
On Sep 19, 2015, at 9:37 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

Ralph,

No, it did not run. The complete output (which I really should have included in the first place) is below.

-Paul

$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-3:26054] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-3:26053] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x64-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "(null)" (-43) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[pcp-d-3:26054] Local abort before MPI_INIT completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[11371,1],0]
  Exit code: 1
--------------------------------------------------------------------------
On Sat, Sep 19, 2015 at 8:50 PM, Ralph Castain <r...@open-mpi.org> wrote:

Paul, can you clarify something for me? The error in this case indicates that the client wasn’t able to reach the daemon - this should have resulted in termination of the job. Did the job actually run?

On Sep 18, 2015, at 2:50 AM, Ralph Castain <r...@open-mpi.org> wrote:

I'm on travel right now, but it should be an easy fix when I return. Sorry for the annoyance.
On Thu, Sep 17, 2015 at 11:13 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

Any suggestion how I (as a non-root user) can avoid seeing this hwloc error message on every run?

-Paul

On Thu, Sep 17, 2015 at 11:00 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Paul,

IIRC, the "Permission denied" is coming from hwloc, which cannot collect all the info it would like.

Cheers,
Gilles
On 9/18/2015 2:34 PM, Paul Hargrove wrote:

Tried tonight's master tarball on Solaris 11.2 on x86-64 with the Studio compilers (default ILP32 output) and saw the following result:

$ mpirun -mca btl sm,self -np 2 examples/ring_c
Error opening /devices/pci@0,0:reg: Permission denied
[pcp-d-4:00492] PMIX ERROR: ERROR in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/client/pmix_client.c at line 181
[pcp-d-4:00491] PMIX ERROR: UNREACHABLE in file /export/home/phargrov/OMPI/openmpi-master-solaris11-x86-ss12u3/openmpi-dev-2559-g567c9e3/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_listener.c at line 463

I don't know if the "Permission denied" error is related to the subsequent PMIX errors, but any message that says "UNREACHABLE" is clearly worth reporting.

-Paul
--
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory
Tel: +1-510-495-2352   Fax: +1-510-486-6900

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/09/18074.php