Backing up several messages... maybe I missed it, but did you try
running the GASNet diagnostic tests as suggested in the original response?
With debugging on?
It seems to me that the duplicate cs-minta issue below (as well as the
previous issues) is very likely a GASNet configuration/launch issue
rather than a Chapel-specific one, and if so, it would probably be easiest
to debug using GASNet's tests (e.g., testhello). And if it
isn't, that would be valuable information for debugging Chapel.
Also likely to be useful, if you haven't already found it, is the GASNet
README for the ibv conduit, which is in
third-party/gasnet/GASNet-*/ibv-conduit/README
(or vapi-conduit/README for older versions of GASNet -- I can't remember
when they finally moved the README into ibv-conduit). Among other things,
this details the various options for launching via MPI or ssh, etc.
-Brad
On Thu, 3 Apr 2014, Danilo Guerrera wrote:
We use hydra for starting the parallel jobs.
You don't need to load an MPI environment before using it; you can just (compile
and) run an MPI program like this:
mpiexec -f hostfile ./app
Greets,
Danilo
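Since hydra takes its hosts from a hostfile, one quick way to check that MPI
by itself spreads processes across both nodes is to write a hostfile for the
two machines and launch a trivial command with it. This is only a sketch: the
helper name write_hostfile and the file name hostfile.test are made up for
illustration, and it assumes the plain one-host-per-line hostfile format.

```shell
# Write one host name per line to a hostfile.
#   $1: space-separated host names (hypothetical helper, not part of hydra)
#   $2: path of the hostfile to create
write_hostfile() {
    for host in $1; do
        printf '%s\n' "$host"
    done > "$2"
}

# Possible use on the cluster (not executed here):
#   write_hostfile "minta mintb" hostfile.test
#   mpiexec -f hostfile.test -n 2 hostname
# If both lines of output name the same machine, the problem would be in the
# MPI launch setup rather than in GASNet or Chapel.
```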
________________________________________
From: rafael [[email protected]]
Sent: Thursday, 3 April 2014 10:17
To: Danilo Guerrera
Cc: chapel-developers; Public Chapel Bugs list
Subject: Re: [Chapel-developers] [Chapel-bugs] problem with chapel locales (fwd)
Hello,
I set GASNET_IBV_SPAWNER=mpi and then exported GASNET_BACKTRACE=1.
When I run the program, this is the only output I get:
./hello_ibv2 -nl 2 -v
/usr/local/chapel-1.8.0/third-party/gasnet/install/linux64-gnu/seg-everything/nodbg/bin/gasnetrun_ibv
-n 2 ./hello_ibv2_real -nl 2 -v
executing on node 1 of 2 node(s): cs-minta
executing on node 0 of 2 node(s): cs-minta
Hello, world! (from locale 0 of 2 named cs-minta)
Hello, world! (from locale 1 of 2 named cs-minta)
Both locales run on the same node, instead of on minta and mintb.
So it works on a single node.
Do you use a queue system on the cluster (Slurm, PBS, ...)?
How do you launch MPI programs?
Do you need to load the MPI environment before using it?
Greets,
Rafael
Greets,
Danilo
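The duplicate cs-minta lines in the -v output above can also be spotted
mechanically. A minimal sketch, assuming the launcher keeps printing lines of
the form "executing on node N of M node(s): HOST" as it does in this thread
(the function name count_locale_hosts is hypothetical):

```shell
# Count the distinct host names reported in the launcher's verbose output.
# Reads the program output on stdin and prints a single number.
count_locale_hosts() {
    sed -n 's/^executing on node [0-9][0-9]* of [0-9][0-9]* node(s): *//p' |
        LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Possible use (not executed here):
#   ./hello_ibv2 -nl 2 -v | count_locale_hosts
# A result smaller than the -nl value means several locales share a host.
```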
________________________________________
From: rafael [[email protected]]
Sent: Thursday, 3 April 2014 9:35
To: Danilo Guerrera
Cc: chapel-developers; Public Chapel Bugs list
Subject: Re: [Chapel-developers] [Chapel-bugs] problem with chapel locales (fwd)
Hi,
Have you changed the permissions on uverbs on all the computers in the cluster?
Have you tried again with export GASNET_IBV_SPAWNER=mpi?
Do you have MPI properly installed and configured?
What does it say when you set GASNET_BACKTRACE=1?
Greets,
Rafael
Hello,
Try adding
export GASNET_IBV_SPAWNER=ssh
to your exports (shouldn't need to recompile anything).
The GASNet/IBV launcher defaults to using MPI.
I did it and now I get this error:
./hello_ibv -nl 2
Cleaning up orphaned processes...
*** FATAL ERROR: One or more processes died before setup was completed
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
gasneti_backtrace_init
Aborted (core dumped)
Perhaps it is a problem with the IB HCA device file permissions.
If you do as your user:
ibv_devinfo
It should print the HCA information, otherwise that is the problem.
It works and prints out the following:
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.8.000
node_guid: 0025:90ff:ff16:c09c
sys_image_guid: 0025:90ff:ff16:c09f
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: SM_2121000001000
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 4
port_lid: 1
port_lmc: 0x00
I solved the IBV problem by changing the permissions on uverbs0 in
/dev/infiniband/
Greets,
Danilo Guerrera
Departement Mathematik & Informatik
Universität Basel
Klingelbergstrasse 50
CH-4056 Basel Switzerland
email: [email protected]
Phone: +41 (0)61 267 15 18
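Because the uverbs permissions have to be right on every node, not only on the
one where the job is started, a small check that can be run on each machine
may help. This is a sketch only: it assumes the user needs both read and write
access to the uverbs device files, and the function name check_uverbs is made
up for illustration.

```shell
# Report whether the current user can read and write each uverbs device.
#   $1: directory to inspect (defaults to /dev/infiniband, as named above)
# Returns non-zero if a device is inaccessible or none is found.
check_uverbs() {
    dir=${1:-/dev/infiniband}
    status=0
    found=0
    for dev in "$dir"/uverbs*; do
        [ -e "$dev" ] || continue
        found=1
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            echo "ok: $dev"
        else
            echo "no access: $dev"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no uverbs devices under $dir"
        status=1
    fi
    return "$status"
}

# Possible use, as the user who launches the job, on each node in turn:
#   check_uverbs
```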
________________________________________
From: Michael Ferguson [[email protected]]
Sent: Wednesday, 2 April 2014 20:09
To: Danilo Guerrera
Cc: Public Chapel Bugs list; chapel-developers
Subject: Re: [Chapel-bugs] [Chapel-developers] problem with chapel locales (fwd)
Hi Danilo -
Try adding
export GASNET_IBV_SPAWNER=ssh
to your exports (shouldn't need to recompile anything).
The GASNet/IBV launcher defaults to using MPI.
-michael
On 04/02/2014 12:08 PM, Danilo Guerrera wrote:
Hello Greg,
yes, it's set to GASNET_SPAWNFN=S
the following are our exports:
export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv
export GASNET_SPAWNFN=S
export GASNET_SSH_SERVERS="ib-minta ib-mintb"
export SSH_CMD=ssh
export SSH_OPTIONS=-x
Thanks,
Danilo
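With this many variables in play, it can help to print what the launching
shell actually sees before each run, and to compare that across nodes. A
minimal sketch: the list of names is taken from this thread (including
GASNET_IBV_SPAWNER, suggested elsewhere in it), and the function name
show_gasnet_env is hypothetical.

```shell
# Print the launch-related variables mentioned in this thread,
# flagging the ones that are not set in the current environment.
show_gasnet_env() {
    for name in CHPL_COMM CHPL_COMM_SUBSTRATE GASNET_SPAWNFN \
                GASNET_SSH_SERVERS GASNET_IBV_SPAWNER SSH_CMD SSH_OPTIONS; do
        eval "value=\${$name-}"      # current value, empty if unset
        eval "isset=\${$name+yes}"   # "yes" only if the variable is set
        if [ "$isset" = yes ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

# Possible use:
#   show_gasnet_env
```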
________________________________________
Da: [email protected] [[email protected]]
Inviato: mercoledì 2 aprile 2014 18.05
A: Brad Chamberlain
Cc: Danilo Guerrera; Public Chapel Bugs list; Chapel Sourceforge Developers List
Oggetto: Re: [Chapel-developers] problem with chapel locales (fwd)
Is GASNET_SPAWNFN set and if so, to what?
greg
On Wed, 2 Apr 2014, Brad Chamberlain wrote:
For this thread's reference, here's a follow-up from Danilo that I hadn't
found prior to sending:
Good afternoon Mr. Chamberlain,
I went through these errors and now I'm able to compile and run the
hello-world with locales. The problem is that even if I provide
GASNET_SSH_SERVERS with 2 nodes to be used, I always get 2 prints from the
same node as output, so somehow it's not going through the InfiniBand
and is executing 2 locales on the same machine. If you have any suggestions I
would appreciate them; in any case I will open an issue on the mailing list,
hoping to find a solution so that we can use Chapel locales in our course.
I think the original suggestion still holds, but this behavior may be
familiar to others... It sounds vaguely familiar to me, but not enough for
the solution to leap into my hands.
-Brad
On Wed, 2 Apr 2014, Brad Chamberlain wrote:
Hi Danilo --
I don't personally have enough experience with GASNet over ibv to
immediately recognize this error, but am Cc:ing the public chapel-bugs list
and chapel-developers list in case someone else does (Rafael?).
[Note that your responses to chapel-developers will bounce unless you're
subscribed, but you should be able to post to chapel-bugs]
It seems likely that the problem is with your GASNet installation rather
than something Chapel specific, so in cases like this, it's often helpful
to run GASNet's test suite that it ships with. To do so, cd to the
subdirectory of $CHPL_HOME/third-party/gasnet/build/ that corresponds to
your configuration (e.g., mine would be linux64-gnu/seg-fast/nodbg) and
then do 'make run-tests-par' or 'make run-tests' (see
third-party/gasnet/GASNet-*/README for more information).
If that doesn't point out the problem, you may also want to turn on
GASNet's internal debugging assertion checks by setting the environment
variable CHPL_COMM_DEBUG and remaking. This will create a sibling to
'nodbg' in the path above called 'debug' and will often be more verbose
about what's going wrong.
Hope this is helpful,
-Brad
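To find the configuration subdirectory in which to run the tests, something
like the following may be handy. It is a sketch under the assumption that the
configuration directories are the ones named 'nodbg' or 'debug' beneath the
build directory, as in the example path above; the function name
list_gasnet_configs is made up for illustration.

```shell
# List candidate GASNet build configuration directories.
#   $1: build root (defaults to $CHPL_HOME/third-party/gasnet/build)
list_gasnet_configs() {
    root=${1:-$CHPL_HOME/third-party/gasnet/build}
    # -prune stops find from descending into a matching directory.
    find "$root" -type d \( -name nodbg -o -name debug \) -prune -print \
        2>/dev/null | LC_ALL=C sort
}

# Possible use (not executed here):
#   list_gasnet_configs
#   cd <one of the listed directories> && make run-tests-par
```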
---------- Forwarded message ----------
Date: Wed, 2 Apr 2014 04:28:45 -0500
From: Danilo Guerrera <[email protected]>
To: "[email protected]" <[email protected]>
Subject: problem with chapel locales
Good morning Mr. Chamberlain,
I'm a PhD student at University of Basel in the High Performance and Web
Computing Group led by prof. H. Burkhart. We introduced Chapel in our High
Performance Computing course and now wanted to exploit our cluster and try
the locales. We have an InfiniBand interconnect, so our first choice
was to set CHPL_COMM_SUBSTRATE to ibv. We followed the simple steps shown
in the $CHPL_HOME/doc/README.multilocale file, recompiled chapel and then
exported the environment variables as shown at point 5 of the README, in
particular giving
export GASNET_SSH_SERVERS="minta mintb" as locales to be used.
We were able to compile the hello6-taskpar-dist.chpl example, but when
running it with the syntax
./hello6-taskpar-dist -nl 2
we get this error:
GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem
with requested resource)
at
/usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
reason: unable to open any HCA ports
GASNet
gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines
returning an error code: GASNET_ERR_RESOURCE (Problem with requested
resource)
at
/usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
*** Caught a fatal signal: SIGSEGV(11) on node 0/2
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
environment to generate a backtrace.
GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem
with requested resource)
at
/usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
reason: unable to open any HCA ports
GASNet
gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines
returning an error code: GASNET_ERR_RESOURCE (Problem with requested
resource)
at
/usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
*** Caught a fatal signal: SIGSEGV(11) on node 1/2
NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
environment to generate a backtrace.
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
Are there particular additional configuration steps we have to take before
being able to run Chapel locales properly? Or are we making a mistake
somewhere?
I hope you can help us.
Kind regards,
Danilo Guerrera
Departement Mathematik & Informatik
Universität Basel
Klingelbergstrasse 50
CH-4056 Basel Switzerland
email: [email protected]
Phone: +41 (0)61 267 15 18
------------------------------------------------------------------------------
_______________________________________________
Chapel-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-bugs
------------------------------------------------------------------------------
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
--
Rafael Larrosa Jiménez
Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
Universidad de Málaga
EMAIL: [email protected] Edificio de Bioinnovación
TELEF: + 34951952788 C/ Severo Ochoa 34
FAX : +34951952792 Parque Tecnológico de Andalucía
29590 Málaga
(SPAIN)