We use Hydra to start the parallel jobs. You don't need to load an MPI environment before using it; you can just (compile and) run an MPI program like this:

    mpiexec -f hostfile ./app
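A minimal sketch of the launch described above, assuming MPICH's Hydra mpiexec, whose -f option takes a host file with one host per line. write_hostfile is a hypothetical helper, minta and mintb are the node names used in this thread, and the temporary file location is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: write a Hydra host file, one host name per line.
write_hostfile() {
    out=$1
    shift
    : > "$out"                      # create or truncate the file
    for host in "$@"; do
        printf '%s\n' "$host" >> "$out"
    done
}

# Node names taken from this thread; adjust for your cluster.
hostfile=$(mktemp)
write_hostfile "$hostfile" minta mintb
cat "$hostfile"

# To test placement by hand (not run here, since it needs MPI installed):
#   mpiexec -f "$hostfile" -n 2 hostname
```

With the file in place, the commented mpiexec line should print one host name per node if placement works; two identical names would suggest that both processes landed on the same machine.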
Greets,

Danilo

________________________________________
From: rafael [[email protected]]
Sent: Thursday, 3 April 2014 10:17
To: Danilo Guerrera
Cc: chapel-developers; Public Chapel Bugs list
Subject: Re: [Chapel-developers] [Chapel-bugs] problem with chapel locales (fwd)

> Hello,
>
> I set GASNET_IBV_SPAWNER=mpi and then exported GASNET_BACKTRACE=1,
>
> when I run the program this is the only output I get:
> ./hello_ibv2 -nl 2 -v
> /usr/local/chapel-1.8.0/third-party/gasnet/install/linux64-gnu/seg-everything/nodbg/bin/gasnetrun_ibv -n 2 ./hello_ibv2_real -nl 2 -v
> executing on node 1 of 2 node(s): cs-minta
> executing on node 0 of 2 node(s): cs-minta
> Hello, world! (from locale 0 of 2 named cs-minta)
> Hello, world! (from locale 1 of 2 named cs-minta)
>
> from the same node, instead of minta and mintb

So it works on a single node.

Do you use a queue system on the cluster (Slurm, PBS, …)?
How do you launch MPI programs?
Do you need to load the MPI environment before using it?

Greets,

Rafael

> Greets,
>
> Danilo
>
> ________________________________________
> From: rafael [[email protected]]
> Sent: Thursday, 3 April 2014 9:35
> To: Danilo Guerrera
> Cc: chapel-developers; Public Chapel Bugs list
> Subject: Re: [Chapel-developers] [Chapel-bugs] problem with chapel locales (fwd)
>
> Hi,
>
> Have you modified the permissions of uverbs on all the computers in the cluster?
>
> Have you tried again with export GASNET_IBV_SPAWNER=mpi?
> Do you have MPI properly installed and configured?
>
> What does it say when you set GASNET_BACKTRACE=1?
>
> Greets,
>
> Rafael
>
>> Hello,
>>
>>> Try adding
>>>
>>> export GASNET_IBV_SPAWNER=ssh
>>>
>>> to your exports (shouldn't need to recompile anything).
>>> The GASNet/IBV launcher defaults to using MPI.
>>
>> I did it and now I get this error:
>>
>> ./hello_ibv -nl 2
>> Cleaning up orphaned processes...
>> *** FATAL ERROR: One or more processes died before setup was completed
>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>> Aborted (core dumped)
>>
>>> Perhaps it is a problem with the IB HCA device file permissions.
>>> If you do as your user:
>>>
>>> ibv_devinfo
>>>
>>> It should print the HCA information, otherwise that is the problem.
>>
>> It works and prints out the following:
>> hca_id: mlx4_0
>>     transport:          InfiniBand (0)
>>     fw_ver:             2.8.000
>>     node_guid:          0025:90ff:ff16:c09c
>>     sys_image_guid:     0025:90ff:ff16:c09f
>>     vendor_id:          0x02c9
>>     vendor_part_id:     26428
>>     hw_ver:             0xB0
>>     board_id:           SM_2121000001000
>>     phys_port_cnt:      1
>>         port: 1
>>             state:      PORT_ACTIVE (4)
>>             max_mtu:    4096 (5)
>>             active_mtu: 4096 (5)
>>             sm_lid:     4
>>             port_lid:   1
>>             port_lmc:   0x00
>>
>> I solved the IBV problem by modifying the permissions on uverbs0 in
>> /dev/infiniband/
>>
>> Greets,
>>
>> Danilo Guerrera
>> Departement Mathematik & Informatik
>> Universität Basel
>> Klingelbergstrasse 50
>> CH-4056 Basel Switzerland
>> email: [email protected]
>> Phone: +41 (0)61 267 15 18
>>
>> ________________________________________
>> From: Michael Ferguson [[email protected]]
>> Sent: Wednesday, 2 April 2014 20:09
>> To: Danilo Guerrera
>> Cc: Public Chapel Bugs list; chapel-developers
>> Subject: Re: [Chapel-bugs] [Chapel-developers] problem with chapel locales (fwd)
>>
>> Hi Danilo -
>>
>> Try adding
>>
>> export GASNET_IBV_SPAWNER=ssh
>>
>> to your exports (shouldn't need to recompile anything).
>> The GASNet/IBV launcher defaults to using MPI.
>>
>> -michael
>>
>> On 04/02/2014 12:08 PM, Danilo Guerrera wrote:
>>> Hello Greg,
>>>
>>> yes, it's set to GASNET_SPAWNFN=S
>>>
>>> the following are our exports:
>>>
>>> export CHPL_COMM=gasnet
>>> export CHPL_COMM_SUBSTRATE=ibv
>>> export GASNET_SPAWNFN=S
>>> export GASNET_SSH_SERVERS="ib-minta ib-mintb"
>>> export SSH_CMD=ssh
>>> export SSH_OPTIONS=-x
>>>
>>> Thanks,
>>>
>>> Danilo
>>>
>>> ________________________________________
>>> From: [email protected] [[email protected]]
>>> Sent: Wednesday, 2 April 2014 18:05
>>> To: Brad Chamberlain
>>> Cc: Danilo Guerrera; Public Chapel Bugs list; Chapel Sourceforge Developers List
>>> Subject: Re: [Chapel-developers] problem with chapel locales (fwd)
>>>
>>> Is GASNET_SPAWNFN set and if so, to what?
>>>
>>> greg
>>>
>>>
>>> On Wed, 2 Apr 2014, Brad Chamberlain wrote:
>>>
>>>>
>>>> For this thread's reference, here's a follow-up from Danilo that I hadn't
>>>> found prior to sending:
>>>>
>>>> Good afternoon Mr. Chamberlain,
>>>>
>>>> I went through these errors and now I'm able to compile and run the
>>>> hello-world with locales. The problem is that even if I provide
>>>> GASNET_SSH_SERVERS with 2 nodes to be used, I always get 2 prints from
>>>> the same node as output, so somehow it's not going through the
>>>> InfiniBand and is executing 2 locales on the same machine. If you have
>>>> any suggestion I would appreciate it; anyway, I will open an issue on
>>>> the mailing list hoping to find a solution so that we can use Chapel
>>>> locales in our course.
>>>>
>>>>
>>>> I think the original suggestion still holds, but this behavior may be
>>>> familiar to others... It sounds vaguely familiar to me, but not enough for
>>>> the solution to leap into my hands.
>>>>
>>>> -Brad
>>>>
>>>>
>>>>
>>>> On Wed, 2 Apr 2014, Brad Chamberlain wrote:
>>>>
>>>>>
>>>>> Hi Danilo --
>>>>>
>>>>> I don't personally have enough experience with GASNet over ibv to
>>>>> immediately recognize this error, but am Cc:ing the public chapel-bugs
>>>>> list and chapel-developers list in case someone else does (Rafael?).
>>>>>
>>>>> [Note that your responses to chapel-developers will bounce unless you're
>>>>> subscribed, but you should be able to post to chapel-bugs]
>>>>>
>>>>> It seems likely that the problem is with your GASNet installation rather
>>>>> than something Chapel specific, so in cases like this, it's often helpful
>>>>> to run the test suite that GASNet ships with. To do so, cd to the
>>>>> subdirectory of $CHPL_HOME/third-party/gasnet/build/ that corresponds to
>>>>> your configuration (e.g., mine would be linux64-gnu/seg-fast/nodbg) and
>>>>> then do 'make run-tests-par' or 'make run-tests' (see
>>>>> third-party/gasnet/GASNet-*/README for more information).
>>>>>
>>>>> If that doesn't point out the problem, you may also want to turn on
>>>>> GASNet's internal debugging assertion checks by setting the environment
>>>>> variable CHPL_COMM_DEBUG and remaking. This will create a sibling to
>>>>> 'nodbg' in the path above called 'debug' and will often be more verbose
>>>>> about what's going wrong.
>>>>>
>>>>> Hope this is helpful,
>>>>> -Brad
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> Date: Wed, 2 Apr 2014 04:28:45 -0500
>>>>> From: Danilo Guerrera <[email protected]>
>>>>> To: "[email protected]" <[email protected]>
>>>>> Subject: problem with chapel locales
>>>>>
>>>>> Good morning Mr. Chamberlain,
>>>>>
>>>>> I'm a PhD student at University of Basel in the High Performance and Web
>>>>> Computing Group led by prof. H. Burkhart. We introduced Chapel in our High
>>>>> Performance Computing course and now wanted to exploit our cluster and try
>>>>> the locales.
>>>>> We have an InfiniBand interconnect, so our first choice was to set
>>>>> CHPL_COMM_SUBSTRATE to ibv. We followed the simple steps shown in the
>>>>> $CHPL_HOME/doc/README.multilocale file, recompiled Chapel and then
>>>>> exported the environment variables as shown at point 5 of the README,
>>>>> in particular giving
>>>>> export GASNET_SSH_SERVERS="minta mintb"
>>>>> as the locales to be used.
>>>>> We were able to compile the hello6-taskpar-dist.chpl example, but when
>>>>> running it with the syntax
>>>>> ./hello6-taskpar-dist -nl 2
>>>>>
>>>>> we get this error:
>>>>>
>>>>>
>>>>> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
>>>>>
>>>>> at /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
>>>>>
>>>>> reason: unable to open any HCA ports
>>>>>
>>>>> GASNet gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
>>>>>
>>>>> at /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
>>>>>
>>>>> *** Caught a fatal signal: SIGSEGV(11) on node 0/2
>>>>>
>>>>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>>>>> environment to generate a backtrace.
>>>>>
>>>>> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
>>>>>
>>>>> at /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
>>>>>
>>>>> reason: unable to open any HCA ports
>>>>>
>>>>> GASNet gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines returning an error code: GASNET_ERR_RESOURCE (Problem with requested resource)
>>>>>
>>>>> at /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
>>>>>
>>>>> *** Caught a fatal signal: SIGSEGV(11) on node 1/2
>>>>>
>>>>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>>>>> environment to generate a backtrace.
>>>>>
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>>
>>>>> = EXIT CODE: 139
>>>>>
>>>>> = CLEANING UP REMAINING PROCESSES
>>>>>
>>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>>
>>>>> =====================================================================================
>>>>>
>>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>>>>
>>>>> Are there additional configuration steps we have to take before we can
>>>>> run Chapel locales properly? Or are we making a mistake somewhere?
>>>>>
>>>>> I hope you can help us.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Danilo Guerrera
>>>>> Departement Mathematik & Informatik
>>>>> Universität Basel
>>>>> Klingelbergstrasse 50
>>>>> CH-4056 Basel Switzerland
>>>>> email: [email protected]
>>>>> Phone: +41 (0)61 267 15 18
>>>>
>>>
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> Chapel-bugs mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/chapel-bugs
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Chapel-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/chapel-developers
>
> --
> Rafael Larrosa Jiménez
> Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
> Universidad de Málaga
>
> EMAIL: [email protected]    Edificio de Bioinnovación
> TELEF: + 34951952788           C/ Severo Ochoa 34
> FAX  : +34951952792            Parque Tecnológico de Andalucía
>                                29590 Málaga
>                                (SPAIN)
>
>

--
Rafael Larrosa Jiménez
Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
Universidad de Málaga

EMAIL: [email protected]    Edificio de Bioinnovación
TELEF: + 34951952788           C/ Severo Ochoa 34
FAX  : +34951952792            Parque Tecnológico de Andalucía
                               29590 Málaga
                               (SPAIN)

------------------------------------------------------------------------------
_______________________________________________
Chapel-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-bugs
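
The "unable to open any HCA ports" error earlier in this thread went away once the permissions on uverbs0 in /dev/infiniband/ were changed. A minimal sketch of that check follows, assuming a POSIX shell. check_uverbs is a hypothetical helper name, not something shipped with Chapel or GASNet, and the default directory is the one mentioned in the thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can read and write
# the InfiniBand verbs device files in a directory (default /dev/infiniband).
# Returns 0 only if at least one uverbs device exists and all are accessible.
check_uverbs() {
    dir=${1:-/dev/infiniband}
    rc=0
    found=0
    for dev in "$dir"/uverbs*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$dev" ] || continue
        found=1
        if [ -r "$dev" ] && [ -w "$dev" ]; then
            echo "ok: $dev"
        else
            echo "no access: $dev"
            rc=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no uverbs devices under $dir"
        rc=1
    fi
    return "$rc"
}

# Usage, as the user who runs the Chapel program, on every node:
#   check_uverbs
```

The check only matters if it passes on each node listed in GASNET_SSH_SERVERS, since the question raised above was whether the permissions had been changed on all the computers in the cluster.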
