I forgot to add that the rights are now set properly on all of the nodes.
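A minimal sketch of how that could be verified on each node, assuming the device file is /dev/infiniband/uverbs0 (the one named further down in this thread). The function name check_uverbs is made up for illustration and is not part of Chapel or GASNet.

```shell
#!/bin/sh
# Report whether the current user can read and write an InfiniBand
# verbs device file. Sketch only: the default path is an assumption
# taken from this thread and may differ on other systems.
check_uverbs() {
    dev="${1:-/dev/infiniband/uverbs0}"
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok: $dev"
    else
        echo "no access: $dev"
        return 1
    fi
}
```

Running check_uverbs as the normal user on every host listed in GASNET_SSH_SERVERS should report "ok" if the rights are set as intended; `ls -l /dev/infiniband/` shows the actual mode bits.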

Danilo Guerrera
Departement Mathematik & Informatik
Universität Basel
Klingelbergstrasse 50
CH-4056 Basel Switzerland
email: [email protected]
Phone: +41 (0)61 267 15 18

________________________________________
From: rafael [[email protected]]
Sent: Thursday, 3 April 2014 9:35
To: Danilo Guerrera
Cc: chapel-developers; Public Chapel Bugs list
Subject: Re: [Chapel-developers] [Chapel-bugs] problem with chapel locales (fwd)

Hi,

Have you modified the rights of uverbs on all the computers in the cluster?

Have you tried again with export GASNET_IBV_SPAWNER=mpi?

Do you have MPI properly installed and configured?

What does it say when you set GASNET_BACKTRACE=1?

Greets,

Rafael

> Hello,
>
>> Try adding
>>
>> export GASNET_IBV_SPAWNER=ssh
>>
>> to your exports (shouldn't need to recompile anything).
>> The GASNet/IBV launcher defaults to using MPI.
>
> I did it and now I get this error:
>
> ./hello_ibv -nl 2
> Cleaning up orphaned processes...
> *** FATAL ERROR: One or more processes died before setup was completed
> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> gasneti_backtrace_init
> Aborted (core dumped)
>
>> Perhaps it is a problem with the IB HCA device file permissions.
>> If you do as your user:
>>
>> ibv_devinfo
>>
>> It should print the HCA information, otherwise that is the problem.
>
> It works and prints out the following:
> hca_id: mlx4_0
>     transport: InfiniBand (0)
>     fw_ver: 2.8.000
>     node_guid: 0025:90ff:ff16:c09c
>     sys_image_guid: 0025:90ff:ff16:c09f
>     vendor_id: 0x02c9
>     vendor_part_id: 26428
>     hw_ver: 0xB0
>     board_id: SM_2121000001000
>     phys_port_cnt: 1
>         port: 1
>             state: PORT_ACTIVE (4)
>             max_mtu: 4096 (5)
>             active_mtu: 4096 (5)
>             sm_lid: 4
>             port_lid: 1
>             port_lmc: 0x00
>
> I solved the problem with the IBV by modifying the rights on uverbs0 in
> /dev/infiniband/
>
> Greets,
>
> Danilo Guerrera
> Departement Mathematik & Informatik
> Universität Basel
> Klingelbergstrasse 50
> CH-4056 Basel Switzerland
> email: [email protected]
> Phone: +41 (0)61 267 15 18
>
> ________________________________________
> From: Michael Ferguson [[email protected]]
> Sent: Wednesday, 2 April 2014 20:09
> To: Danilo Guerrera
> Cc: Public Chapel Bugs list; chapel-developers
> Subject: Re: [Chapel-bugs] [Chapel-developers] problem with chapel locales
> (fwd)
>
> Hi Danilo -
>
> Try adding
>
> export GASNET_IBV_SPAWNER=ssh
>
> to your exports (shouldn't need to recompile anything).
> The GASNet/IBV launcher defaults to using MPI.
>
> -michael
>
> On 04/02/2014 12:08 PM, Danilo Guerrera wrote:
>> Hello Greg,
>>
>> yes, it's set to GASNET_SPAWNFN=S
>>
>> the following are our exports:
>>
>> export CHPL_COMM=gasnet
>> export CHPL_COMM_SUBSTRATE=ibv
>> export GASNET_SPAWNFN=S
>> export GASNET_SSH_SERVERS="ib-minta ib-mintb"
>> export SSH_CMD=ssh
>> export SSH_OPTIONS=-x
>>
>> Thanks,
>>
>> Danilo
>>
>> ________________________________________
>> From: [email protected] [[email protected]]
>> Sent: Wednesday, 2 April 2014 18:05
>> To: Brad Chamberlain
>> Cc: Danilo Guerrera; Public Chapel Bugs list; Chapel Sourceforge Developers
>> List
>> Subject: Re: [Chapel-developers] problem with chapel locales (fwd)
>>
>> Is GASNET_SPAWNFN set and if so, to what?
>>
>> greg
>>
>>
>> On Wed, 2 Apr 2014, Brad Chamberlain wrote:
>>
>>>
>>> For this thread's reference, here's a follow-up from Danilo that I hadn't
>>> found prior to sending:
>>>
>>> Good afternoon Mr. Chamberlain,
>>>
>>> I went through these errors and now I'm able to compile and run the
>>> hello-world with locales. The problem is that even if I provide
>>> GASNET_SSH_SERVERS with 2 nodes to be used, I always get as output
>>> 2 prints from the same node, so somehow it's not going through the
>>> InfiniBand and is executing 2 locales on the same machine. If you have
>>> any suggestion I would appreciate it; anyway, I will open an issue on
>>> the mailing list hoping to find a solution so that we can use chapel
>>> locales in our course.
>>>
>>>
>>> I think the original suggestion still holds, but this behavior may be
>>> familiar to others... It sounds vaguely familiar to me, but not enough for
>>> the solution to leap into my hands.
>>>
>>> -Brad
>>>
>>>
>>>
>>> On Wed, 2 Apr 2014, Brad Chamberlain wrote:
>>>
>>>>
>>>> Hi Danilo --
>>>>
>>>> I don't personally have enough experience with GASNet over ibv to
>>>> immediately recognize this error, but am Cc:ing the public chapel-bugs list
>>>> and chapel-developers list in case someone else does (Rafael?).
>>>>
>>>> [Note that your responses to chapel-developers will bounce unless you're
>>>> subscribed, but you should be able to post to chapel-bugs]
>>>>
>>>> It seems likely that the problem is with your GASNet installation rather
>>>> than something Chapel specific, so in cases like this, it's often helpful
>>>> to run the test suite that GASNet ships with. To do so, cd to the
>>>> subdirectory of $CHPL_HOME/third-party/gasnet/build/ that corresponds to
>>>> your configuration (e.g., mine would be linux64-gnu/seg-fast/nodbg) and
>>>> then do 'make run-tests-par' or 'make run-tests' (see
>>>> third-party/gasnet/GASNet-*/README for more information).
>>>>
>>>> If that doesn't point out the problem, you may also want to turn on
>>>> GASNet's internal debugging assertion checks by setting the environment
>>>> variable CHPL_COMM_DEBUG and remaking. This will create a sibling to
>>>> 'nodbg' in the path above called 'debug' and will often be more verbose
>>>> about what's going wrong.
>>>>
>>>> Hope this is helpful,
>>>> -Brad
>>>>
>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Wed, 2 Apr 2014 04:28:45 -0500
>>>> From: Danilo Guerrera <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: problem with chapel locales
>>>>
>>>> Good morning Mr. Chamberlain,
>>>>
>>>> I'm a PhD student at University of Basel in the High Performance and Web
>>>> Computing Group led by Prof. H. Burkhart. We introduced Chapel in our High
>>>> Performance Computing course and now wanted to exploit our cluster and try
>>>> the locales. We have an InfiniBand interconnect, so our first choice
>>>> was to set CHPL_COMM_SUBSTRATE to ibv. We followed the simple steps shown
>>>> in the $CHPL_HOME/doc/README.multilocale file, recompiled chapel and then
>>>> exported the environment variables as shown at point 5 of the README, in
>>>> particular giving
>>>> export GASNET_SSH_SERVERS="minta mintb" as locales to be used.
>>>> We were able to compile the hello6-taskpar-dist.chpl example, but when
>>>> running it with the syntax
>>>> ./hello6-taskpar-dist -nl 2
>>>>
>>>> we get this error:
>>>>
>>>>
>>>> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem
>>>> with requested resource)
>>>>
>>>> at
>>>> /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
>>>>
>>>> reason: unable to open any HCA ports
>>>>
>>>> GASNet
>>>> gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines
>>>> returning an error code: GASNET_ERR_RESOURCE (Problem with requested
>>>> resource)
>>>>
>>>> at
>>>> /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
>>>>
>>>> *** Caught a fatal signal: SIGSEGV(11) on node 0/2
>>>>
>>>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>>>> environment to generate a backtrace.
>>>>
>>>> GASNet gasnetc_init returning an error code: GASNET_ERR_RESOURCE (Problem
>>>> with requested resource)
>>>>
>>>> at
>>>> /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1606
>>>>
>>>> reason: unable to open any HCA ports
>>>>
>>>> GASNet
>>>> gasnet_init_GASNET_PARnopshmEVERYTHINGnodebugnotracenostatsnodebugmallocnosrclines
>>>> returning an error code: GASNET_ERR_RESOURCE (Problem with requested
>>>> resource)
>>>>
>>>> at
>>>> /usr/local/chapel-1.8.0/third-party/gasnet/GASNet-1.20.2/vapi-conduit/gasnet_core.c:1849
>>>>
>>>> *** Caught a fatal signal: SIGSEGV(11) on node 1/2
>>>>
>>>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>>>> environment to generate a backtrace.
>>>>
>>>>
>>>> =====================================================================================
>>>>
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>>
>>>> = EXIT CODE: 139
>>>>
>>>> = CLEANING UP REMAINING PROCESSES
>>>>
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>>
>>>> =====================================================================================
>>>>
>>>> APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>>>
>>>> Are there particular additional configuration steps we have to make before
>>>> being able to run chapel locales properly? Or are we somehow making
>>>> mistakes?
>>>>
>>>> I hope you can help us.
>>>>
>>>> Kind regards,
>>>>
>>>> Danilo Guerrera
>>>> Departement Mathematik & Informatik
>>>> Universität Basel
>>>> Klingelbergstrasse 50
>>>> CH-4056 Basel Switzerland
>>>> email: [email protected]
>>>> Phone: +41 (0)61 267 15 18
>>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Chapel-bugs mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/chapel-bugs
>>
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Chapel-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/chapel-developers

--
Rafael Larrosa Jiménez
Centro de Supercomputación y Bioinformática - http://www.scbi.uma.es
Universidad de Málaga                EMAIL: [email protected]
Edificio de Bioinnovación            TELEF: + 34951952788
C/ Severo Ochoa 34                   FAX  : +34951952792
Parque Tecnológico de Andalucía
29590 Málaga (SPAIN)

------------------------------------------------------------------------------
_______________________________________________
Chapel-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-bugs
