Hi -

On 9/16/16, 3:55 PM, "Brad Chamberlain" <[email protected]> wrote:
>
>Is there anything that could/should be done for the release to ease
>future user pain? (in terms of either code or documentation changes?)

Yes, absolutely. I'm looking at updating multilocale.rst and
launcher.rst. I think these documents need to be clearer about not just
what variables are set but where they fit in. (E.g., you don't use
sbatch with gasnetrun_ibv).

-michael

>On Fri, 16 Sep 2016, Michael Ferguson wrote:
>
>> Hi -
>>
>> (For the archives). I was able to help Hui and got 3 different ways of
>> launching Chapel programs working on that Infiniband cluster:
>>
>> 1) export CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>    export CHPL_LAUNCHER_WALLTIME=00:15:00
>>    export SLURM_PARTITION=debug
>>    make
>>    chpl program.chpl
>>    ./a.out -nl 3
>>
>> 2) export CHPL_LAUNCHER=gasnetrun_ibv
>>    export GASNET_IBV_SPAWNER=S
>>    make
>>    chpl program.chpl
>>    salloc -N number-of-locales
>>    # in the salloc shell:
>>    export GASNET_SSH_SERVERS=`scontrol show hostnames`
>>    ./a.out -nl 3
>>
>> 3) export CHPL_LAUNCHER=gasnetrun_ibv
>>    export GASNET_IBV_SPAWNER=S
>>    make
>>    chpl program.chpl
>>    sbatch job.sh
>>
>>    where job.sh is an sbatch script that contains
>>      export GASNET_SSH_SERVERS=`scontrol show hostnames`
>>    among other things:
>>
>>    job.sh file contains:
>>
>>    #!/bin/bash
>>    #SBATCH -t 0:10:0
>>    #SBATCH --nodes=3
>>    #SBATCH --exclusive
>>    #SBATCH --partition=debug
>>    #SBATCH --output=/path-to-job-output
>>
>>    export GASNET_SSH_SERVERS=`scontrol show hostnames`
>>    export GASNET_IBV_SPAWNER=ssh
>>    export GASNET_PHYSMEM_MAX=1G # Limit GASNet's IBV conduit probing
>>
>>    export GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner into the output
>>
>>    cd some-directory
>>
>>    ./a.out -nl 3
>>
>>
>> Note:
>>
>>  * GASNET_CSPAWN_CMD does not work with GASNet's ibv launcher.
>>
>>  * It appears to be necessary to run GASNet's ibv launcher
>>    (simply running the _real executables in sbatch or srun
>>    isn't sufficient).
>>
>>  * Setting GASNET_PHYSMEM_MAX and possibly GASNET_PHYSMEM_NOPROBE
>>    is important for job launches to take a reasonable amount of time
>>
>> Cheers,
>>
>> -michael
>>
>>
>> On 9/10/16, 12:17 AM, "Hui Zhang" <[email protected]> wrote:
>>
>>> Hello, Greg
>>>
>>> I did two ways:
>>> 1. use batch script
>>>    CHPL_COMM=gasnet
>>>    CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>>    CHPL_COMM_SUBSTRATE=ibv
>>>    GASNET_ROUTE_OUTPUT=0
>>>    GASNET_VERBOSEENV=1
>>>    GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner
>>>
>>>    GASNET_SPAWNFN=C
>>>    GASNET_CSPAWN_CMD='srun -N%N %C'
>>>
>>>    cmd:
>>>    $CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4 --tasksPerLocale=6 -v
>>>
>>> 2. use interactive:
>>>    same Envs, except I didn't set GASNET_SPAWNFN, and use srun explicitly:
>>>
>>>    salloc -N 4 -t 00:15:00 -p debug
>>>    srun $CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4 --tasksPerLocale=6 -v
>>>
>>> Both give me the same error:
>>>
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>>> srun: error: compute-b28-47: task 0: Aborted (core dumped)
>>> srun: error: compute-b28-49: task 2: Aborted (core dumped)
>>> srun: error: compute-b28-48: task 1: Aborted (core dumped)
>>> srun: error: compute-b28-50: task 3: Aborted (core dumped)
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Sep 9, 2016 at 7:20 PM, Greg Titus
>>> <[email protected]> wrote:
>>>
>>> Hello Hui --
>>>
>>> I've somewhat lost track of your environment settings. What do you have
>>> CHPL_LAUNCHER and CHPL_COMM_SUBSTRATE set to now, and also what are the
>>> settings of all of your GASNet-specific env vars, such as GASNET_SPAWNFN
>>> and the like?
>>>
>>> thanks,
>>> greg
>>>
>>>
>>> On Fri, 9 Sep 2016, Hui Zhang wrote:
>>>
>>> Hello, team
>>>
>>> Following up on the previous issue, I've found out that it was because
>>> I was missing libibverbs.so.1 on the machine. After adding that, I came
>>> to an error exactly the same as one I found in an old thread on the
>>> mailing list:
>>> https://sourceforge.net/p/chapel/mailman/message/34769706/
>>>
>>> ** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
>>>
>>> srun: error: node01: task 0: Aborted
>>> srun: error: node03: task 2: Aborted
>>> srun: error: node02: task 1: Aborted
>>>
>>> But I don't see a solution provided there, so has any method been
>>> tried to fix this problem?
>>>
>>> Thanks
>>>
>>>
>>> On Wed, Sep 7, 2016 at 11:22 PM, Hui Zhang <[email protected]>
>>> wrote:
>>> Update:
>>> I tried chapel 1.11 and the master; both give me the same result
>>> (not outputting anything). Executing with -v gives me a one-line
>>> message:
>>> expect .chpl-expect-# (some number, not fixed from run to run)
>>>
>>>
>>> On Wed, Sep 7, 2016 at 2:30 PM, Hui Zhang <[email protected]>
>>> wrote:
>>> Hello, team
>>>
>>> I had success running Chapel multi-locale on an infiniband
>>> cluster with the default GASNET setting. Here's my script to
>>> use gasnet with slurm:
>>>
>>> export GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner into the output
>>> export GASNET_SPAWNFN=C
>>> export GASNET_CSPAWN_CMD='srun -N%N %C'
>>>
>>> ./hello6-taskpar-dist -nl 4 (using _real won't work, any idea why?)
>>>
>>>
>>> It works, but the output suggests using ibv-conduit instead of
>>> udp-conduit for better performance, so I did:
>>> 1) export CHPL_COMM=gasnet
>>>    export CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>>    export CHPL_COMM_SUBSTRATE=ibv
>>> 2) cd $CHPL_HOME && make
>>> It reports the same error as
>>> https://sourceforge.net/p/chapel/mailman/chapel-developers/thread/VI1PR06MB118160[email protected]/
>>> and it builds with the patch provided by Michael.
>>>
>>> However, when I recompiled hello6 and then used the same script
>>> to execute it, the job completed normally but it did not
>>> output anything. If I use -v in the command, it only printed
>>> out:
>>> expect .chpl-expect-12045
>>>
>>> Am I missing something?
>>> Thanks
>>>
>>> --
>>> Best regards
>>>
>>>
>>> Hui Zhang
>>>
>>
>>
>>------------------------------------------------------------------------------
>> _______________________________________________
>> Chapel-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/chapel-developers
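
(For the archives, a sketch.) Methods 2 and 3 above both depend on
GASNET_IBV_SPAWNER and GASNET_SSH_SERVERS being set before ./a.out runs.
The helper below is one possible pre-flight check. The function names
join_hosts and check_ibv_env are hypothetical and are not part of Chapel
or GASNet. It also assumes that a space-separated host list is what
GASNET_SSH_SERVERS wants, and that the 'Requested spawner "(not set)"'
failure goes with a missing spawner setting; this thread suggests both
but does not confirm either.

```shell
# Hypothetical pre-flight helpers for the gasnetrun_ibv launch methods.
# Written for plain POSIX sh so they can be sourced from job.sh or a shell.

# Read hostnames on stdin (one per line, as `scontrol show hostnames`
# prints them) and emit them as one space-separated line.
join_hosts() {
    tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Return 0 when both variables used by methods 2 and 3 are non-empty.
# Otherwise name each missing variable on stderr and return 1.
check_ibv_env() {
    missing=0
    for var in GASNET_IBV_SPAWNER GASNET_SSH_SERVERS; do
        eval "val=\${$var:-}"          # indirect lookup without bashisms
        if [ -z "$val" ]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Inside the salloc shell or job.sh this could be used roughly as
`export GASNET_SSH_SERVERS="$(scontrol show hostnames | join_hosts)"`
followed by `check_ibv_env && ./a.out -nl 3`, so that a missing setting
is reported before the launch rather than by an abort on every node.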
