There are 2 typos in the solution:

On Fri, Sep 16, 2016 at 3:46 PM, Michael Ferguson <[email protected]> wrote:
> Hi -
>
> (For the archives). I was able to help Hui and got 3 different ways of
> launching Chapel programs working on that Infiniband cluster:
>
> 1)  export CHPL_LAUNCHER=slurm-gasnetrun_ibv
>     export CHPL_LAUNCHER_WALLTIME=00:15:00
>     export SLURM_PARTITION=debug
>     make
>     chpl program.chpl
>     ./a.out -nl 3
>
> 2)  export CHPL_LAUNCHER=gasnetrun_ibv
>     export GASNET_IBV_SPAWNER=S
>     GASNET_IBV_SPAWNER=ssh
>     make
>     chpl program.chpl
>     salloc -N number-of-locales
>     # in the salloc shell:
>     export GASNET_SSH_SERVERS=`scontrol show hostnames`
>     ./a.out -nl 3
>
> 3)  export CHPL_LAUNCHER=gasnetrun_ibv
>     export GASNET_IBV_SPAWNER=S
>     GASNET_IBV_SPAWNER=ssh
>     make
>     chpl program.chpl
>     sbatch job.sh
>
>     where job.sh is an sbatch script that contains
>       export GASNET_SSH_SERVERS=`scontrol show hostnames`
>     among other things:
>
> job.sh file contains:
>
> #!/bin/bash
> #SBATCH -t 0:10:0
> #SBATCH --nodes=3
> #SBATCH --exclusive
> #SBATCH --partition=debug
> #SBATCH --output=/path-to-job-output
>
> export GASNET_SSH_SERVERS=`scontrol show hostnames`
> export GASNET_IBV_SPAWNER=ssh
> export GASNET_PHYSMEM_MAX=1G # Limit GASNet's IBV conduit probing
>
> export GASNET_SSH_OPTIONS="-o LogLevel=Error" # disable login banner into the output
>
> cd some-directory
>
> ./a.out -nl 3
>
>
> Note:
>
>  * GASNET_CSPAWN_CMD does not work with GASNet's ibv launcher.
>
>  * It appears to be necessary to run GASNet's ibv launcher
>    (simply running the _real executables in sbatch or srun
>    isn't sufficient).
>
>  * Setting GASNET_PHYSMEM_MAX and possibly GASNET_PHYSMEM_NOPROBE
>    is important for job launches to take a reasonable amount of time.
>
> Cheers,
>
> -michael
>
>
> On 9/10/16, 12:17 AM, "Hui Zhang" <[email protected]> wrote:
>
> >Hello, Greg
> >
> >I did two ways:
> >1. use batch script
> >CHPL_COMM=gasnet
> >CHPL_LAUNCHER=slurm-gasnetrun_ibv
> >CHPL_COMM_SUBSTRATE=ibv
> >GASNET_ROUTE_OUTPUT=0
> >GASNET_VERBOSEENV=1
> >GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner
> >
> >GASNET_SPAWNFN=C
> >GASNET_CSPAWN_CMD='srun -N%N %C'
> >
> >cmd:
> >$CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4
> >--tasksPerLocale=6 -v
> >
> >2. use interactive:
> >same Envs, except I didn't set GASNET_SPAWNFN, and use srun explicitly:
> >
> >salloc -N 4 -t 00:15:00 -p debug
> >srun $CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4
> >--tasksPerLocale=6 -v
> >
> >Both give me the same error:
> >
> >*** FATAL ERROR: Requested spawner "(not set)" is unknown or not
> >supported in this build
> >WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> >gasneti_backtrace_init
> >*** FATAL ERROR: Requested spawner "(not set)" is unknown or not
> >supported in this build
> >WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> >gasneti_backtrace_init
> >*** FATAL ERROR: Requested spawner "(not set)" is unknown or not
> >supported in this build
> >WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> >gasneti_backtrace_init
> >*** FATAL ERROR: Requested spawner "(not set)" is unknown or not
> >supported in this build
> >WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> >gasneti_backtrace_init
> >srun: error: compute-b28-47: task 0: Aborted (core dumped)
> >srun: error: compute-b28-49: task 2: Aborted (core dumped)
> >srun: error: compute-b28-48: task 1: Aborted (core dumped)
> >srun: error: compute-b28-50: task 3: Aborted (core dumped)
> >
> >Thanks
> >
> >
> >On Fri, Sep 9, 2016 at 7:20 PM, Greg Titus
> ><[email protected]> wrote:
> >
> >Hello Hui --
> >
> >I've somewhat lost track of your environment settings. What do you have
> >CHPL_LAUNCHER and CHPL_COMM_SUBSTRATE set to now, and also what are the
> >settings of all of your GASNet-specific env vars, such as GASNET_SPAWNFN
> >and the like?
> >
> >thanks,
> >greg
> >
> >
> >On Fri, 9 Sep 2016, Hui Zhang wrote:
> >
> >Hello, team
> >Following up on the previous issue, I've found out that it was because I
> >was missing libibverbs.so.1 on the machine. After adding that, I came to
> >an error exactly the same as one I found in an old thread on the mailing
> >list:
> >https://sourceforge.net/p/chapel/mailman/message/34769706/
> >
> >** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported
> >in this build
> >WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
> >gasneti_backtrace_init
> >
> >srun: error: node01: task 0: Aborted
> >srun: error: node03: task 2: Aborted
> >srun: error: node02: task 1: Aborted
> >
> >But I don't see a solution provided, so is there any known method to fix
> >this problem?
> >
> >Thanks
> >
> >
> >On Wed, Sep 7, 2016 at 11:22 PM, Hui Zhang <[email protected]>
> >wrote:
> >
> >Update:
> >I tried chapel 1.11 and the master, both give me the same result
> >(not outputting anything). Executing with -v gives me a one-line
> >message:
> >expect .chpl-expect-# (some number, not fixed from run to run)
> >
> >
> >On Wed, Sep 7, 2016 at 2:30 PM, Hui Zhang <[email protected]>
> >wrote:
> >
> >Hello, team
> >
> >I had success running Chapel multi-locale on an infiniband
> >cluster with the default GASNET setting. Here's my script to
> >use gasnet with slurm:
> >
> >export GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner into the output
> >export GASNET_SPAWNFN=C
> >export GASNET_CSPAWN_CMD='srun -N%N %C'
> >
> >./hello6-taskpar-dist -nl 4 (using _real won't work, any
> >idea why?)
> >
> >It works but the output suggests using ibv-conduit instead of
> >udp-conduit for better performance, so I did:
> >1) export CHPL_COMM=gasnet
> >   export CHPL_LAUNCHER=slurm-gasnetrun_ibv
> >   export CHPL_COMM_SUBSTRATE=ibv
> >2) cd $CHPL_HOME && make
> >It reports the same error as
> >https://sourceforge.net/p/chapel/mailman/chapel-developers/thread/VI1PR06MB118160[email protected]/
> >and it builds with the patch provided by Michael.
> >
> >However, when I recompiled hello6, then used the same script
> >to execute it, the job completed normally but it did not
> >output anything. If I use -v in the command, it only printed
> >out:
> >expect .chpl-expect-12045
> >
> >Am I missing something?
> >Thanks
> >
> >--
> >Best regards
> >
> >Hui Zhang

--
Best regards

Hui Zhang
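
For the archives, here is a rough sketch of option 3 as a single job script body, using GASNET_IBV_SPAWNER=ssh (the value that worked in job.sh). The helper names join_hosts and spawner_ok are hypothetical and are not part of Chapel, GASNet, or Slurm. Joining the host list onto one line is only an assumption made for readable logs; the backtick form quoted above worked as-is on that cluster. The launch is skipped outside a Slurm allocation, so the helpers can be tried on a login node.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of Chapel, GASNet, or Slurm.

# Join newline-separated hostnames read from stdin into one
# space-separated line.
join_hosts() {
  paste -sd' ' -
}

# Succeed only for the spawner value that worked in this thread ("ssh").
# Other values may be valid in GASNet; this check is deliberately narrow.
spawner_ok() {
  [ "$1" = "ssh" ]
}

# Attempt the launch only inside a Slurm allocation, where Slurm sets
# SLURM_JOB_ID and `scontrol show hostnames` lists the allocated nodes.
if [ -n "${SLURM_JOB_ID:-}" ]; then
  export GASNET_IBV_SPAWNER=ssh
  export GASNET_SSH_SERVERS="$(scontrol show hostnames | join_hosts)"
  export GASNET_PHYSMEM_MAX=1G   # limit GASNet's IBV conduit probing
  export GASNET_SSH_OPTIONS="-o LogLevel=Error"
  spawner_ok "$GASNET_IBV_SPAWNER" && ./a.out -nl 3
fi
```

The spawner check fails fast on a value such as S before the launcher is invoked, instead of surfacing later as a launch error.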
------------------------------------------------------------------------------
_______________________________________________ Chapel-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/chapel-developers
