Hi Bibek -

I use SLURM myself and have some tidbits for you:

 - There is a SLURM-specific launcher for InfiniBand. I haven't tried it,
   but you should be able to use it with
     export CHPL_LAUNCHER=slurm-gasnetrun_ibv

 - With the environment you are describing, you are using the SSH job
   spawner (GASNET_SPAWNFN=S) but have specified a custom spawn command,
   so GASNET_CSPAWN_CMD has no effect. To use the custom spawner, you'd
   need to
     $ export GASNET_SPAWNFN=C
   and then GASNET_CSPAWN_CMD will apply.

 - But that doesn't work for me with InfiniBand - I have to use the SSH
   spawner. So I do something like this:
     $ salloc -N2
     $ export GASNET_IBV_SPAWNER=ssh
     $ export GASNET_MXM_SPAWNER=ssh
     $ export GASNET_SSH_SERVERS=`scontrol show hostnames`
     $ ./hello6-taskpar-dist -nl 2
   (and then you have to type 'exit' to release the 2-node reservation
   created with salloc).

 - You should investigate the SLURM documentation to find the answers to
   your other questions.

Hope that helps.

-michael

On 03/27/2014 03:58 PM, Bibek Ghimire wrote:
> Hi there,
>
> Since a simple hello world application works, I tried running a Chapel
> program on multiple nodes. The computer I am running on has slurm as its
> job scheduler, but I am running into some problems. I even tried running
> the Chapel binary directly under slurm, but it does not work that way.
>
>   $ srun -p marvin -N 2 -n 4 -c 8 ./hello6-taskpar-dist
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>
> I understand why this error appears.
>
> Then I read README.multilocale and README.launcher, where I found a couple
> of things on how to launch Chapel using slurm. So I tried exporting ...
>
>   export CHPL_COMM=gasnet
>   export CHPL_COMM_SUBSTRATE=ibv
>   export CHPL_LAUNCHER_WALLTIME=00:15:00
>   export GASNET_SPAWNFN=S
>   export GASNET_SSH_SERVERS="reno lyra01"
>   export SSH_CMD=ssh
>   export SSH_OPTIONS=-x
>   export GASNET_CSPAWN_CMD="srun -N%N %C"
>
> and yes, I did recompile after doing all this.
>
> When I do
>
>   ./hello6-taskpar-dist -nl 2
>   Access denied: user bghimire (uid=3030) has no active jobs.
>   Connection closed by 12.23.1.1
>   connection to reno failed.
>   Terminated
>
> I was curious why slurm was not working in this case.
> It's just going through ssh and not doing anything with slurm.
>
> Another question: what does -N%N %C really mean in
>   export GASNET_CSPAWN_CMD="srun -N%N %C"
> and how can I force it to submit the job via slurm to specific nodes and
> cores?
>
> Thank you,
> Bibek
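
For convenience, the interactive SSH-spawner steps from the reply above could be collected into a small shell helper. This is only a sketch, not something from the Chapel or GASNet distributions: the function names hosts_to_servers and run_chapel_in_allocation are made up for illustration, and flattening the host list to a space-separated string (the form used in the original post) is an assumption, since the reply passes the `scontrol show hostnames` output through unchanged. It assumes it is sourced from inside a shell started by salloc.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of Chapel,
# GASNet, or SLURM.

# Turn a newline-separated host list (one host per line, as printed by
# `scontrol show hostnames`) into a single space-separated string.
hosts_to_servers() {
    tr '\n' ' ' | sed 's/ *$//'
}

# Run a Chapel binary on the nodes of the current SLURM allocation
# using the SSH spawner settings from the reply.
#   $1 = path to the Chapel executable
#   $2 = number of locales
run_chapel_in_allocation() {
    prog=$1
    nl=$2
    export GASNET_IBV_SPAWNER=ssh
    export GASNET_MXM_SPAWNER=ssh
    GASNET_SSH_SERVERS=$(scontrol show hostnames | hosts_to_servers)
    export GASNET_SSH_SERVERS
    "$prog" -nl "$nl"
}

# Intended use (inside an allocation):
#   $ salloc -N2
#   $ . ./this_file.sh
#   $ run_chapel_in_allocation ./hello6-taskpar-dist 2
#   $ exit    # releases the allocation
```

The exports happen inside the function so the settings travel with each launch instead of depending on what was typed earlier in the session.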
