Hi there,
Since a simple hello world application works, I tried running a
Chapel program on multiple nodes. The computer I am running on uses Slurm as
its job scheduler, but I am running into some problems. I even tried launching
the Chapel binary directly with srun, but it does not work that way:
$ srun -p marvin -N 2 -n 4 -c 8 ./hello6-taskpar-dist
error: Specify number of locales via -nl <#> or --numLocales=<#>
error: Specify number of locales via -nl <#> or --numLocales=<#>
error: Specify number of locales via -nl <#> or --numLocales=<#>
error: Specify number of locales via -nl <#> or --numLocales=<#>
I understand why this error appears.
Then I read README.multilocale and README.launcher, where I found a couple of
things on how to launch Chapel using Slurm. So I tried exporting the following:
export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv
export CHPL_LAUNCHER_WALLTIME=00:15:00
export GASNET_SPAWNFN=S
export GASNET_SSH_SERVERS="reno lyra01"
export SSH_CMD=ssh
export SSH_OPTIONS=-x
export GASNET_CSPAWN_CMD="srun -N%N %C"
And yes, I did recompile after doing all this.
When I run
./hello6-taskpar-dist -nl 2
I get:
Access denied: user bghimire (uid=3030) has no active jobs.
Connection closed by 12.23.1.1
connection to reno failed.
Terminated
I am curious why Slurm is not being used in this case. It just goes
through ssh and does not do anything with Slurm.
Another question: what does -N%N %C really mean in export
GASNET_CSPAWN_CMD="srun -N%N %C", and how can I force it to submit the job via
Slurm to specific nodes and cores?
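To make the question concrete, here is a small sketch of how I assume the
placeholders get filled in. The function name expand_cspawn is hypothetical (it
is not part of GASNet or Chapel), and I am only guessing that %N is replaced by
a count (nodes or processes, I am not sure which) and %C by the command GASNet
wants to launch:

```shell
#!/bin/sh
# Hypothetical illustration only: expand_cspawn is NOT a GASNet or Chapel tool.
# Assumption: %N becomes a count and %C becomes the command line to launch.
expand_cspawn() {
  template=$1   # spawn template, e.g. "srun -N%N %C"
  count=$2      # value assumed to replace %N
  cmd=$3        # command assumed to replace %C (must not contain '|' or '&')
  # Substitute every %N, then every %C; '|' is the sed delimiter for %C
  # because the command usually contains '/'.
  printf '%s\n' "$template" | sed -e "s/%N/$count/g" -e "s|%C|$cmd|g"
}

# Placeholder command; I do not know exactly what GASNet passes as %C.
expand_cspawn "srun -N%N %C" 2 "./hello6-taskpar-dist -nl 2"
```

If that guess is right, the template above would end up running something like
"srun -N2 ./hello6-taskpar-dist -nl 2".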
Thank you,
Bibek
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users