Hi Bibek -

I use SLURM myself and have some tidbits for you:
- There is a SLURM-specific launcher for InfiniBand.
   Although I haven't tried it, you should be able to use
   it with export CHPL_LAUNCHER=slurm-gasnetrun_ibv
- With the environment you are describing, you are using
   the SSH job spawner (GASNET_SPAWNFN=S) but also specified
   a custom spawn command. To use the custom spawner, you'd
   need to
  $ export GASNET_SPAWNFN=C
   and then GASNET_CSPAWN_CMD will take effect.
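- But that doesn't work for me with InfiniBand; I have
   to use the SSH spawner. The "has no active jobs" error
   you got typically means the compute nodes only accept
   SSH logins from users who hold a SLURM allocation on
   them, so get the nodes with salloc first and point the
   SSH spawner at exactly those nodes. I do something
   like this: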

  $ salloc -N2
  $ export GASNET_IBV_SPAWNER=ssh
  $ export GASNET_MXM_SPAWNER=ssh
  $ export GASNET_SSH_SERVERS=`scontrol show hostnames`
  $ ./hello6-taskpar-dist -nl 2

  (and then you have to type 'exit' to release the
   2-node allocation that salloc created).

- You should investigate the SLURM documentation to
   find the answers to your other questions.
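
If you'd rather not have to remember the 'exit', the steps above can be wrapped in a small shell function and run through salloc with a command, which releases the allocation as soon as that command finishes. This is only an untested sketch: the file name chapel-ssh-env.sh, the function chapel_ssh_env and the variable CHPL_NL are hypothetical names for this example, and it assumes one locale per allocated node and that scontrol is on your PATH.

```shell
# chapel-ssh-env.sh: a sketch, meant to be sourced from a shell that runs
# inside a SLURM allocation (e.g. one started by `salloc -N2`).
# chapel_ssh_env and CHPL_NL are hypothetical names for this example.

chapel_ssh_env() {
  # Use GASNet's SSH spawner for the InfiniBand (ibv) and MXM conduits.
  GASNET_IBV_SPAWNER=ssh
  GASNET_MXM_SPAWNER=ssh
  # Spawn only onto the hosts SLURM allocated to this job; with no
  # argument, `scontrol show hostnames` expands SLURM_JOB_NODELIST and
  # prints one host name per line.
  GASNET_SSH_SERVERS=$(scontrol show hostnames)
  export GASNET_IBV_SPAWNER GASNET_MXM_SPAWNER GASNET_SSH_SERVERS
  # Assumption: one locale per allocated node, so count the host names.
  CHPL_NL=$(printf '%s\n' "$GASNET_SSH_SERVERS" | grep -c .)
  export CHPL_NL
  # No hosts usually means we are not inside an allocation.
  if [ "$CHPL_NL" -eq 0 ]; then
    echo "chapel_ssh_env: no allocated hosts; run this under salloc" >&2
    return 1
  fi
}
```

Usage, with the file in your current directory (salloc runs the quoted command and gives the nodes back when it exits):

  $ salloc -N2 sh -c '. ./chapel-ssh-env.sh && chapel_ssh_env && ./hello6-taskpar-dist -nl "$CHPL_NL"'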

Hope that helps.

-michael


On 03/27/2014 03:58 PM, Bibek Ghimire wrote:
> Hi there,
>
> Since a simple hello world application works, I tried running a Chapel
> program on multiple nodes. The computer I am running on uses SLURM as its
> job scheduler, but I am running into some problems. I even tried running
> the Chapel binary directly under SLURM, but it does not work that way:
>
>   $ srun -p marvin -N 2 -n 4 -c 8 ./hello6-taskpar-dist
>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>   error: Specify number of locales via -nl <#> or --numLocales=<#>
>
> I understand why this error appears.
>
> Then I read README.multilocale and README.launcher, where I found a couple
> of things on how to launch Chapel using SLURM. So I tried exporting:
>
>   export CHPL_COMM=gasnet
>   export CHPL_COMM_SUBSTRATE=ibv
>   export CHPL_LAUNCHER_WALLTIME=00:15:00
>   export GASNET_SPAWNFN=S
>   export GASNET_SSH_SERVERS="reno lyra01"
>   export SSH_CMD=ssh
>   export SSH_OPTIONS=-x
>   export GASNET_CSPAWN_CMD="srun -N%N %C"
>
> And yes, I did recompile after doing all this.
>
> When I run
>
>   $ ./hello6-taskpar-dist -nl 2
>
> I get:
>
>   Access denied: user bghimire (uid=3030) has no active jobs.
>   Connection closed by 12.23.1.1
>   connection to reno failed.
>   Terminated
>
> I was curious why SLURM was not being used in this case. It's just going
> through SSH and not doing anything with SLURM.
>
> Another question: what do -N%N and %C really mean in
> export GASNET_CSPAWN_CMD="srun -N%N %C", and how can I force it to submit
> the job via SLURM to specific nodes and cores?
>
> Thank you,
> Bibek
>
>


_______________________________________________
Chapel-users mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-users
