Hi Riccardo,
I have successfully set up a multiuser cluster using Elasticluster with
multiple nodes of differing sizes, but I haven't been able to run more
than a single job per node. Please find a more detailed outline below.
Note that "champost" is one of the users of the cluster (added with:
sudo sacctmgr add user champost account=users).
champost@frontend001:~/sbatch$ cat cpuSleep.sh
#!/bin/bash
sleep 1m
champost@frontend001:~/sbatch$
champost@frontend001:~/sbatch$ for i in `seq 20`; do sbatch -J test_$i -e test_$i.out -o test_$i.out cpuSleep.sh; done
<list of submitted jobs>
champost@frontend001:~/sbatch$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   193      main  test_11 champost PD       0:00      1 (Resources)
   194      main  test_12 champost PD       0:00      1 (Priority)
   195      main  test_13 champost PD       0:00      1 (Priority)
   196      main  test_14 champost PD       0:00      1 (Priority)
   197      main  test_15 champost PD       0:00      1 (Priority)
   198      main  test_16 champost PD       0:00      1 (Priority)
   199      main  test_17 champost PD       0:00      1 (Priority)
   200      main  test_18 champost PD       0:00      1 (Priority)
   201      main  test_19 champost PD       0:00      1 (Priority)
   202      main  test_20 champost PD       0:00      1 (Priority)
   188      main   test_6 champost  R       0:23      1 16cpu-64ram-hpc002
   189      main   test_7 champost  R       0:23      1 32cpu-128ram-hpc001
   190      main   test_8 champost  R       0:23      1 32cpu-128ram-hpc002
   191      main   test_9 champost  R       0:23      1 32cpu-128ram-hpc003
   192      main  test_10 champost  R       0:23      1 32cpu-128ram-hpc004
   183      main   test_1 champost  R       0:26      1 4cpu-16ram-hpc001
   184      main   test_2 champost  R       0:26      1 4cpu-16ram-hpc002
   185      main   test_3 champost  R       0:26      1 8cpu-32ram-hpc001
   186      main   test_4 champost  R       0:26      1 8cpu-32ram-hpc002
   187      main   test_5 champost  R       0:26      1 16cpu-64ram-hpc001
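
From the output above, each node runs exactly one job while the rest stay
pending on (Resources)/(Priority), even though the nodes have between 4 and
32 CPUs and the jobs do nothing but sleep. I assume (though I have not dug
into it yet) that the relevant scheduling settings can be inspected on the
frontend with something like:

champost@frontend001:~$ scontrol show config | grep -i select    # SelectType / SelectTypeParameters
champost@frontend001:~$ scontrol show partition main             # partition limits and OverSubscribe/Shared setting
champost@frontend001:~$ scontrol show node 4cpu-16ram-hpc001     # CPUAlloc vs. CPUTot on a busy node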
*My Elasticluster config file:*
[cloud/geekloud]
provider=openstack
auth_url=<>
username=<>
password=<>
project_name=<>

[login/ubuntu]
image_user=ubuntu
image_user_sudo=root
image_sudo=True
user_key_name=<>
user_key_private=<>
user_key_public=<>

[setup/slurm]
provider=ansible
frontend_groups=slurm_master,r,glusterfs_client
4cpu-16ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
8cpu-32ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
16cpu-64ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
32cpu-128ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
# set redundancy and force "dispersed" volume
server_var_gluster_redundancy=2
# install NIS/YP to manage cluster users
global_var_multiuser_cluster=yes
global_var_upgrade_packages=yes

[cluster/slurm]
cloud=geekloud
login=ubuntu
setup=slurm
security_group=default
ssh_to=frontend
frontend_nodes=1
4cpu-16ram-hpc_nodes=2
8cpu-32ram-hpc_nodes=2
16cpu-64ram-hpc_nodes=2
32cpu-128ram-hpc_nodes=4
network_ids=<>
image_id=<>

[cluster/slurm/frontend]
flavor=8cpu-32ram-hpc

[cluster/slurm/4cpu-16ram-hpc]
flavor=4cpu-16ram-hpc

[cluster/slurm/8cpu-32ram-hpc]
flavor=8cpu-32ram-hpc

[cluster/slurm/16cpu-64ram-hpc]
flavor=16cpu-64ram-hpc

[cluster/slurm/32cpu-128ram-hpc]
flavor=32cpu-128ram-hpc
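
If I had to guess, each job is being given a whole node, which I understand
is SLURM's default behaviour with SelectType=select/linear. Purely as a
sketch of what I imagine slurm.conf on the master would need (I don't know
whether or how Elasticluster exposes these settings, so this is an
assumption on my part, not something I have tested):

# hypothetical slurm.conf fragment: schedule individual cores (and memory)
# instead of whole nodes, so several jobs can share one node
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

With core-based allocation each of these sleep jobs would only take the
single CPU it asks for, so e.g. a 32cpu-128ram-hpc node could in principle
run many of them at once.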
I don't understand whether the solution lies in adding something to the
Elasticluster configuration or whether it is something to be done with
sacctmgr, as with adding users to the cluster. Any help is appreciated.
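
On the sacctmgr side, the only check I could think of is whether some
per-account or per-user limit (e.g. MaxJobs) is capping the number of
running jobs; I assume that would show up in the output of:

champost@frontend001:~$ sacctmgr show associations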
Cheers,
Champak