Hi Riccardo,
I have successfully set up a multiuser cluster using Elasticluster with
multiple nodes of differing sizes, but I haven't been able to run more
than a single job per node. Please find a more detailed outline below.
Note that "champost" is one of the users of the cluster (added with:
sudo sacctmgr add user champost account=users).
champost@frontend001:~/sbatch$ cat cpuSleep.sh
#!/bin/bash
sleep 1m
champost@frontend001:~/sbatch$
champost@frontend001:~/sbatch$ for i in `seq 20`; do sbatch -J test_$i -e test_$i.out -o test_$i.out cpuSleep.sh; done
<list of submitted jobs>
champost@frontend001:~/sbatch$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   193      main  test_11 champost PD       0:00      1 (Resources)
   194      main  test_12 champost PD       0:00      1 (Priority)
   195      main  test_13 champost PD       0:00      1 (Priority)
   196      main  test_14 champost PD       0:00      1 (Priority)
   197      main  test_15 champost PD       0:00      1 (Priority)
   198      main  test_16 champost PD       0:00      1 (Priority)
   199      main  test_17 champost PD       0:00      1 (Priority)
   200      main  test_18 champost PD       0:00      1 (Priority)
   201      main  test_19 champost PD       0:00      1 (Priority)
   202      main  test_20 champost PD       0:00      1 (Priority)
   188      main   test_6 champost  R       0:23      1 16cpu-64ram-hpc002
   189      main   test_7 champost  R       0:23      1 32cpu-128ram-hpc001
   190      main   test_8 champost  R       0:23      1 32cpu-128ram-hpc002
   191      main   test_9 champost  R       0:23      1 32cpu-128ram-hpc003
   192      main  test_10 champost  R       0:23      1 32cpu-128ram-hpc004
   183      main   test_1 champost  R       0:26      1 4cpu-16ram-hpc001
   184      main   test_2 champost  R       0:26      1 4cpu-16ram-hpc002
   185      main   test_3 champost  R       0:26      1 8cpu-32ram-hpc001
   186      main   test_4 champost  R       0:26      1 8cpu-32ram-hpc002
   187      main   test_5 champost  R       0:26      1 16cpu-64ram-hpc001
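
From the output above, each node runs exactly one job while the rest stay
pending on (Resources)/(Priority), even though the nodes have between 4 and
32 CPUs and the jobs do nothing but sleep. I assume (though I have not dug
into it yet) that the relevant scheduling settings can be inspected on the
frontend with something like:

champost@frontend001:~$ scontrol show config | grep -i select    # SelectType / SelectTypeParameters
champost@frontend001:~$ scontrol show partition main             # partition limits and OverSubscribe/Shared setting
champost@frontend001:~$ scontrol show node 4cpu-16ram-hpc001     # CPUAlloc vs. CPUTot on a busy node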
*My Elasticluster config file:*
[cloud/geekloud]
provider=openstack
auth_url=<>
username=<>
password=<>
project_name=<>

[login/ubuntu]
image_user=ubuntu
image_user_sudo=root
image_sudo=True
user_key_name=<>
user_key_private=<>
user_key_public=<>

[setup/slurm]
provider=ansible
frontend_groups=slurm_master,r,glusterfs_client
4cpu-16ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
8cpu-32ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
16cpu-64ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
32cpu-128ram-hpc_groups=slurm_worker,r,glusterfs_server,glusterfs_client
# set redundancy and force "dispersed" volume
server_var_gluster_redundancy=2
# install NIS/YP to manage cluster users
global_var_multiuser_cluster=yes
global_var_upgrade_packages=yes

[cluster/slurm]
cloud=geekloud
login=ubuntu
setup=slurm
security_group=default
ssh_to=frontend
frontend_nodes=1
4cpu-16ram-hpc_nodes=2
8cpu-32ram-hpc_nodes=2
16cpu-64ram-hpc_nodes=2
32cpu-128ram-hpc_nodes=4
network_ids=<>
image_id=<>

[cluster/slurm/frontend]
flavor=8cpu-32ram-hpc

[cluster/slurm/4cpu-16ram-hpc]
flavor=4cpu-16ram-hpc

[cluster/slurm/8cpu-32ram-hpc]
flavor=8cpu-32ram-hpc

[cluster/slurm/16cpu-64ram-hpc]
flavor=16cpu-64ram-hpc

[cluster/slurm/32cpu-128ram-hpc]
flavor=32cpu-128ram-hpc
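
If I had to guess, each job is being given a whole node, which I understand
is SLURM's default behaviour with SelectType=select/linear. Purely as a
sketch of what I imagine slurm.conf on the master would need (I don't know
whether or how Elasticluster exposes these settings, so this is an
assumption on my part, not something I have tested):

# hypothetical slurm.conf fragment: schedule individual cores (and memory)
# instead of whole nodes, so several jobs can share one node
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

With core-based allocation each of these sleep jobs would only take the
single CPU it asks for, so e.g. a 32cpu-128ram-hpc node could in principle
run many of them at once.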
I don't understand whether the solution lies in adding something to the
Elasticluster configuration or whether it is something to be done with
sacctmgr, as with adding users to the cluster. Any help is appreciated.
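
On the sacctmgr side, the only check I could think of is whether some
per-account or per-user limit (e.g. MaxJobs) is capping the number of
running jobs; I assume that would show up in the output of:

champost@frontend001:~$ sacctmgr show associations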
Cheers,
Champak