Re: [elasticluster] SLURM sbatch error

Orxan Shibliyev Thu, 19 Apr 2018 07:21:59 -0700

Your test does not work for me, and restarting SLURM does not help. The base
OS is Debian GNU/Linux 9.4 (stretch). I get errors related to lmod:


TASK [lmod : Is installation directory writable?]
**********************************************************************************************************************************************************
fatal: [compute003]: FAILED! => {"changed": true, "cmd": ["test", "-w",
"/opt/lmod/7.0/"], "delta": "0:00:00.010908", "end": "2018-04-19
14:05:07.669722", "failed": true, "rc": 1, "start": "2018-04-19
14:05:07.658814", "stderr": "", "stderr_lines": [], "stdout": "",
"stdout_lines": []}
...ignoring
fatal: [compute002]: FAILED! => {"changed": true, "cmd": ["test", "-w",
"/opt/lmod/7.0/"], "delta": "0:00:00.035474", "end": "2018-04-19
14:05:08.090735", "failed": true, "rc": 1, "start": "2018-04-19
14:05:08.055261", "stderr": "", "stderr_lines": [], "stdout": "",
"stdout_lines": []}
...ignoring

and other errors such as these:

compute001               : ok=7    changed=1    unreachable=0    failed=1
compute002               : ok=121  changed=79   unreachable=0    failed=0
compute003               : ok=121  changed=79   unreachable=0    failed=0
frontend001               : ok=124  changed=87   unreachable=0    failed=0

Command `ansible-playbook
--private-key=/home/orhan/.ssh/google_compute_engine
/home/elasticluster/share/playbooks/site.yml
--inventory=/home/orhan/.elasticluster/storage/slurm-on-gce.inventory
--become --become-user=root -e
elasticluster_output_dir=/tmp/elasticluster.2WFV9u.d` failed with exit code
2.
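The recap above can be scanned for the host that actually failed. The following is a minimal sketch, assuming the recap lines are pasted in as text; the function name `failed_hosts` is hypothetical and not part of elasticluster or Ansible.

```python
import re

# Recap lines copied from the output above.
RECAP = """\
compute001               : ok=7    changed=1    unreachable=0    failed=1
compute002               : ok=121  changed=79   unreachable=0    failed=0
compute003               : ok=121  changed=79   unreachable=0    failed=0
frontend001               : ok=124  changed=87   unreachable=0    failed=0
"""


def failed_hosts(recap):
    """Return hosts whose recap line reports failed or unreachable > 0."""
    bad = []
    for line in recap.splitlines():
        host, sep, counters = line.partition(":")
        if not sep:
            continue  # not a "host : counters" line
        # Collect the key=value counters (ok, changed, unreachable, failed).
        stats = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", counters)}
        if stats.get("failed", 0) or stats.get("unreachable", 0):
            bad.append(host.strip())
    return bad


print(failed_hosts(RECAP))  # prints ['compute001']
```

Here only compute001 reports a non-zero `failed` count, and it stopped after 7 tasks, so the fatal error is probably in that host's output rather than in the ignored lmod tasks.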

I think only the lmod-related errors appeared in my previous attempts. For
some reason I treated them as warnings rather than errors.
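For reference, the lmod task shown above runs `test -w /opt/lmod/7.0/`, which exits with status 1 when the path is missing or not writable. The "...ignoring" line is what Ansible prints when a failed task is allowed to continue. A minimal sketch of the same check, to run by hand on a compute node; the helper name `is_writable` is hypothetical, and the path is the one from the log.

```python
import os


def is_writable(path):
    """Rough equivalent of `test -w PATH`: True only if PATH exists
    and the current user may write to it."""
    return os.access(path, os.W_OK)


# Path taken from the task output above; the result depends on the node
# and on the user running the check.
print(is_writable("/opt/lmod/7.0/"))
```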

*Config:*

[cloud/google]
noauth_local_webserver=yes
provider=google
gce_client_id=<>
gce_client_secret=<>
gce_project_id=tailor-193612

[login/google]
image_user=orxan.shibli
image_sudo=yes
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub

[setup/slurm]
frontend_groups=slurm_master
compute_groups=slurm_worker
submit_groups=slurm_submit,glusterfs_client
global_var_multiuser_cluster=yes

[cluster/slurm-on-gce]
setup=slurm
frontend_nodes=1
compute_nodes=3
ssh_to=frontend
cloud=google
login=google
flavor=n1-standard-1
security_group=default
image_id=https://www.googleapis.com/compute/v1/projects/tailor-193612/global/images/image-23

On Thu, Apr 19, 2018 at 3:17 PM, Riccardo Murri <[email protected]>
wrote:

> Hello Orxan,
>
> I cannot reproduce this error; with a freshly-started Ubuntu 16.04
> cluster, I get::
>
> ubuntu@frontend001:~$ cat test.sh
> #! /bin/sh
>
> echo hello
>
> ubuntu@frontend001:~$ sbatch test.sh
> Submitted batch job 2
>
> ubuntu@frontend001:~$ cat slurm-2.out
> hello
>
> One caveat: right after building the cluster, the SLURM controller
> daemon was not running -- I had to restart it with "sudo service
> slurmctld restart".
>
> Did you get any errors while building the cluster?  What base OS are
> you using? What config?
>
> Ciao,
> R
>
