Your test does not work for me, and restarting SLURM does not help. The base OS is
Debian GNU/Linux 9.4 (stretch). I get errors related to Lmod:
TASK [lmod : Is installation directory writable?]
**********************************************************************************************************************************************************
fatal: [compute003]: FAILED! => {"changed": true, "cmd": ["test", "-w", "/opt/lmod/7.0/"], "delta": "0:00:00.010908", "end": "2018-04-19 14:05:07.669722", "failed": true, "rc": 1, "start": "2018-04-19 14:05:07.658814", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring
fatal: [compute002]: FAILED! => {"changed": true, "cmd": ["test", "-w", "/opt/lmod/7.0/"], "delta": "0:00:00.035474", "end": "2018-04-19 14:05:08.090735", "failed": true, "rc": 1, "start": "2018-04-19 14:05:08.055261", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
...ignoring
I also get other errors. The play recap reads:
compute001 : ok=7 changed=1 unreachable=0 failed=1
compute002 : ok=121 changed=79 unreachable=0 failed=0
compute003 : ok=121 changed=79 unreachable=0 failed=0
frontend001 : ok=124 changed=87 unreachable=0 failed=0
Command `ansible-playbook --private-key=/home/orhan/.ssh/google_compute_engine /home/elasticluster/share/playbooks/site.yml --inventory=/home/orhan/.elasticluster/storage/slurm-on-gce.inventory --become --become-user=root -e elasticluster_output_dir=/tmp/elasticluster.2WFV9u.d` failed with exit code 2.
I think that in my previous attempts only the Lmod-related errors appeared, and
for some reason I took them for warnings rather than errors.
*Config:*
[cloud/google]
noauth_local_webserver=yes
provider=google
gce_client_id=<>
gce_client_secret=<>
gce_project_id=tailor-193612
[login/google]
image_user=orxan.shibli
image_sudo=yes
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub
[setup/slurm]
frontend_groups=slurm_master
compute_groups=slurm_worker
submit_groups=slurm_submit,glusterfs_client
global_var_multiuser_cluster=yes
[cluster/slurm-on-gce]
setup=slurm
frontend_nodes=1
compute_nodes=3
ssh_to=frontend
cloud=google
login=google
flavor=n1-standard-1
security_group=default
image_id=https://www.googleapis.com/compute/v1/projects/tailor-193612/global/images/image-23
On Thu, Apr 19, 2018 at 3:17 PM, Riccardo Murri wrote:
> Hello Orxan,
>
> I cannot reproduce this error; with a freshly-started Ubuntu 16.04
> cluster, I get::
>
> ubuntu@frontend001:~$ cat test.sh
> #! /bin/sh
>
> echo hello
>
> ubuntu@frontend001:~$ sbatch test.sh
> Submitted batch job 2
>
> ubuntu@frontend001:~$ cat slurm-2.out
> hello
>
> One caveat: right after building the cluster, the SLURM controller
> daemon was not running -- I had to restart it with "sudo service
> slurmctld restart".
>
> Did you get any errors while building the cluster? What base OS are
> you using? What config?
>
> Ciao,
> R
>
--
You received this message because you are subscribed to the Google Groups
"elasticluster" group.