Hi Riccardo,
Thanks for your suggestions. It didn't make sense to me either; I've never
had that problem before. Yesterday I was working around it by adding this
line to the [setup/ansible-slurm] section of the config file:
ansible_private_key_file=/home/dave/.ssh/myazcert.pem
With that line in place, I was able to confirm in the debug output that
Ansible was seeing it and that the ssh command was using it. Without that
line, there was no "-o IdentityFile" in the ssh command, and the connection
failed.
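
For what it's worth, the check I was doing by eye on the debug output could be sketched in Python roughly like this. It is only a sketch: the function name identity_files is made up for illustration, and it assumes the ssh command has been copied out of the debug output as a single string.

```python
import shlex


def identity_files(ssh_command):
    """Return key paths given via -i or -o IdentityFile=... in an ssh command line."""
    args = shlex.split(ssh_command)
    keys = []
    # Walk over (option, value) pairs of adjacent arguments.
    for flag, value in zip(args, args[1:]):
        if flag == "-i":
            keys.append(value)
        elif flag == "-o" and value.lower().startswith("identityfile="):
            # Drop any quotes left around the path after shell-style splitting.
            keys.append(value.split("=", 1)[1].strip("\"'"))
    return keys
```

An empty result would correspond to the failing case, where the ssh command carried no key option at all.
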
Long story short, I think it was an artifact of my setup on that particular
machine: I had cloned the Ansible repo and was running the Ansible branch
stable-2.0.0.1 from source. (I did that so that I could run it in the
debugger to try to understand yet another weird problem, which was that
Ansible wasn't interpreting expressions like '{{ansible_os_family}}' in
playbooks; it was treating them as literal text.)
I have had problems with every version of Ansible other than the one that
ElastiCluster installs, so I'm going to stick with that one. This morning
I'm doing a clean run on a new Linux VM, and I'm not seeing these issues.
Everything runs fine until, far into the Ansible provisioning, I hit this:
TASK [slurm-master : Replace systemd unit file for SLURM services] *************
task path: /home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/tasks/install-slurmdbd.yml:27
fatal: [frontend001]: FAILED! => {"failed": true, "msg": "the file_name '/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/usr/lib/systemd/system/slurm-llnl-slurmdbd.service' does not exist, or is not readable"}
So that's my next challenge. (Neither the file nor even the directory it's
looking in exists. What does exist is this:
'/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/files/usr/lib/systemd/system/slurmdbd.service'.)
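
To see which unit files the playbooks tree actually ships, a search along these lines might help. It is a minimal sketch under my own assumptions: find_unit_files is a hypothetical helper name, and the glob pattern is just a guess at what the relevant file names have in common.

```python
from pathlib import Path


def find_unit_files(roles_dir, pattern="*slurmdbd*.service"):
    """List files under roles_dir whose name matches pattern, relative to roles_dir."""
    root = Path(roles_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file()
    )
```

Pointing it at the share/playbooks/roles directory from the error message should list every matching file, which makes it easy to compare what exists with the name the task is asking for.
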
Thanks,
Dave
On Friday, September 23, 2016 at 7:22:39 AM UTC-6, Riccardo Murri wrote:
>
> Hi Dave,
>
> > The caveat: I can start and stop clusters, but Ansible provisioning is not
> > working for me at the moment. I think this is minor -- Ansible is not able
> > to make an ssh connection to the nodes, because it's not trying the right
> > private key. If I extract the ssh command and add "-i mykey", the command
> > works.
>
> I'm a bit surprised here -- the code in ElastiCluster that invokes
> Ansible is pretty simple: as long as the correct file name is in
> `cluster.user_key_private` there should be no possibility of error...
>
> Can you please try to do the following with ElastiCluster 1.3.dev:
>
> - Create a cluster but prevent Ansible from running::
>
> elasticluster start --no-setup mycluster
>
> - Run Ansible setup with maximum debug::
>
> elasticluster -vv setup mycluster -- -vvv
>
> You can stop it with Ctrl+C as soon as the "TASK [setup]" part is
> done.
>
> The output should show:
>
> (a) All environment variables that ElastiCluster sets for running
> `ansible-playbook` (`ANSIBLE_PRIVATE_KEY_FILE` is the relevant one)
>
> (b) The command-line options that Ansible passes to the slave SSH
> (here `-o IdentityFile=...` is important)
>
>
> > Meanwhile, I'm starting work on updating the PR on the bobd00 fork so that
> > the changes will be ready to merge. Or maybe it would be faster to just
> > make a new fork and create the PR there.
>
> Whichever is best/faster for you; I'm not strict on process.
>
> Thanks again,
> Riccardo
>
> --
> Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland
>