Hi Riccardo,

Thanks for your suggestions. It didn't make sense to me either; I've never 
had that problem before. I was working around it yesterday by including 
this in the [setup/ansible-slurm] section of the config file:

ansible_private_key_file=/home/dave/.ssh/myazcert.pem

With that line in place, I could confirm in the debug output that Ansible 
was picking up the setting and that the ssh command was using it. Without 
that line, the ssh command had no "-o IdentityFile" option, and it failed.
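A quick way to check which key a given run actually used is to pull the key 
paths out of the ssh command shown in the -vvv debug output. The Python 
sketch here is a hypothetical helper of my own (identity_files is not part 
of ElastiCluster or Ansible). It assumes you paste the ssh command, starting 
at "ssh", as a single shell-quoted line:

```python
import shlex
from typing import List


def identity_files(ssh_command: str) -> List[str]:
    """Return the private-key paths named on an ssh command line.

    Hypothetical helper (not part of ElastiCluster or Ansible). It assumes the
    command was copied as one shell-quoted line and recognizes the forms
    "-o IdentityFile=PATH", "-oIdentityFile=PATH", "-o 'IdentityFile PATH'"
    and "-i PATH".
    """
    args = shlex.split(ssh_command)
    keys: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        option = None
        if arg == "-o" and i + 1 < len(args):
            option = args[i + 1]  # option value is the next argument
            i += 1
        elif arg.startswith("-o") and len(arg) > 2:
            option = arg[2:]  # option value glued to the flag
        elif arg == "-i" and i + 1 < len(args):
            keys.append(args[i + 1])
            i += 1
        if option is not None:
            # ssh accepts both "Keyword=value" and "Keyword value";
            # keywords are case-insensitive.
            name, sep, value = option.partition("=")
            if not sep:
                name, _, value = option.strip().partition(" ")
            if name.strip().lower() == "identityfile":
                # Drop any double quotes wrapped around the path.
                keys.append(value.strip().strip('"'))
        i += 1
    return keys


# Hypothetical command line, shaped like the one in the -vvv output.
cmd = ("ssh -C -q -o ControlMaster=auto "
       "-o 'IdentityFile=\"/home/dave/.ssh/myazcert.pem\"' 10.0.0.4")
print(identity_files(cmd))  # ['/home/dave/.ssh/myazcert.pem']
```

An empty list means no key was passed explicitly, which matches the failing 
case above; ssh then only tries its default identities.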

Long story short, I think it was an artifact of my setup on that particular 
machine: I had cloned the Ansible repo and was running Ansible branch 
stable-2.0.0.1 from source. (I did that so I could step through it in the 
debugger to understand yet another weird problem: Ansible wasn't 
interpreting expressions like '{{ansible_os_family}}' in playbooks and was 
treating them as literal text.)

Every time I've used a version of Ansible other than the one that 
ElastiCluster installs, I've had problems, so I'm going to stick with the 
installed one. This morning I'm doing a clean run on a new Linux VM, and I'm 
not seeing these issues. Everything runs fine until well into the Ansible 
provisioning, when I hit this:

TASK [slurm-master : Replace systemd unit file for SLURM services] *************
task path: /home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/tasks/install-slurmdbd.yml:27
fatal: [frontend001]: FAILED! => {"failed": true, "msg": "the file_name '/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/usr/lib/systemd/system/slurm-llnl-slurmdbd.service' does not exist, or is not readable"}


So that's my next challenge. Neither that file nor even the directory it's 
looking in exists. What does exist is this: 
'/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/files/usr/lib/systemd/system/slurmdbd.service'
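Lining the two paths up makes the mismatch explicit. This Python sketch is a 
throwaway helper of my own (split_at_common_dir is a hypothetical name, not 
from ElastiCluster or Ansible); it splits the path from the error message and 
the file that actually exists at their shared directory:

```python
import posixpath
from typing import Tuple


def split_at_common_dir(wanted: str, present: str) -> Tuple[str, str, str]:
    """Split two absolute POSIX paths into their shared directory and the parts that differ."""
    common = posixpath.commonpath([wanted, present])
    return (
        common,
        posixpath.relpath(wanted, common),
        posixpath.relpath(present, common),
    )


PREFIX = "/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles"
wanted = PREFIX + "/usr/lib/systemd/system/slurm-llnl-slurmdbd.service"  # path from the error
present = PREFIX + "/slurm-master/files/usr/lib/systemd/system/slurmdbd.service"  # file on disk

common, wanted_rest, present_rest = split_at_common_dir(wanted, present)
print(common)
print("looked up:", wanted_rest)
print("on disk:  ", present_rest)
print("same file name?", posixpath.basename(wanted) == posixpath.basename(present))  # False
```

It shows two separate differences: the lookup starts at roles/ instead of 
roles/slurm-master/files/, and it asks for slurm-llnl-slurmdbd.service 
rather than slurmdbd.service. So both the source directory and the file name 
used by that task look worth checking.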

Thanks,
Dave

On Friday, September 23, 2016 at 7:22:39 AM UTC-6, Riccardo Murri wrote:
>
> Hi Dave, 
>
> > The caveat: I can start and stop clusters, but Ansible provisioning is not
> > working for me at the moment. I think this is minor -- Ansible is not able
> > to make an ssh connection to the nodes, because it's not trying the right
> > private key. If I extract the ssh command and add "-i mykey", the command
> > works.
>
> I'm a bit surprised here -- the code in ElastiCluster that invokes 
> Ansible is pretty simple: as long as the correct file name is in 
> `cluster.user_key_private` there should be no possibility of error... 
>
> Can you please try to do the following with ElastiCluster 1.3.dev: 
>
> - Create a cluster but prevent Ansible from running:: 
>
>     elasticluster start --no-setup mycluster 
>
> - Run Ansible setup with maximum debug:: 
>
>     elasticluster -vv setup mycluster -- -vvv
>
>   You can stop it with Ctrl+C as soon as the "TASK [setup]" part is 
>   done. 
>
> The output should show: 
>
> (a) All environment variables that ElastiCluster sets for running 
>     `ansible-playbook` (`ANSIBLE_PRIVATE_KEY_FILE` is the relevant one) 
>
> (b) The command-line options that Ansible passes to the slave SSH 
>     (here `-o IdentityFile=...` is important) 
>
>
> > Meanwhile, I'm starting work on updating the PR on the bobd00 fork so that
> > the changes will be ready to merge. Or maybe it would be faster to just
> > make a new fork and create the PR there.
>
> Whichever is best/faster for you; I'm not strict on process. 
>
> Thanks again, 
> Riccardo 
>
> -- 
> Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland 
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticluster" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticluster+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
