Hi Riccardo,

Thanks for your suggestions. It didn't make sense to me either; I've never had that problem before. I was working around it yesterday by including this in the [setup/ansible-slurm] section of the config file:
ansible_private_key_file=/home/dave/.ssh/myazcert.pem

When I did so, I was able to confirm in the debug output that Ansible was seeing it, and the ssh command was using it. Without that line, there was no "-o IdentityFile" in the ssh command, which failed.

Long story short, I think it was an artifact of the fact that, on that particular machine, I had cloned the Ansible repo and was running Ansible branch stable-2.0.0.1 from source. (The reason I did that was so that I could run it in the debugger to try to understand yet another weird problem, which was that Ansible wasn't interpreting expressions like '{{ansible_os_family}}' in playbooks; it was treating them as literal text.) I have had problems with any version of Ansible other than the one that ElastiCluster installs, so I'm going to avoid that.

This morning I'm doing a clean run on a new Linux VM, and I'm not seeing these issues. Everything runs fine, far into the Ansible provisioning, when I hit this:

TASK [slurm-master : Replace systemd unit file for SLURM services] *************
task path: /home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/tasks/install-slurmdbd.yml:27
fatal: [frontend001]: FAILED! => {"failed": true, "msg": "the file_name '/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/usr/lib/systemd/system/slurm-llnl-slurmdbd.service' does not exist, or is not readable"}

So that's my next challenge. (Neither the file nor the path it's looking for exists. What does exist is this: '/home/dave/.virtualenvs/elasticluster/elasticluster/elasticluster/share/playbooks/roles/slurm-master/files/usr/lib/systemd/system/slurmdbd.service'.)

Thanks,
Dave

On Friday, September 23, 2016 at 7:22:39 AM UTC-6, Riccardo Murri wrote:
>
> Hi Dave,
>
> > The caveat: I can start and stop clusters, but Ansible provisioning is not
> > working for me at the moment.
> > I think this is minor -- Ansible is not able
> > to make an ssh connection to the nodes, because it's not trying the right
> > private key. If I extract the ssh command and add "-i mykey", the command works.
>
> I'm a bit surprised here -- the code in ElastiCluster that invokes
> Ansible is pretty simple: as long as the correct file name is in
> `cluster.user_key_private` there should be no possibility of error...
>
> Can you please try to do the following with ElastiCluster 1.3.dev:
>
> - Create a cluster but prevent Ansible from running::
>
>     elasticluster start --no-setup mycluster
>
> - Run Ansible setup with maximum debug::
>
>     elasticluster -vv setup mycluster -- -vvv
>
>   You can stop it with Ctrl+C as soon as the "TASK [setup]" part is
>   done.
>
> The output should show:
>
> (a) All environment variables that ElastiCluster sets for running
>     `ansible-playbook` (`ANSIBLE_PRIVATE_KEY_FILE` is the relevant one)
>
> (b) The command-line options that Ansible passes to the slave SSH
>     (here `-o IdentityFile=...` is important)
>
> > Meanwhile, I'm starting work on updating the PR on the bobd00 fork so that
> > the changes will be ready to merge. Or maybe it would be faster to just
> > make a new fork and create the PR there.
>
> Whichever is best/faster for you; I'm not strict on process.
>
> Thanks again,
> Riccardo
>
> --
> Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland
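
A note on the "does not exist, or is not readable" failure reported at the top of this message: one way to see which unit files actually ship with the playbooks is to search the roles directory by file name. The following is only a minimal sketch, assuming Python 3. The helper name `find_unit_files` and the glob pattern are hypothetical and are not part of ElastiCluster or Ansible. The demo builds a throwaway directory tree that mimics the layout quoted above, so it does not touch a real installation.

```python
import fnmatch
import os
import tempfile


def find_unit_files(roles_dir, pattern="*slurmdbd*.service"):
    """Return sorted paths under roles_dir whose file name matches pattern.

    Hypothetical helper for illustration; not an ElastiCluster or Ansible API.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(roles_dir):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def demo():
    """Compare the path the failing task asked for with what exists on disk."""
    # Throwaway tree mimicking the layout quoted in the message. On a real
    # install, roles_dir would be the .../share/playbooks/roles directory.
    with tempfile.TemporaryDirectory() as roles_dir:
        files_dir = os.path.join(
            roles_dir, "slurm-master", "files",
            "usr", "lib", "systemd", "system")
        os.makedirs(files_dir)
        open(os.path.join(files_dir, "slurmdbd.service"), "w").close()

        # The path from the error message, relative to the roles directory.
        wanted = os.path.join(
            roles_dir, "usr", "lib", "systemd", "system",
            "slurm-llnl-slurmdbd.service")

        found = find_unit_files(roles_dir)
        relative = [os.path.relpath(path, roles_dir) for path in found]
        return os.path.exists(wanted), relative


if __name__ == "__main__":
    wanted_exists, candidates = demo()
    print("requested file exists:", wanted_exists)
    for candidate in candidates:
        print("candidate:", candidate)
```

Pointed at a real roles directory, `find_unit_files` only lists candidate files; it does not say which name or path the task in install-slurmdbd.yml ought to reference.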