This is what I do to make sure that SSH comes up, and also to wait until the
user has been created on my instance:
- set_fact:
    # get_instance is a custom filter that looks up the running
    # instance's IP address by name in the given region
    ec2_ip: "{{ ec2_name | get_instance(aws_region, state='running') }}"

- name: Wait for SSH to come up on instance
  wait_for:
    host: "{{ ec2_ip }}"
    port: 22
    delay: 15
    timeout: 320
    state: started

- name: Wait until the ansible user can log into the host
  local_action: command ssh -o StrictHostKeyChecking=no ansible@{{ ec2_ip }} exit
  register: ssh_output
  until: ssh_output.rc == 0
  retries: 20
  delay: 10
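The out-of-band ssh command is the key part: because it runs through
local_action, the retry loop is not defeated by Ansible's own first in-band
connection failing. If you also want to be sure cloud-init has finished
entirely (not just that the login user exists), you can poll for the marker
file cloud-init writes when it completes. A minimal sketch along the same
lines, assuming the default marker path /var/lib/cloud/instance/boot-finished
and the same "ansible" login user:

- name: Wait until cloud-init has finished on the instance
  # rc is non-zero both while the user doesn't exist yet (ssh auth fails)
  # and while cloud-init is still running (marker file absent), so this
  # covers both halves of the race
  local_action: command ssh -o StrictHostKeyChecking=no ansible@{{ ec2_ip }} test -f /var/lib/cloud/instance/boot-finished
  register: cloud_init_done
  until: cloud_init_done.rc == 0
  retries: 30
  delay: 10

As a smaller refinement, the wait_for task can also take
search_regex=OpenSSH so it waits for an actual SSH banner rather than just an
open port, which helps when sshd is restarted mid-boot.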
On Monday, May 9, 2016 at 1:05:11 PM UTC-7, Jared Bristow wrote:
>
> I am having this same issue. Did you ever figure out a solution?
>
> I have 3 different images I'm testing against: CentOS6, CentOS7, Sles12.
> The strange thing is that I only seem to have a problem on CentOS7.
>
> On Monday, January 25, 2016 at 2:07:14 PM UTC-7, James Cuzella wrote:
>>
>> Hello,
>>
>> I believe I've found an interesting race condition during EC2 instance
>> creation due to a slow-running cloud-init process. The issue is that
>> cloud-init appears to create the initial login user, install the public
>> SSH key onto the newly started EC2 instance, and then restart sshd. This
>> takes a while and creates a race condition in which Ansible cannot connect
>> to the host and the playbook run fails. In my playbook, I'm using the ec2
>> module, followed by add_host, and then wait_for to wait for the SSH port
>> to be open. I have also experimented with a simple "shell: echo
>> host_is_up" command in a retry / do-until loop. However, this also fails
>> because Ansible needs the initial SSH connection to succeed, which it does
>> not in this case, so Ansible never retries :-(
>>
>> It appears that because the user does not exist until ~3 minutes after the
>> instance has booted and sshd is listening on port 22, Ansible cannot
>> connect as the initial login user for the CentOS AMI ("centos"). So an
>> open-port check alone is not good enough: we need to wait for the port to
>> be open AND for the login user to exist. The simple echo shell command in
>> a retry do/until loop does not work either, because the very first SSH
>> connection Ansible makes to run the module also fails.
>>
>> For some detailed debug info, and a playbook to reproduce the issue,
>> please see this Gist:
>> https://gist.github.com/trinitronx/afd894c89384d413597b
>>
>> My question is: Has anyone run into a similar issue with EC2 instances
>> being slow to become available causing Ansible to fail to connect, and also
>> found a solution to this?
>>
>> I realize that a sleep task is one possible solution (and I may be forced
>> to reach for that sledgehammer), but it doesn't feel like the best
>> solution, because we really want to wait both for cloud-init to finish
>> creating the "centos" user on the instance AND for SSH to be up. So
>> really, the only other way I can think of is to somehow tell SSH to retry
>> connecting as centos until it succeeds or surpasses a very long timeout.
>> Is this possible? Are there better ways of handling this?
>>
>