I am having this same issue. Did you ever figure out a solution?
I have 3 different images I'm testing against: CentOS 6, CentOS 7, and SLES 12.
The strange thing is that I only seem to have a problem on CentOS 7.
On Monday, January 25, 2016 at 2:07:14 PM UTC-7, James Cuzella wrote:
>
> Hello,
>
> I believe I've found an interesting race condition during EC2 instance
> creation due to a slow-running cloud-init process. The issue is that
> cloud-init appears to create the initial login user and install the public
> SSH key onto a newly started EC2 instance, then restart sshd. It takes a
> while to do this, and creates a race condition where Ansible cannot connect
> to the host and fails the playbook run. In my playbook, I'm using the ec2
> module, followed by add_host, and then wait_for to wait for the SSH port to
> be open. I have also experimented with using a simple "shell: echo
> host_is_up" command with a retry / do-until loop. However this also fails
> because Ansible wants the initial SSH connection to be successful, which it
> will not be in this case. So Ansible does not retry :-(
>
> It appears that because the user does not exist until ~3 minutes after the
> instance has booted and sshd is listening on port 22, Ansible cannot connect
> as the initial login user for the CentOS AMI ("centos"). So the SSH port open
> check is not good enough to detect and wait for the port to be open AND the
> login user to exist. The simple echo shell command with retry do/until
> loop also does not work, because the very first SSH connection Ansible
> tries to make to run the module fails also.
>
> For some detailed debug info, and a playbook to reproduce the issue,
> please see this Gist:
> https://gist.github.com/trinitronx/afd894c89384d413597b
>
> My question is: Has anyone run into a similar issue with EC2 instances
> being slow to become available causing Ansible to fail to connect, and also
> found a solution to this?
>
> I realize that a sleep task is one possible solution (and I may be forced
> to reach for that sledgehammer), but it doesn't feel like the absolute best
> solution because we really want to wait for both cloud-init to be finished
> creating "centos" user on the instance AND SSH to be up. So really, the
> only other way I can think of is to somehow tell SSH to retry connecting as
> centos until it succeeds or surpasses a very long timeout. Is this
> possible? Are there better ways of handling this?
>
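For what it's worth, the "retry connecting as centos until it succeeds or a
long timeout passes" idea from the last paragraph could be sketched roughly as
below, run from the control machine before the first real Ansible task. This is
only a sketch under assumptions, not something from the original post or the
Gist: the helper names (retry_until, ssh_login_works), the example IP, and the
timeout values are all made up for illustration, and disabling host key
checking is assumed to be acceptable for a freshly launched throwaway instance.

```python
import subprocess
import time


def retry_until(attempt, timeout_s, interval_s,
                clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it returns True or timeout_s seconds elapse.

    Returns True on the first successful attempt, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if attempt():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def ssh_login_works(host, user="centos"):
    """Return True if a real, non-interactive SSH login as `user` succeeds.

    Unlike a plain port check, this only passes once the login user
    exists and its public key has been installed.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",            # never prompt; fail instead
        "-o", "ConnectTimeout=10",        # seconds per connection attempt
        "-o", "StrictHostKeyChecking=no", # assumption: new, untrusted-yet host
        "-o", "UserKnownHostsFile=/dev/null",
        f"{user}@{host}",
        "true",                           # trivial remote command
    ]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


# Hypothetical usage: poll for up to 10 minutes, every 10 seconds.
# ok = retry_until(lambda: ssh_login_works("203.0.113.10"),
#                  timeout_s=600, interval_s=10)
```

The point of looping on an actual login rather than on port 22 is that it
covers both conditions at once: sshd being up and cloud-init having finished
creating the user.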