Which hosts fail is not consistent from one attempt to the next. A server
that fails on one attempt will not fail on the next, and if I gather facts
manually right after a failure, it succeeds every time.
For example, here is a host that failed on the last attempt:
/usr/bin/ansible server74.prod.domain -m ping -vvvv -c ssh
<server74.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
<server74.prod.domain> REMOTE_MODULE ping
<server74.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o',
'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o',
'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o',
'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o',
'KbdInteractiveAuthentication=no', '-o',
'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey',
'-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10',
'server74.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
server74.prod.domain | success >> {
"changed": false,
"ping": "pong"
}
--forks 50: ~35 hosts failed
--forks 25: 5 hosts failed, ran in 47s
--forks 15: none failed, ran in 53s
--forks 20: none failed, ran in 45s
The above are all with host checking off.
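For anyone who wants to repeat the comparison, a loop along these lines could time one run per fork count. It is a rough sketch, not what I actually ran: site.yml is a placeholder playbook name, and PLAYBOOK, RUN_CMD, LOG_DIR and sweep_forks are names made up for illustration.

```shell
#!/bin/sh
# Sketch: time the same playbook at several --forks values.
# PLAYBOOK, RUN_CMD and LOG_DIR are hypothetical knobs, adjust as needed.
PLAYBOOK="${PLAYBOOK:-site.yml}"        # placeholder playbook name
RUN_CMD="${RUN_CMD:-ansible-playbook}"  # command to run for each fork count
LOG_DIR="${LOG_DIR:-.}"                 # where the per-run logs are written

sweep_forks() {
    # Usage: sweep_forks 15 20 25 50
    for forks in "$@"; do
        start=$(date +%s)
        # Capture the exit status instead of aborting, so one failed run
        # does not stop the rest of the sweep.
        if $RUN_CMD "$PLAYBOOK" --forks "$forks" >"$LOG_DIR/forks-$forks.log" 2>&1; then
            status=0
        else
            status=$?
        fi
        end=$(date +%s)
        echo "forks=$forks exit=$status seconds=$((end - start))"
    done
}
```

Calling "sweep_forks 15 20 25 50" prints one summary line per fork count and leaves the full output of each run in forks-N.log, which makes it easier to compare which hosts failed at each setting.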
Here is another "twist": with --forks passed, if fact gathering doesn't
fail, a task will. As with fact gathering, it's never the same task that
fails.
Task fails with: "ssh connection closed waiting for sudo or su password
prompt"
With host checking on:
--forks 25: ran in 10m
--forks 50: ran in 10m
FWIW, if I set forks: 50 in /etc/ansible/ansible.cfg, it still acts as if
forks is 5. Only when I pass --forks 50 on the command line does it
actually seem to run at 50.
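One thing that could explain this: as far as I understand it, Ansible 1.x uses only the first config file it finds, checking $ANSIBLE_CONFIG, then ./ansible.cfg, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg, and forks has to sit under [defaults]. A quick check like the one below would show which file wins and what it says about forks. The function names first_existing and show_forks are made up for this sketch, and the search order is my assumption for this version.

```shell
#!/bin/sh
# Sketch: report which ansible.cfg would be picked up and its forks setting.
# Assumed search order (first existing file wins, files are not merged):
#   $ANSIBLE_CONFIG, ./ansible.cfg, ~/.ansible.cfg, /etc/ansible/ansible.cfg

first_existing() {
    # Print the first argument that names an existing file; return 1 if none.
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

show_forks() {
    # Print the forks line(s) of the given config file with line numbers.
    grep -n -E '^[[:space:]]*forks[[:space:]]*[=:]' "$1" \
        || echo "no forks setting in $1"
}

if active_cfg=$(first_existing "${ANSIBLE_CONFIG:-}" ./ansible.cfg \
        "${HOME:-}/.ansible.cfg" /etc/ansible/ansible.cfg); then
    echo "using: $active_cfg"
    show_forks "$active_cfg"
else
    echo "no ansible.cfg found"
fi
```

If the file reported is not /etc/ansible/ansible.cfg, or it has no forks line, that would account for the default of 5 being used unless --forks is passed.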
Also, I'm not running an "old" version of Ansible:
which ansible
/usr/bin/ansible
/usr/bin/ansible --version
ansible 1.7.1
Hope this helps, but fear it may add to the confusion.
On Tuesday, September 23, 2014 6:39:47 PM UTC-7, Michael DeHaan wrote:
>
> "With it commented, no failures, I'm able to communicate with all
> servers. "
>
> This part is a little interesting.
>
> Turning off host checking and going slow you can talk to all your hosts.
> Going fast you cannot?
>
> (If this is repeatable, I wonder if maybe you have an SSH jumphost
> configured that might be getting overwhelmed? Or perhaps something
> similar on the network?)
>
> Can I ask what --forks is set to?
>
> On Tue, Sep 23, 2014 at 9:36 PM, Michael DeHaan wrote:
>
>> Ok Barry,
>>
>> We'll get you sorted before you wander off and lose a limb :)
>>
>> These things seem to be unrelated.
>>
>> (A)
>>
>> This has happened in the past when the host key of a host doesn't
>> *appear* to Ansible's ssh.py connection type to be in the known hosts file,
>> and it creates a serial lock to ask you the question about whether it
>> should be added - but for whatever reason, knew it was actually there.
>> The result of this is that --forks is not used on the first task per host,
>> which makes things not be parallel. It's frustrating.
>>
>> This was fixed long ago, when we added knowledge about hashed known_hosts
>> entries, and should be quite good today, especially on a well tested OS
>> like 14.04, basically at the top of our test matrix. Finding it again now
>> is curious.
>>
>> I'd worry if something else might be interfering with the lock. My
>> first question is if (maybe privately), we could see your known_hosts file?
>>
>> So we're not quite out of that territory yet with host key checking on,
>> but I'm still curious about why it may still be doing that.
>>
>> There may be a slim chance you're actually using an older ansible
>> version, or they are hashed weirdly for some reason.
>>
>> I'll assume this is happening with "-c ssh".
>>
>> (I'd also be curious if this happens on the development branch, but I
>> don't anticipate any changes there)
>>
>> (B)
>>
>> On the second question, I'm expecting these 10 hosts are consistently
>> doing that between runs, as in the same hosts?
>>
>> Can I get the result of a /usr/bin/ansible hostname -m ping -vvvv -c ssh
>> against one of them?
>>
>> That will engage SSH debug mode and tell us a little more about what may
>> be up.
>>
>> They could actually be down, but I'm guessing you checked that. That
>> being returned extraneously is not expected.
>>
>> It could also be that ansible_ssh_port or something needs to be set in
>> inventory or whatever, and it's not normally set, firewall issues, or
>> things like that?
>>
>> Let's start with the "-vvvv" part.
>>
>> Thanks!
>>
>> On Tue, Sep 23, 2014 at 9:20 PM, Barry Morrison wrote:
>>
>>> Oh, FWIW, I'm touching over 350 servers with this playbook and gathering
>>> facts from all of them.
>>>
>>> On Tuesday, September 23, 2014 6:17:53 PM UTC-7, Barry Morrison wrote:
>>>>
>>>> Spawned from Conversation with Michael on Twitter:
>>>> https://twitter.com/esacteksab/status/514558427217936384
>>>>
>>>> Uncommenting host_key_checking = False, a playbook runs in 35s
>>>> Commenting host_key_checking = False, the playbook runs in 9m25s
>>>>
>>>> But with it uncommented, ~10% of the servers return: "SSH Error: data
>>>> could not be sent to the remote host. Make sure this host can be reached
>>>> over ssh"
>>>>
>>>> With it commented, no failures, I'm able to communicate with all
>>>> servers.
>>>>
>>>> This is a topic to troubleshoot further here, because Twitter and 140
>>>> chars isn't all that great.
>>>>
>>>> Ansible is 1.7.1 on Ubuntu 14.04
>>>> Servers are a combination of Ubuntu 12.04 and 14.04
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Ansible Project" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/ansible-project/78fd3ef2-1b80-4167-b2f6-99d49569a177%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>