Which hosts fail is not consistent from one attempt to the next. A server 
that fails on one attempt will not fail on the next, and if I gather facts 
manually right after a failure, it gathers them successfully every time. 

Here is a host that failed on the last attempt, succeeding when pinged 
manually:

/usr/bin/ansible server74.prod.domain -m ping -vvvv -c ssh
<server74.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
<server74.prod.domain> REMOTE_MODULE ping
<server74.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o', 
'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o', 
'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 
'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o', 
'KbdInteractiveAuthentication=no', '-o', 
'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', 
'-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 
'server74.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
server74.prod.domain | success >> {
    "changed": false, 
    "ping": "pong"
}


--forks 50: ~35 hosts failed
--forks 25: 5 hosts failed, ran in 47s
--forks 15: none failed, ran in 53s
--forks 20: none failed, ran in 45s

All of the above were run with host key checking off. 

Here is another "twist". With --forks passed, if the fact gathering doesn't 
fail, a task will, and like fact gathering before, it's never the same task 
that fails. 

Task fails with: "ssh connection closed waiting for sudo or su password 
prompt"


With host key checking on:

--forks 25 = 10m
--forks 50 = 10m
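
Since these hosts use a non-default SSH port (Port=3422 in the output 
above), one thing that could be checked by hand is whether each host has a 
known_hosts entry under the name ssh records for a non-default port, 
[host]:port, including when the entries are hashed. Below is a minimal 
Python sketch of that check. It is not Ansible's own code, the function 
names are made up for illustration, and it only does exact-name matching 
(no wildcards or negated patterns). Whether this is related to the 
slowdown is an assumption, not something confirmed in this thread.

```python
import base64
import hashlib
import hmac


def known_hosts_name(host, port=22):
    """Name ssh stores in known_hosts: bare host on port 22, else [host]:port."""
    return host if port == 22 else "[%s]:%d" % (host, port)


def hashed_entry_matches(field, host, port=22):
    """Check one hashed host field of the form |1|<base64 salt>|<base64 hash>."""
    parts = field.split("|")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    # The stored hash is HMAC-SHA1 over the host name, keyed with the salt.
    name = known_hosts_name(host, port).encode("utf-8")
    digest = hmac.new(salt, name, hashlib.sha1).digest()
    return hmac.compare_digest(digest, expected)


def host_in_known_hosts(path, host, port=22):
    """Return True if path has a plain or hashed entry for host on port."""
    name = known_hosts_name(host, port)
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2 or fields[0].startswith("#"):
                continue
            # Lines may start with a marker such as @cert-authority.
            hosts = fields[1] if fields[0].startswith("@") else fields[0]
            if hosts.startswith("|1|"):
                if hashed_entry_matches(hosts, host, port):
                    return True
            elif name in hosts.split(","):
                return True
    return False
```

For example, host_in_known_hosts("/home/bmorriso/.ssh/known_hosts", 
"server74.prod.domain", 3422) would report whether that host is recorded 
under its port-qualified name (the known_hosts path here is assumed).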

FWIW, if I set forks: 50 in /etc/ansible/ansible.cfg, it still acts as if 
it is set to 5. Only when I pass --forks 50 on the command line does it 
actually seem to run at 50. 
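
One thing worth ruling out for the forks setting: as far as I know, 
Ansible 1.x reads only the first config file it finds, checking 
ANSIBLE_CONFIG, then ./ansible.cfg, then ~/.ansible.cfg, then 
/etc/ansible/ansible.cfg, so a file earlier in that list would hide the 
setting in /etc. A rough Python 3 sketch of that lookup follows. The 
helper names are made up for illustration, this is not Ansible's code, 
and it ignores the ANSIBLE_FORKS environment variable.

```python
import configparser
import os


def candidate_configs(cwd=None, environ=None):
    """List config paths in the order described above (assumed order)."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    paths = []
    if environ.get("ANSIBLE_CONFIG"):
        paths.append(os.path.expanduser(environ["ANSIBLE_CONFIG"]))
    paths.append(os.path.join(cwd, "ansible.cfg"))
    paths.append(os.path.expanduser("~/.ansible.cfg"))
    paths.append("/etc/ansible/ansible.cfg")
    return paths


def effective_forks(paths, default=5):
    """Return (config_path, forks) using only the first file that exists."""
    for path in paths:
        if os.path.isfile(path):
            parser = configparser.ConfigParser(interpolation=None)
            parser.read(path)
            # forks only counts when it sits in the [defaults] section.
            if parser.has_option("defaults", "forks"):
                return path, parser.getint("defaults", "forks")
            return path, default
    return None, default


if __name__ == "__main__":
    path, forks = effective_forks(candidate_configs())
    print("config file in effect: %s, forks: %d" % (path, forks))
```

If this prints a file other than /etc/ansible/ansible.cfg, or forks: 5, 
then the setting is either being shadowed or is not under [defaults].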

Also, I'm not running an "old" version of Ansible:

which ansible
/usr/bin/ansible

/usr/bin/ansible --version
ansible 1.7.1

Hope this helps, but fear it may add to the confusion. 

On Tuesday, September 23, 2014 6:39:47 PM UTC-7, Michael DeHaan wrote:
>
> "With it commented, no failures, I'm able to communicate with all 
> servers. "
>
> This part is a little interesting.
>
> Turning off host checking and going slow you can talk to all your hosts.  
> Going fast you cannot?
>
> (If this is repeatable, I wonder if maybe you have an SSH jumphost 
> configured that might be getting overwhelmed?   Or perhaps something 
> similar on the network?)
>
> Can I ask what --forks is set to?
>
>
>
> On Tue, Sep 23, 2014 at 9:36 PM, Michael DeHaan <[email protected]> wrote:
>
>> Ok Barry,
>>
>> We'll get you sorted before you wander off and lose a limb :)
>>
>> These things seem to be unrelated.
>>
>> (A) 
>>
>> This has happened in the past when the host key of a host doesn't 
>> *appear* to Ansible's ssh.py connection type to be in the known_hosts 
>> file. It then takes a serial lock to ask you whether the key should be 
>> added, even though the key is actually there. The result is that --forks 
>> is not used on the first task per host, so things don't run in parallel. 
>> It's frustrating.
>>
>> This was fixed long ago, when we added knowledge about hashed known_hosts 
>> entries, and should be quite good today, especially on a well tested OS 
>> like 14.04, basically at the top of our test matrix.   Finding it again now 
>> is curious.
>>
>> I'd worry that something else might be interfering with the lock.   My 
>> first question is whether (maybe privately) we could see your known_hosts 
>> file?
>>
>> So we're not quite out of that territory yet with host key checking on, 
>> but I'm still curious about why it may still be doing that.
>>
>> There may be a slim chance you're actually using an older ansible 
>> version, or they are hashed weirdly for some reason.
>>
>> I'll assume this is happening with "-c ssh".
>>
>> (I'd also be curious if this happens on the development branch, but I 
>> don't anticipate any changes there)
>>
>> (B)
>>
>> On the second question, I'm expecting these 10 hosts are consistently 
>> doing that between runs, as in the same hosts? 
>>
>> Can I get the result of an /usr/bin/ansible hostname -m ping -vvvv -c ssh 
>> against one of them?
>>
>> That will engage SSH debug mode and tell us a little more about what may 
>> be up.
>>
>> They could actually be down, but I'm guessing you checked that.   Getting 
>> that error spuriously is not expected.
>>
>> It could also be that ansible_ssh_port or something needs to be set in 
>> inventory or whatever, and it's not normally set, firewall issues, or 
>> things like that?
>>
>> Let's start with the "-vvvv" part.
>>
>> Thanks!
>>
>>
>>
>>
>>
>>
>> On Tue, Sep 23, 2014 at 9:20 PM, Barry Morrison <[email protected]> wrote:
>>
>>> Oh, FWIW, I'm touching over 350 servers with this playbook and gathering 
>>> facts from all of them. 
>>>
>>> On Tuesday, September 23, 2014 6:17:53 PM UTC-7, Barry Morrison wrote:
>>>>
>>>> Spawned from a conversation with Michael on Twitter:
>>>> https://twitter.com/esacteksab/status/514558427217936384
>>>>
>>>> Uncommenting host_key_checking = False, a playbook runs in 35s
>>>> Commenting host_key_checking = False, the playbook runs in 9m25s
>>>>
>>>> But with it uncommented, ~10% of the servers return: "SSH Error: data 
>>>> could not be sent to the remote host. Make sure this host can be reached 
>>>> over ssh" 
>>>>
>>>> With it commented, no failures, I'm able to communicate with all 
>>>> servers. 
>>>>
>>>> This is a thread to troubleshoot further, because Twitter's 140 
>>>> characters aren't all that great for this. 
>>>>
>>>> Ansible is 1.7.1 on Ubuntu 14.04
>>>> Servers are a combination of Ubuntu 12.04 and 14.04
>>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Ansible Project" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/ansible-project/78fd3ef2-1b80-4167-b2f6-99d49569a177%40googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>

