I should add, I am able to SSH into these hosts w/o being prompted to 
accept a key. I am able to hit these servers with Fabric, again w/o any 
prompting for a key. 

On Tuesday, September 23, 2014 7:29:21 PM UTC-7, Barry Morrison wrote:
>
> Which hosts fail is not consistent from one attempt to the next. A server 
> will fail on one attempt and succeed on the next, and if I gather facts 
> manually after it has failed, it succeeds every time. 
>
> But here is a host that failed on the last attempt:
>
> /usr/bin/ansible server74.prod.domain -m ping -vvvv -c ssh
> <server74.prod.domain> ESTABLISH CONNECTION FOR USER: bmorriso
> <server74.prod.domain> REMOTE_MODULE ping
> <server74.prod.domain> EXEC ['ssh', '-C', '-vvv', '-o', 
> 'ControlMaster=auto', '-o', 'ControlPersist=5m', '-o', 
> 'ControlPath=/home/bmorriso/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 
> 'StrictHostKeyChecking=no', '-o', 'Port=3422', '-o', 
> 'KbdInteractiveAuthentication=no', '-o', 
> 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', 
> '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', 
> 'server74.prod.domain', u"/bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python'"]
> server74.prod.domain | success >> {
>     "changed": false, 
>     "ping": "pong"
> }
>
>
> --forks was set to 50 and I saw ~35 hosts fail
> --forks set to 25, only 5 failed and it ran in 47s
> --forks set to 15, none failed and it ran in 53s
> --forks set to 20, none failed and it ran in 45s
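For anyone following along: --forks is a cap on how many hosts are being worked on at the same time. Going from 20 to 50 means up to 50 simultaneous SSH handshakes hitting whatever sits between you and the hosts, which fits Michael's "something getting overwhelmed" theory. Below is a minimal Python sketch of that cap. It uses a thread pool purely as a stand-in (Ansible 1.7 uses forked worker processes, not this code), and the host names and fake_ssh function are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_forks(hosts, forks, task):
    """Run task(host) for every host, with at most `forks` running at once.

    Rough model of what --forks limits; not Ansible's implementation.
    """
    with ThreadPoolExecutor(max_workers=forks) as pool:
        return dict(zip(hosts, pool.map(task, hosts)))


# Track how many fake "SSH connections" are open at the same moment.
_lock = threading.Lock()
_open = 0
peak = 0


def fake_ssh(host):
    """Hypothetical stand-in for connecting to a host and running ping."""
    global _open, peak
    with _lock:
        _open += 1
        peak = max(peak, _open)
    time.sleep(0.01)  # pretend this is connection setup + module run
    with _lock:
        _open -= 1
    return "pong"


hosts = [f"server{n}.prod.domain" for n in range(100)]  # hypothetical names
results = run_with_forks(hosts, forks=20, task=fake_ssh)
print(len(results), peak)  # peak never exceeds the forks value (20)
```

Raising forks raises the peak number of concurrent connections, and that peak is what a jumphost, firewall, or the target sshd actually sees.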
>
> All of the above are with host key checking off. 
>
> Here is another "twist". With --forks passed, if fact gathering doesn't 
> fail, a task will, and, as with fact gathering, it's never the same task 
> that fails. 
>
> Task fails with: "ssh connection closed waiting for sudo or su password 
> prompt"
>
>
> With host key checking on:
>
> --forks 25 = 10m
> --forks 50 = 10m
>
> FWIW, if I set forks: 50 in /etc/ansible/ansible.cfg, it still behaves as 
> if it is set to 5; only when I pass --forks 50 on the command line does it 
> actually seem to run at 50. 
>
> Also, no "old" version of Ansible
>
> which ansible
> /usr/bin/ansible
>
> /usr/bin/ansible --version
> ansible 1.7.1
>
> Hope this helps, but fear it may add to the confusion. 
>
> On Tuesday, September 23, 2014 6:39:47 PM UTC-7, Michael DeHaan wrote:
>>
>> "With it commented, no failures, I'm able to communicate with all 
>> servers. "
>>
>> This part is a little interesting.
>>
>> Turning off host checking and going slow you can talk to all your hosts.  
>> Going fast you cannot?
>>
>> (If this is repeatable, I wonder if maybe you have an SSH jumphost 
>> configured that might be getting overwhelmed?   Or perhaps something 
>> similar on the network?)
>>
>> Can I ask what --forks is set to?
>>
>>
>>
>> On Tue, Sep 23, 2014 at 9:36 PM, Michael DeHaan <[email protected]> 
>> wrote:
>>
>>> Ok Barry,
>>>
>>> We'll get you sorted before you wander off and lose a limb :)
>>>
>>> These things seem to be unrelated.
>>>
>>> (A) 
>>>
>>> This has happened in the past when the host key of a host doesn't 
>>> *appear* to Ansible's ssh.py connection type to be in the known hosts file, 
>>> so it takes a serial lock to ask you whether the key should be added, even 
>>> though, for whatever reason, ssh actually knew it was there.   
>>> The result is that --forks is not used on the first task per host, so 
>>> that step doesn't run in parallel.   It's frustrating.
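To make (A) concrete, here is a minimal Python model of the behaviour described above, assuming a single global lock around the "add this host key?" question. This is not ssh.py's code, and every name in it is hypothetical. The point it illustrates: if the known_hosts lookup misfires, every host funnels through one lock and the number of workers stops mattering for that step, which would be consistent with Barry seeing ~10m with key checking on for both --forks 25 and --forks 50.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical global lock guarding the interactive host-key question.
prompt_lock = threading.Lock()
_count_lock = threading.Lock()
inside = 0        # workers currently in the locked section
peak_inside = 0   # most workers ever in it at once


def connect(host, key_known):
    """Model of one worker's first connection to `host`."""
    global inside, peak_inside
    if key_known(host):
        return "parallel"          # no prompt needed, no lock taken
    with prompt_lock:              # "unknown" hosts queue up here one by one
        with _count_lock:
            inside += 1
            peak_inside = max(peak_inside, inside)
        # ... the real code would run ssh / ask the question here ...
        with _count_lock:
            inside -= 1
    return "serialized"


def always_unknown(host):
    """Simulates a lookup that wrongly reports every key as missing."""
    return False


hosts = [f"host{n}" for n in range(50)]
with ThreadPoolExecutor(max_workers=25) as pool:
    modes = list(pool.map(lambda h: connect(h, always_unknown), hosts))

print(modes.count("serialized"), peak_inside)  # 50 1
```

Even with 25 workers available, only one is ever inside the locked step, so that part of the run is effectively serial.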
>>>
>>> This was fixed long ago, when we added knowledge about hashed 
>>> known_hosts entries, and it should be quite solid today, especially on a 
>>> well-tested OS like 14.04, which is basically at the top of our test 
>>> matrix.   Finding it again now is curious.
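For reference on the hashed known_hosts point: with HashKnownHosts enabled, OpenSSH writes the host field as |1|<base64 salt>|<base64 HMAC-SHA1(salt, name)>, and for any port other than 22 the name it hashes is [host]:port rather than the bare hostname. These hosts are on Port=3422, so any lookup that hashes just the hostname would miss them (I haven't checked whether 1.7's ssh.py accounts for the port). A minimal Python sketch of such a lookup, as my own illustration rather than Ansible's code; wildcard patterns and @cert-authority/@revoked marker lines are not handled:

```python
import base64
import hashlib
import hmac


def known_hosts_name(host, port=22):
    """Name OpenSSH records in known_hosts: plain host, or [host]:port."""
    return host if port == 22 else f"[{host}]:{port}"


def matches_hashed(field, name):
    """True if a hashed host field (|1|salt|hash) matches `name`."""
    if not field.startswith("|1|"):
        return False
    salt_b64, hash_b64 = field[3:].split("|", 1)
    salt = base64.b64decode(salt_b64)
    digest = hmac.new(salt, name.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(digest, base64.b64decode(hash_b64))


def host_in_known_hosts(lines, host, port=22):
    """Check known_hosts lines for host (plain or hashed entries)."""
    name = known_hosts_name(host, port)
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and marker lines this sketch doesn't model.
        if not line or line.startswith(("#", "@")):
            continue
        hosts_field = line.split()[0]
        if hosts_field.startswith("|1|"):
            if matches_hashed(hosts_field, name):
                return True
        elif name in hosts_field.split(","):
            return True
    return False
```

Usage would be something like `host_in_known_hosts(open(os.path.expanduser("~/.ssh/known_hosts")), "server74.prod.domain", 3422)`; comparing that against port 22 shows whether the entries were recorded with the port.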
>>>
>>> I'd worry that something else might be interfering with the lock.   My 
>>> first question: could we (maybe privately) see your known_hosts file?
>>>
>>> So we're not quite out of that territory yet with host key checking on, 
>>> but I'm still curious about why it may still be doing that.
>>>
>>> There's a slim chance you're actually using an older Ansible 
>>> version, or that the entries are hashed oddly for some reason.
>>>
>>> I'll assume this is happening with "-c ssh".
>>>
>>> (I'd also be curious if this happens on the development branch, but I 
>>> don't anticipate any changes there)
>>>
>>> (B)
>>>
>>> On the second question, am I right that these 10 hosts fail consistently 
>>> between runs, as in the same hosts each time? 
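A quick way to answer the "same hosts each time?" question is to save the failing host names from a few runs and compare them. Below is a small Python helper for that. It is hypothetical and not part of Ansible, and the sample host sets are made up for illustration. It separates hosts that fail on every run from hosts that fail only intermittently. Barry's answer in this thread points to the second case.

```python
def compare_runs(runs):
    """Split failed hosts into 'failed every run' and 'failed sometimes'.

    `runs` is a list of sets, one per run, holding the host names that
    failed in that run (e.g. copied from the error output or the recap).
    """
    if not runs:
        return set(), set()
    always = set.intersection(*runs)
    sometimes = set.union(*runs) - always
    return always, sometimes


# Hypothetical data: three runs where the failing hosts move around.
runs = [
    {"server74.prod.domain", "server12.prod.domain"},
    {"server12.prod.domain", "server90.prod.domain"},
    {"server12.prod.domain"},
]
always, sometimes = compare_runs(runs)
print(sorted(always))     # ['server12.prod.domain']
print(sorted(sometimes))  # ['server74.prod.domain', 'server90.prod.domain']
```

If "always" stays empty across runs, the problem is more likely load or the network path than anything specific to particular hosts.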
>>>
>>> Can I get the output of this, run against one of them?
>>>
>>>   /usr/bin/ansible hostname -m ping -vvvv -c ssh
>>>
>>> That will engage SSH debug mode and tell us a little more about what may 
>>> be up.
>>>
>>> They could actually be down, but I'm guessing you checked that.   That 
>>> error being returned spuriously is not expected.
>>>
>>> It could also be that ansible_ssh_port or something similar needs to be 
>>> set in inventory and isn't, or firewall issues, or things like that?
>>>
>>> Let's start with the "-vvvv" part.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 23, 2014 at 9:20 PM, Barry Morrison <[email protected]> 
>>> wrote:
>>>
>>>> Oh, FWIW, I'm touching over 350 servers with this playbook and 
>>>> gathering facts from all of them. 
>>>>
>>>> On Tuesday, September 23, 2014 6:17:53 PM UTC-7, Barry Morrison wrote:
>>>>>
>>>>> Spawned from a conversation with Michael on Twitter: 
>>>>> https://twitter.com/esacteksab/status/514558427217936384
>>>>>
>>>>> With host_key_checking = False uncommented, a playbook runs in 35s
>>>>> With host_key_checking = False commented out, it runs in 9m25s
>>>>>
>>>>> But with it uncommented, ~10% of the servers return: "SSH Error: data 
>>>>> could not be sent to the remote host. Make sure this host can be reached 
>>>>> over ssh" 
>>>>>
>>>>> With it commented, no failures, I'm able to communicate with all 
>>>>> servers. 
>>>>>
>>>>> This is a topic to troubleshoot further here, because Twitter and 140 
>>>>> chars aren't all that great for it. 
>>>>>
>>>>> Ansible is 1.7.1 on Ubuntu 14.04
>>>>> Servers are a combination of Ubuntu 12.04 and 14.04
>>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Ansible Project" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/ansible-project/78fd3ef2-1b80-4167-b2f6-99d49569a177%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/ansible-project/78fd3ef2-1b80-4167-b2f6-99d49569a177%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>

