[ansible-project] Re: Windows - Ansible freezes when network connection disrupted

Matt Davis Tue, 28 Jun 2016 15:53:46 -0700

Great- sorry we had to go so far afield there, but I'll make sure I try 
this test on the Windows async stuff so you can hopefully throw it away 
soon. :)


On Tuesday, June 28, 2016 at 3:44:59 PM UTC-7, Rahul Garg wrote:
>
> Thank you for all the help Matt, I was finally able to solve this today :)
> I did it using scheduled job as you had mentioned, basically deferring the 
> install action on the host to a later time so I don't lose the connection.
>
> Fun times!
>
> On Monday, 27 June 2016 17:57:41 UTC-7, Matt Davis wrote:
>>
>> The closest I've been able to get to approximating what you're doing is 
>> using devmanview /disable_enable to bounce my WinRM connection's NIC. It 
>> definitely hangs on the Receive in that case, as I expected. Regardless, 
>> it's a race, so without deferring the action on the Windows side, it's 
>> possible that the NIC could bounce even before you've gotten the response 
>> from the WinRM Command POST, much less the actual process results via the 
>> next WinRM Receive call. 
>>
>> Unless you want to start working with things at the winrm level (trust 
>> me, you probably don't), the trick is going to be deferring the device 
>> bounce until *after* the WinRM session has completed (where I was going 
>> with the sleep before the command in a separate process). This is further 
>> complicated by WinRM's aggressive nuking of child processes once the parent 
>> shell has exited, so it's also possible you're running into that (eg, WinRM 
>> call completes, while Powershell subprocess is still sleeping, WinRM 
>> helpfully nukes the process for you before/during the action you want). 
>>
>> You might also want to look into doing this in a scheduled job- that 
>> would at least let you escape the constraints of the WinRM environment, 
>> though it brings a whole host of other problems, too...
>>
>> Or just wait for async in 2.2. :)
>>
>>
>>
>>
>>
>> On Monday, June 27, 2016 at 3:03:20 PM UTC-7, Rahul Garg wrote:
>>>
>>> Thanks for the suggestions Matt. I've tried both the approaches. No luck 
>>> unfortunately :|
>>> From what I understand, Ansible seems to be waiting for the result to 
>>> get added to the results dictionary, even though I have changed the 
>>> timeouts in win_reconnect (the plugin I wrote).
>>>
>>> I've tried running it both programmatically and through a playbook. 
>>> Same findings on both.
>>>
>>> Here's <http://pastebin.com/Ydy0i7xp> a bit of a traceback, this 
>>> happens when I manually stop the run (ctrl + c), if it helps. 
>>> I've put the code over here 
>>> <https://bitbucket.org/vihu89/ansible_windows_reconnect> (it is very 
>>> underdeveloped as of now), so you might have to modify certain things to 
>>> make it work if you want to reproduce in your own environment.
>>>
>>> Thanks for the time you've taken in helping me figure this out!
>>>
>>> On Monday, 27 June 2016 13:15:29 UTC-7, Matt Davis wrote:
>>>>
>>>> Without being able to reproduce what it's actually doing on my end, I 
>>>> suspect it's blocking on the winrm Receive (you could verify that by 
>>>> inserting Fiddler or another proxy in the middle). That *should* time out 
>>>> eventually when no output comes back within the read timeout window- how 
>>>> long have you waited? (could also try setting 
>>>> ansible_winrm_read_timeout_sec to a nice low number to make it come back 
>>>> faster)
>>>>
>>>> Another way you might be able to handle this (as kind of a poor-man's 
>>>> async) would be to spawn the command in a new process via exec_command 
>>>> and include a delay to prevent the hang during result fetch/disconnect, 
>>>> like:
>>>>
>>>> start-process -nonewwindow powershell.exe "-command sleep 2; pnputil.exe -i -a driver/path"
>>>>
>>>> Unless you capture and marshal the results to a file yourself, you 
>>>> wouldn't be able to detect a failure (this is the heavy lifting that async 
>>>> does for you), but it should get the job done on the happy path.
>>>>
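
The delayed-spawn command in the message above can be assembled on the control side before it is handed to exec_command. A minimal sketch, assuming Python; `build_deferred_command` is a hypothetical helper (not part of Ansible) that only reproduces the command string from the message. It does not capture results, so failures on the Windows side stay invisible, as the message warns.

```python
def build_deferred_command(command, delay_sec=2):
    """Wrap `command` so a separate PowerShell process runs it after a delay.

    Hypothetical helper: it only builds the string quoted in the thread.
    """
    # Assumption: the inner command is embedded in a double-quoted argument,
    # so this sketch simply refuses commands that contain double quotes.
    if '"' in command:
        raise ValueError("command must not contain double quotes in this sketch")
    return (
        'start-process -nonewwindow powershell.exe '
        '"-command sleep {0}; {1}"'.format(int(delay_sec), command)
    )


if __name__ == "__main__":
    # "driver/path" is the placeholder path used in the original message.
    print(build_deferred_command("pnputil.exe -i -a driver/path"))
```

The delay only narrows the race; it does not remove it, and a child process started this way may still be terminated when the WinRM shell exits.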
>>>> On Monday, June 27, 2016 at 9:45:37 AM UTC-7, Rahul Garg wrote:
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Thank you for the advice, appreciate it!
>>>>>
>>>>> I tried doing it the 'cleaner' way, using logic similar to that in 
>>>>> win_reboot.py; however, after some initial testing, Ansible still seems 
>>>>> to freeze on the connection.
>>>>> Could you please take a look at my plugin 
>>>>> <http://pastebin.com/n8E0Jruh> and let me know where I'm going wrong 
>>>>> or if I'm missing some tiny little detail.
>>>>>
>>>>> It basically freezes after the install driver command is sent to the 
>>>>> windows host (using pnputil).
>>>>>
>>>>> Thank you!
>>>>>
>>>>> On Monday, 20 June 2016 09:55:58 UTC-7, Matt Davis wrote:
>>>>>>
>>>>>> The module subsystem alone is not (and pretty much cannot safely be 
>>>>>> made) resilient to modules that interrupt the network connection.
>>>>>>
>>>>>> That said, all the bits and pieces are there to do what you need if 
>>>>>> you're doing custom work, but you'd have to string them together 
>>>>>> yourself to make an action/module pair that can be resilient to changes 
>>>>>> that interrupt the network connection. Take a look at the way the 
>>>>>> win_reboot action 
>>>>>> <https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/action/win_reboot.py>
>>>>>> works- you can follow a similar pattern yourself: write a custom action 
>>>>>> plugin (and stuff it in action_plugins/ next to your playbook). The 
>>>>>> action plugin can exec the module, catch the network failure, poke at 
>>>>>> the box until it responds again, then ensure the changes were made 
>>>>>> correctly. There are several different approaches to doing this that 
>>>>>> work, depending on how exactly correct you want to be about races, 
>>>>>> failures that look like successes, etc, but the naive "happy path" 
>>>>>> case is very simple to implement.
>>>>>>
>>>>>> Or you could just wait for async. :)
>>>>>>
>>>>>>
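
The "poke at the box until it responds again" step described above can be sketched as a plain TCP poll. This is a minimal sketch in standalone Python, not the win_reboot implementation; `wait_for_port` and its parameters are hypothetical names chosen for illustration. A successful connect only shows that something is listening again, not that the change on the host succeeded.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=2.0, connect_timeout=5.0):
    """Return True once a TCP connect to host:port succeeds, else False.

    Hypothetical helper: polls until `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed handshake means the listener is reachable again.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the host is not back yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For WinRM the port would typically be 5985 (HTTP) or 5986 (HTTPS). After the poll succeeds, a real action plugin would still need to verify that the driver install or other change actually took effect.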
>>>>>> On Sunday, June 19, 2016 at 5:15:49 PM UTC-7, Rahul Garg wrote:
>>>>>>>
>>>>>>> Hi Matt,
>>>>>>>
>>>>>>> You mentioned that async support is going to be put in 2.2. Is there 
>>>>>>> any workaround for this problem other than the win_scheduled_task 
>>>>>>> module?
>>>>>>> For example, can we use polling/pinging to see whether the 
>>>>>>> connection is back up?
>>>>>>>
>>>>>>> I have tried several methods but none seem to work, 
>>>>>>> http://pastebin.com/PS82PnBF is what I came up with but even this 
>>>>>>> freezes after installation.
>>>>>>>
>>>>>>> Appreciate any help/advice.
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday, 3 May 2016 11:10:37 UTC-7, Matt Davis wrote:
>>>>>>>>
>>>>>>>> Depending on what you're trying to do, doing it as a scheduled 
>>>>>>>> task/script might make sense in the interim (eg, see 
>>>>>>>> http://docs.ansible.com/ansible/win_scheduled_task_module.html)
>>>>>>>>
>>>>>>>> On Tuesday, May 3, 2016 at 11:09:12 AM UTC-7, Matt Davis wrote:
>>>>>>>>>
>>>>>>>>> SSH seems to be very tolerant of momentary connection losses, so 
>>>>>>>>> long as the connection isn't actually "refused". 
>>>>>>>>>
>>>>>>>>> WinRM under the covers is a very different beast (HTTP-based, 
>>>>>>>>> logical connection instead of a single fixed TCP connection). It 
>>>>>>>>> might be possible to retry certain parts of the WinRM exchange, but 
>>>>>>>>> in general it's not safe to blanket retry requests (eg, you don't 
>>>>>>>>> want to accidentally run something twice). The problem case is where 
>>>>>>>>> a connectivity change like that happens before we receive the HTTP 
>>>>>>>>> response from the Command/Send actions (retrying Receive would 
>>>>>>>>> probably be OK).
>>>>>>>>>
>>>>>>>>> The "right" way to deal with this would probably be to use async, 
>>>>>>>>> but that didn't make it in for Windows for 2.1 (should be in 2.2). 
>>>>>>>>> Async *should* be tolerant of most kinds of dodgy/unstable 
>>>>>>>>> connections...
>>>>>>>>>
>>>>>>>>>
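
The distinction drawn above, between steps that are safe to repeat and steps that are not, can be illustrated with a small retry wrapper. This is a sketch under stated assumptions, not how Ansible or any WinRM library behaves: `call_winrm_step` and `RETRY_SAFE` are hypothetical names, and treating only Receive as retry-safe simply follows the "probably OK" remark in the message.

```python
import time

# Assumption: only Receive is treated as safe to repeat, per the message above.
RETRY_SAFE = {"Receive"}


def call_winrm_step(step, func, retries=3, delay=1.0, retry_on=(OSError,)):
    """Call func(), retrying on connection errors only for retry-safe steps.

    Command and Send get exactly one attempt, so nothing is run twice.
    """
    attempts = retries + 1 if step in RETRY_SAFE else 1
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Out of attempts (or a step that must not be repeated).
                raise
            time.sleep(delay)
```

A failed Command or Send therefore surfaces immediately, leaving the caller to decide whether the request reached the host before the connection dropped.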
>>>>>>>>> On Tuesday, May 3, 2016 at 10:52:09 AM UTC-7, [email protected] 
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I am running Windows modules that disrupt the network connection. 
>>>>>>>>>> For instance, the installation of a network driver or the creation 
>>>>>>>>>> of a Network Team. The IP address doesn't change, and the network 
>>>>>>>>>> connection is only out for a few moments. But when these run, my 
>>>>>>>>>> Ansible playbook basically freezes - it just sits there running the 
>>>>>>>>>> task until Ansible times out and the playbook fails. My colleagues 
>>>>>>>>>> tell me Linux handles this gracefully, reconnecting and continuing 
>>>>>>>>>> when the connection is back up. Any idea how I can get this 
>>>>>>>>>> behavior with Windows?
>>>>>>>>>>
>>>>>>>>>

