[ 
https://issues.apache.org/jira/browse/LIBCLOUD-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210743#comment-13210743
 ] 

Tomaz Muraus commented on LIBCLOUD-157:
---------------------------------------

I agree that debugging deployment issues is currently pretty hard. I ran into 
this problem myself, so I recently made a change: if you set 
LIBCLOUD_DEBUG=<file obj>, paramiko debug mode is now turned on as well, so 
you at least see the paramiko debug messages.

In any case, I like suggestions #2 and #4. As for #3, I think max_tries=1 is 
too low, because in many cases the node is returned in the response but the 
actual server hasn't fully started yet (the SSH server is not yet listening).

In cases like this paramiko throws a socket timeout error, and with 
max_tries=1 the deployment would fail.
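
A rough sketch of how suggestions #2 and #4 might be combined while keeping a 
default above 1. This is not the actual libcloud code: run_with_retries, 
DeploymentError, RETRYABLE and the wait parameter are hypothetical names used 
only for illustration, and treating socket errors as the only retry-able 
failures is an assumption.

```python
import logging
import socket
import time

logger = logging.getLogger("deploy_retry_sketch")

# Assumption: only socket-level errors are transient (for example, the SSH
# server is not listening yet). socket.timeout is already a subclass of
# socket.error; it is listed separately only for readability. Note that
# socket.error is OSError on Python 3, which is fairly broad.
RETRYABLE = (socket.timeout, socket.error)


class DeploymentError(Exception):
    """Hypothetical stand-in for LibcloudError."""


def run_with_retries(run, max_tries=3, wait=0.0, retryable=RETRYABLE):
    """Call run() up to max_tries times, retrying only on `retryable` errors.

    Any other exception propagates immediately, so it is not swallowed.
    """
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return run()
        except retryable as e:
            last_error = e
            # Log every caught error so failed attempts are visible.
            logger.warning("attempt %d/%d failed: %r", attempt, max_tries, e)
            if attempt < max_tries:
                time.sleep(wait)
    raise DeploymentError("Failed after %d tries: %r"
                          % (max_tries, last_error))
```

With something like this, deploy_node() could accept max_tries from the caller 
and pass it through, and a non-network error (a bug in the deployment script, 
for example) would surface on the first attempt instead of being retried.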
                
> Deployment script retries are brain-dead
> ----------------------------------------
>
>                 Key: LIBCLOUD-157
>                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-157
>             Project: Libcloud
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Mark Nottingham
>
> in common/base, NodeDriver._run_deployment_script has the following retry 
> wrapper:
>         tries = 0
>         while tries < max_tries:
>             try:
>                 node = task.run(node, ssh_client)
>             except Exception:
>                 tries += 1
>                 if tries >= max_tries:
>                     raise LibcloudError(value='Failed after %d tries'
>                                         % (max_tries), driver=self)
>             else:
>                 ssh_client.close()
>                 return node
> The except Exception swallows *all* errors, making debugging very hard.
> Furthermore, max_tries is effectively hard-coded in deploy_node():
>             self._run_deployment_script(task=kwargs['deploy'],
>                                         node=node,
>                                         ssh_client=ssh_client,
>                                         max_tries=3)
> ... forcing people who want to control retries to spin their own 
> deploy_node().
> Suggestions:
>   - at a minimum, log or warn about the error that's caught in the retry loop
>   - better yet, make the catch more fine-grained, so that errors that we know 
> won't be retry-able will fail out immediately. 
>   - think about making the default number of max_tries 1
>   - make max_tries controllable from deploy_node

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
