Deployment script retries are brain-dead
----------------------------------------

                 Key: LIBCLOUD-157
                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-157
             Project: Libcloud
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.0
            Reporter: Mark Nottingham


In common/base, NodeDriver._run_deployment_script has the following retry
wrapper:

        tries = 0
        while tries < max_tries:
            try:
                node = task.run(node, ssh_client)
            except Exception:
                tries += 1
                if tries >= max_tries:
                    raise LibcloudError(value='Failed after %d tries'
                                        % (max_tries), driver=self)
            else:
                ssh_client.close()
                return node

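The except Exception clause swallows *all* errors and, on the last attempt,
replaces them with a generic LibcloudError. The original exception is never
reported, which makes debugging very hard.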

Furthermore, max_tries is effectively hard-coded in deploy_node():

            self._run_deployment_script(task=kwargs['deploy'],
                                        node=node,
                                        ssh_client=ssh_client,
                                        max_tries=3)

... forcing people who want to control retries to spin their own deploy_node().
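One way the hard-coded value could be exposed is sketched below. This is an
illustration only: SketchDriver is a hypothetical stand-in, not the libcloud
NodeDriver, the 'max_tries' keyword is an assumed addition, and node and
ssh_client are passed in directly to keep the example short.

```python
class SketchDriver:
    """Hypothetical stand-in for NodeDriver; not the libcloud class."""

    def deploy_node(self, **kwargs):
        # Assumption: 'max_tries' is a new optional keyword argument.
        # Defaulting to 3 keeps the currently hard-coded behaviour.
        max_tries = kwargs.get('max_tries', 3)
        return self._run_deployment_script(task=kwargs['deploy'],
                                           node=kwargs['node'],
                                           ssh_client=kwargs['ssh_client'],
                                           max_tries=max_tries)

    def _run_deployment_script(self, task, node, ssh_client, max_tries):
        # Placeholder body: only records the value it received so the
        # pass-through can be observed.
        self.last_max_tries = max_tries
        return node
```

With something along these lines, callers who want a different retry count
would pass max_tries to deploy_node() instead of overriding the method.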

Suggestions:
  - At a minimum, log or warn about the error that's caught in the retry loop.
  - Better yet, make the catch more fine-grained, so that errors we know
    are not retryable fail immediately.
  - Consider making the default max_tries 1.
  - Make max_tries controllable from deploy_node().
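The suggestions above could be combined roughly as follows. This is a minimal
sketch, not a patch against libcloud: run_with_retries, DeploymentError and
the RETRYABLE tuple are hypothetical names, and which exceptions are actually
transient is an assumption that would need checking against the SSH client in
use.

```python
import logging

logger = logging.getLogger(__name__)


class DeploymentError(Exception):
    """Hypothetical stand-in for LibcloudError that keeps the cause."""

    def __init__(self, value, original=None):
        super().__init__(value)
        self.original = original


# Assumption: only these are treated as transient. Illustrative choice.
RETRYABLE = (OSError, TimeoutError)


def run_with_retries(task, node, ssh_client, max_tries=1,
                     retryable=RETRYABLE):
    """Run task.run(), retrying only on errors listed in `retryable`."""
    tries = 0
    while True:
        try:
            result = task.run(node, ssh_client)
        except retryable as exc:
            tries += 1
            # Report every caught error instead of silently dropping it.
            logger.warning('Deployment attempt %d of %d failed: %r',
                           tries, max_tries, exc)
            if tries >= max_tries:
                raise DeploymentError('Failed after %d tries' % max_tries,
                                      original=exc) from exc
        else:
            ssh_client.close()
            return result
```

Any exception not listed in `retryable` propagates on the first attempt with
its original traceback, and the final failure keeps the underlying error
attached rather than discarding it.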
