[ https://issues.apache.org/jira/browse/WHIRR-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137790#comment-13137790 ]
David Alves commented on WHIRR-414:
-----------------------------------

+1

I think that, beyond the fact that orphaned instances remain, the user should be allowed to try again without discarding the instances that did boot (this happens ever more frequently for larger clusters).

A possible solution would be to implement some policies and let the user choose among them, for example:

1 (Default) - If the cluster cannot start, make sure everything is terminated.
2 - Whirr detects that not enough instances have come alive by a given time, kills the slow ones, and boots new ones. Bootstrap goes on for up to N attempts.
3 - Do nothing; if the cluster fails to boot, let the user handle it.

What do you think?

> whirr can have a non-zero return code and unterminated (orphaned) host instances
> --------------------------------------------------------------------------------
>
>                 Key: WHIRR-414
>                 URL: https://issues.apache.org/jira/browse/WHIRR-414
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.6.0
>         Environment: EC2, commandline whirr
>            Reporter: Paul Baclace
>            Priority: Critical
>
> Whirr can fail to completely start a cluster and indicates this with a
> non-zero return code. In many (currently intermittent) partial failure
> scenarios, there are resources still active (EC2 machine instances, in my
> experience) that are not cleaned up.
> The log contains "IOException: Too many instance failed while bootstrapping!"
> when I have seen orphaned nodes.
> A non-zero return code should guarantee that all resources are cleaned up.
> Without this post-condition, these failures require manual inspection and
> cleanup to stop useless expenses (which is why I marked this bug critical; it
> needs to be addressed for any kind of cron job triggered whirr).
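To make the three proposed policies concrete, here is a rough sketch of how they might be dispatched. It is not Whirr code: `FailurePolicy`, `bootstrap`, and the `launch`, `is_alive`, and `terminate` callbacks are all hypothetical names invented for illustration, and the timeout wait from policy 2 is left out. The comment does not say what should happen once policy 2 runs out of attempts; this sketch assumes it falls back to terminating everything.

```python
from enum import Enum
from typing import Callable, List


class FailurePolicy(Enum):
    """Hypothetical names for the three policies proposed in the comment."""
    TERMINATE_ALL = 1     # policy 1 (default): clean up everything on failure
    RETRY_SLOW_NODES = 2  # policy 2: kill slow nodes and boot replacements
    DO_NOTHING = 3        # policy 3: leave cleanup to the user


def bootstrap(
    wanted: int,
    launch: Callable[[int], List[str]],        # hypothetical: start n nodes, return ids
    is_alive: Callable[[str], bool],           # hypothetical: did the node boot in time?
    terminate: Callable[[List[str]], None],    # hypothetical: destroy the given nodes
    policy: FailurePolicy = FailurePolicy.TERMINATE_ALL,
    max_attempts: int = 3,
) -> List[str]:
    """Return the ids of live nodes, or raise RuntimeError if bootstrap fails."""
    # Only policy 2 retries; the other policies get a single attempt.
    attempts = max_attempts if policy is FailurePolicy.RETRY_SLOW_NODES else 1
    alive: List[str] = []
    failed: List[str] = []

    for _ in range(attempts):
        # Ask only for the nodes still missing, keeping the ones that booted.
        for node in launch(wanted - len(alive)):
            (alive if is_alive(node) else failed).append(node)
        if len(alive) >= wanted:
            return alive
        if policy is FailurePolicy.RETRY_SLOW_NODES:
            terminate(failed)  # kill the slow nodes before booting replacements
            failed = []

    # Bootstrap failed. Assumption: policy 2 also cleans up once it gives up.
    if policy is not FailurePolicy.DO_NOTHING:
        terminate(alive + failed)
    raise RuntimeError("Too many instances failed while bootstrapping")
```

With policy 1 or an exhausted policy 2, the error is raised only after every started node has been passed to `terminate`, which matches the post-condition requested in the issue. With policy 3 the nodes are left running for the user to inspect.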