[ https://issues.apache.org/jira/browse/SPARK-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511283#comment-14511283 ]
Nicholas Chammas commented on SPARK-6900:
-----------------------------------------

That is correct. So again, the solution I would favor in this instance is:

{quote}
It would be good to detect if instances terminate prematurely and will never be ready, but in that case I think spark-ec2 should error out somehow instead of continuing.
{quote}

I think that solves the problem of the infinite loop, and it makes it easier for an external script to detect that something went wrong and automatically retry with {{--resume}}.

> spark ec2 script enters infinite loop when run-instance fails
> -------------------------------------------------------------
>
>                 Key: SPARK-6900
>                 URL: https://issues.apache.org/jira/browse/SPARK-6900
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.3.0
>            Reporter: Guodong Wang
>
> I am using the spark-ec2 scripts to launch Spark clusters in AWS.
> Recently, in our AWS region, there were some technical issues with the AWS EC2
> service. When spark-ec2 sent the run-instances requests to EC2, not all of the
> requested instances were launched; some instances were terminated by the EC2
> service before they came up.
> But the spark-ec2 script waits for all the instances to enter the 'ssh-ready'
> state, so the script enters an infinite loop, because the terminated
> instances will never become 'ssh-ready'.
> In my opinion, it should be OK if some of the slave instances are
> terminated. As long as the master node is running, the terminated slaves
> should be filtered out and the cluster should still be set up.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
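The behavior Nicholas favors can be sketched as a small state check inside the wait-for-ssh-ready loop. This is a hypothetical illustration, not spark-ec2's actual code: `check_instance_states` is an invented helper that takes EC2 instance state names (as reported by boto's `instance.state`) and aborts the wait with an error, instead of polling forever, whenever any instance has already been terminated.

```python
def check_instance_states(states):
    """Decide whether the ssh-ready wait loop can stop.

    `states` is a list of EC2 instance state names, e.g.
    ["pending", "running", "terminated"].

    Returns True when every instance is running (the loop may proceed
    to the ssh check), False when some are still pending (keep waiting),
    and raises RuntimeError when any instance terminated prematurely,
    so the caller errors out instead of looping forever.
    """
    dead = [s for s in states if s in ("terminated", "shutting-down")]
    if dead:
        # Error out loudly so an external script can detect the failure
        # and retry the launch with --resume.
        raise RuntimeError(
            "%d instance(s) terminated prematurely and will never be "
            "ssh-ready; re-run spark-ec2 with --resume" % len(dead))
    return all(s == "running" for s in states)
```

A caller would invoke this once per polling iteration; raising (rather than returning a flag) gives a non-zero exit code, which is what makes the failure visible to wrapper scripts.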