Guodong Wang created SPARK-6900:
-----------------------------------

             Summary: spark ec2 script enters infinite loop when run-instance 
fails
                 Key: SPARK-6900
                 URL: https://issues.apache.org/jira/browse/SPARK-6900
             Project: Spark
          Issue Type: Bug
          Components: EC2
    Affects Versions: 1.3.0
            Reporter: Guodong Wang


I am using the spark-ec2 scripts to launch Spark clusters in AWS.

Recently, the AWS EC2 service had some technical issues in our region.
When spark-ec2 sent the run-instance requests to EC2, not all of the requested 
instances were launched. Some instances were terminated by the EC2 service 
before they came up.
However, the spark-ec2 script waits for all instances to reach the 'ssh-ready' 
state, so it enters an infinite loop, because the terminated instances will 
never become 'ssh-ready'.

In my opinion, it should be OK if some of the slave instances are terminated. 
As long as the master node is running, the terminated slaves should be filtered 
out and the cluster should be set up.
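
A minimal sketch of the filtering idea, not the actual spark_ec2.py code. 
The Instance class, the refresh callback, the DEAD_STATES set and the 
timeout parameters are all hypothetical names introduced for illustration; 
a real fix would read the instance state from the EC2 API instead. The wait 
loop drops slaves that can no longer become 'ssh-ready', fails fast if the 
master is gone, and gives up after a timeout instead of looping forever.

```python
import time
from dataclasses import dataclass

# Assumption: instances in these states can never become ssh-ready.
DEAD_STATES = {"shutting-down", "terminated"}


@dataclass
class Instance:
    """Hypothetical stand-in for an EC2 instance record."""
    instance_id: str
    state: str = "pending"
    ssh_ready: bool = False


def wait_for_ssh_ready(master, slaves, refresh,
                       poll_seconds=5.0, timeout_seconds=600.0,
                       sleep=time.sleep):
    """Wait until the master and all surviving slaves are ssh-ready.

    `refresh` is a caller-supplied function that updates the `state` and
    `ssh_ready` fields of the given instances in place.
    Returns the list of slaves that are still alive and ssh-ready.
    """
    deadline = time.monotonic() + timeout_seconds
    live = list(slaves)
    while True:
        refresh([master] + live)

        # Without a master there is no cluster to set up.
        if master.state in DEAD_STATES:
            raise RuntimeError("master %s was terminated" % master.instance_id)

        # Filter out slaves that were terminated before they came up.
        live = [s for s in live if s.state not in DEAD_STATES]

        if master.ssh_ready and all(s.ssh_ready for s in live):
            return live

        # Bound the wait so a stuck instance cannot hang the script.
        if time.monotonic() >= deadline:
            raise TimeoutError("instances did not become ssh-ready in time")
        sleep(poll_seconds)
```

With this shape, the cluster setup would continue with only the slaves 
returned by wait_for_ssh_ready, so a few failed launches reduce the cluster 
size rather than blocking it.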



