[ 
https://issues.apache.org/jira/browse/SPARK-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324986#comment-14324986
 ] 

Florian Verhein commented on SPARK-5851:
----------------------------------------

That makes sense.

Yeah, I ran into it yesterday. My spark-ec2/setup.sh failed (I had "set -u" 
enabled in a new component I was testing), and spark_ec2.py responded by 
re-running setup.sh over and over. In this case, spark_ec2.py shouldn't retry; 
it should fail gracefully (ideally after cleaning up the cluster) and return a 
failure code.
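
A rough sketch of what I have in mind, not the actual spark_ec2.py code. It 
relies on the documented behaviour that ssh exits with status 255 when ssh 
itself fails and otherwise passes through the exit status of the remote 
command. The names run_with_retry, ssh_once, RemoteCommandError and the 
wait_secs parameter are made up for illustration:

```python
import subprocess
import time

# ssh exits with 255 when ssh itself fails (e.g. connection refused);
# any other status is the exit status of the remote command.
SSH_ERROR_STATUS = 255


class RemoteCommandError(Exception):
    """The remote command ran and failed; retrying is unlikely to help."""

    def __init__(self, returncode):
        super().__init__("remote command exited with status %d" % returncode)
        self.returncode = returncode


def ssh_once(host, command, ssh_args=()):
    """Run `command` on `host` once and return the exit status (hypothetical helper)."""
    return subprocess.call(["ssh"] + list(ssh_args) + [host, command])


def run_with_retry(run_once, tries=5, wait_secs=30):
    """Call run_once() until it succeeds, retrying only on ssh-level failures.

    Returns the number of attempts used. Raises RemoteCommandError as soon as
    the remote command itself fails, and RuntimeError if ssh keeps failing.
    """
    for attempt in range(1, tries + 1):
        status = run_once()
        if status == 0:
            return attempt
        if status != SSH_ERROR_STATUS:
            # The connection worked but the command failed: do not retry.
            raise RemoteCommandError(status)
        if attempt < tries:
            time.sleep(wait_secs)
    raise RuntimeError("ssh failed after %d tries" % tries)
```

A caller would use something like 
run_with_retry(lambda: ssh_once(host, "spark-ec2/setup.sh")), catch 
RemoteCommandError, clean up, and exit with e.returncode. Passing tries=1 
covers the "no retry at all" case. One caveat: a remote command that itself 
exits with 255 can't be told apart from an ssh failure this way.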

> spark_ec2.py ssh failure retry handling not always appropriate
> --------------------------------------------------------------
>
>                 Key: SPARK-5851
>                 URL: https://issues.apache.org/jira/browse/SPARK-5851
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>            Reporter: Florian Verhein
>            Priority: Minor
>
> The following function doesn't distinguish between ssh itself failing (e.g. 
> presumably a connection issue) and the remote command that it executes 
> failing (e.g. setup.sh). The latter should probably not result in a retry. 
> Perhaps tries could be an argument that is set to 1 for certain usages. 
>
> # Run a command on a host through ssh, retrying up to five times
> # and then throwing an exception if ssh continues to fail.
> spark-ec2: [{{def ssh(host, opts, command)}}|https://github.com/apache/spark/blob/d8f69cf78862d13a48392a0b94388b8d403523da/ec2/spark_ec2.py#L953-L975]



