[
https://issues.apache.org/jira/browse/SPARK-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324986#comment-14324986
]
Florian Verhein commented on SPARK-5851:
----------------------------------------
That makes sense.
Yeah, I ran into it yesterday. My spark-ec2/setup.sh failed (I had "set -u"
enabled in a new component I was testing), and as a result spark_ec2.py kept
looping over calls to setup.sh.
In this case, spark_ec2.py shouldn't retry. It should fail gracefully, ideally
after cleaning up the cluster, and return a failure code.
> spark_ec2.py ssh failure retry handling not always appropriate
> --------------------------------------------------------------
>
> Key: SPARK-5851
> URL: https://issues.apache.org/jira/browse/SPARK-5851
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Reporter: Florian Verhein
> Priority: Minor
>
> The following function doesn't distinguish between ssh itself failing
> (presumably a connection issue) and the remote command that it executes
> failing (e.g. setup.sh). The latter should probably not result in a retry.
> Perhaps tries could be an argument that is set to 1 for certain usages.
> # Run a command on a host through ssh, retrying up to five times
> # and then throwing an exception if ssh continues to fail.
> spark-ec2: def ssh(host, opts, command)
> https://github.com/apache/spark/blob/d8f69cf78862d13a48392a0b94388b8d403523da/ec2/spark_ec2.py#L953-L975
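
For illustration, here is a minimal sketch of the distinction described above.
It is not the code in spark_ec2.py. It assumes the OpenSSH client, which exits
with status 255 when ssh itself fails and otherwise returns the exit status of
the remote command. The names run_with_retry and RemoteCommandError, and the
tries/delay/run/sleep parameters, are hypothetical.

```python
import subprocess
import time

# Assumption: the OpenSSH client exits with 255 when ssh itself fails
# (e.g. connection refused); any other status comes from the remote command.
SSH_ERROR_EXIT_CODE = 255


class RemoteCommandError(Exception):
    """Hypothetical error: the remote command ran but exited non-zero."""

    def __init__(self, returncode):
        super().__init__(
            "remote command failed with exit status %d" % returncode)
        self.returncode = returncode


def run_with_retry(argv, tries=5, delay=30,
                   run=subprocess.call, sleep=time.sleep):
    """Run argv, retrying only when the exit status points at ssh itself.

    Returns the number of attempts used on success. Raises
    RemoteCommandError immediately if the remote command fails, and
    RuntimeError once ssh-level failures have used up all tries.
    """
    for attempt in range(1, tries + 1):
        returncode = run(argv)
        if returncode == 0:
            return attempt
        if returncode != SSH_ERROR_EXIT_CODE:
            # The connection worked; retrying would only re-run a
            # command that has already failed.
            raise RemoteCommandError(returncode)
        if attempt < tries:
            sleep(delay)
    raise RuntimeError("ssh failed %d times; giving up" % tries)
```

A caller would pass the full ssh command line as argv, and could pass tries=1
for usages where no retry is wanted. One caveat: a remote command that itself
exits with 255 is indistinguishable from an ssh failure under this scheme, so
it would still be retried.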