[ https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hong Shen updated SPARK-16709:
------------------------------
    Description:
In our cluster, we set spark.speculation=true, but when a task throws an exception in SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83

  was:
In our cluster, we set spark.speculation=true, but when a task throws an exception in SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83

Index  ID    Attempt  Status   Locality Level  Executor ID / Host     Launch Time          Duration  GC Time  Output Size / Records  Shuffle Read Size / Records  Errors
0      3091  0        SUCCESS  PROCESS_LOCAL   1554 / 10.215.134.227  2016/07/25 02:33:39  19 min    1.2 min  0.0 B / 14952528       1121.5 MB / 15286949
0      3794  0        FAILED   PROCESS_LOCAL   2027 / 10.215.146.29   2016/07/25 02:47:04                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 3794
0      4094  1        FAILED   PROCESS_LOCAL   2546 / 10.196.150.233  2016/07/25 03:08:58                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4094
0      4384  2        FAILED   PROCESS_LOCAL   2823 / 10.215.155.155  2016/07/25 03:29:50                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4384
0      4573  3        FAILED   PROCESS_LOCAL   3011 / 10.215.139.24   2016/07/25 03:46:50                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4573
0      4805  4        SUCCESS  PROCESS_LOCAL   3246 / 10.196.138.215  2016/07/25 04:06:12  0 ms      1.3 min  0.0 B / 14952448       1121.5 MB / 15286949
1      3092  0        SUCCESS  PROCESS_LOCAL   1505 / 10.196.130.102  2016/07/25 02:33:39  22 min    4.9 min  0.0 B / 14953692       1121.9 MB / 15288628
1      3795  0        FAILED   PROCESS_LOCAL   2253 / 10.196.145.33   2016/07/25 02:48:28                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 3795
1      4074  1        FAILED   PROCESS_LOCAL   2493 / 10.196.148.109  2016/07/25 03:08:49                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4074
1      4263  2        FAILED   PROCESS_LOCAL   2705 / 10.196.149.21   2016/07/25 03:25:05                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4263

> Task with commit failed will retry infinite when speculation set to true
> ------------------------------------------------------------------------
>
>                 Key: SPARK-16709
>                 URL: https://issues.apache.org/jira/browse/SPARK-16709
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an
> exception in SparkHadoopMapRedUtil.performCommit(), the task can be
> retried indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83