[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
------------------------------
    Description: 
In our cluster, we set spark.speculation=true, but when a task throws an 
exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



  was:
In our cluster, we set spark.speculation=true, but when a task throws an 
exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83


Index  ID    Attempt  Status   Locality Level  Executor ID / Host     Launch Time          Duration  GC Time  Output Size / Records  Shuffle Read Size / Records  Errors
0      3091  0        SUCCESS  PROCESS_LOCAL   1554 / 10.215.134.227  2016/07/25 02:33:39  19 min    1.2 min  0.0 B / 14952528       1121.5 MB / 15286949
0      3794  0        FAILED   PROCESS_LOCAL   2027 / 10.215.146.29   2016/07/25 02:47:04                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 3794
0      4094  1        FAILED   PROCESS_LOCAL   2546 / 10.196.150.233  2016/07/25 03:08:58                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4094
0      4384  2        FAILED   PROCESS_LOCAL   2823 / 10.215.155.155  2016/07/25 03:29:50                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4384
0      4573  3        FAILED   PROCESS_LOCAL   3011 / 10.215.139.24   2016/07/25 03:46:50                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4573
0      4805  4        SUCCESS  PROCESS_LOCAL   3246 / 10.196.138.215  2016/07/25 04:06:12  0 ms      1.3 min  0.0 B / 14952448       1121.5 MB / 15286949
1      3092  0        SUCCESS  PROCESS_LOCAL   1505 / 10.196.130.102  2016/07/25 02:33:39  22 min    4.9 min  0.0 B / 14953692       1121.9 MB / 15288628
1      3795  0        FAILED   PROCESS_LOCAL   2253 / 10.196.145.33   2016/07/25 02:48:28                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 3795
1      4074  1        FAILED   PROCESS_LOCAL   2493 / 10.196.148.109  2016/07/25 03:08:49                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4074
1      4263  2        FAILED   PROCESS_LOCAL   2705 / 10.196.149.21   2016/07/25 03:25:05                     /                      /                            TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4263


> Task with commit failed will retry infinite when speculation set to true
> ------------------------------------------------------------------------
>
>                 Key: SPARK-16709
>                 URL: https://issues.apache.org/jira/browse/SPARK-16709
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

