[ 
https://issues.apache.org/jira/browse/SPARK-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-10607:
---------------------------------
    Labels: bulk-closed  (was: )

> Scheduler should include defensive measures against infinite loops due to 
> task commit denial
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10607
>                 URL: https://issues.apache.org/jira/browse/SPARK-10607
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.3.1, 1.4.1, 1.5.0
>            Reporter: Josh Rosen
>            Priority: Minor
>              Labels: bulk-closed
>
> If OutputCommitter.commitTask() repeatedly fails because the 
> OutputCommitCoordinator denies the right to commit, then the scheduler may 
> get stuck in an infinite task retry loop. This happens because DAGScheduler 
> treats failures due to CommitDenied separately from other failures: they do 
> not count towards the maximum number of task failures that triggers a job 
> failure. The correct fix is to add an upper bound on the number of times 
> that a commit can be denied, as a last-ditch safety net against infinite 
> loops.
> See SPARK-10381 for additional context. This is not a high priority issue to 
> fix right now, since the fix in SPARK-10381 should prevent this scenario from 
> happening in the first place. However, another layer of conservative 
> defensive limits / timeouts certainly would not hurt.
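
The upper bound described above could look roughly like the following. This is a minimal, language-neutral sketch in Python and not Spark's actual scheduler code: the names CommitDenialTracker, CommitDenialLimitExceeded and max_commit_denials are hypothetical and chosen only for illustration, and keying the counter by (stage_id, partition_id) is an assumption.

```python
from collections import defaultdict


class CommitDenialLimitExceeded(Exception):
    """Raised when one task has been denied the right to commit too often."""


class CommitDenialTracker:
    """Hypothetical helper, not part of Spark's scheduler API.

    Counts commit denials per task so that denials, which are excluded from
    the normal task-failure count, still cannot cause unbounded retries.
    """

    def __init__(self, max_commit_denials):
        if max_commit_denials < 1:
            raise ValueError("max_commit_denials must be at least 1")
        self.max_commit_denials = max_commit_denials
        # (stage_id, partition_id) -> number of commit denials seen so far
        self._denials = defaultdict(int)

    def record_denial(self, stage_id, partition_id):
        """Record one denial and return the running count for that task.

        Raises CommitDenialLimitExceeded once the count goes past the limit,
        which is the point where a scheduler would fail the job instead of
        scheduling yet another retry.
        """
        key = (stage_id, partition_id)
        self._denials[key] += 1
        count = self._denials[key]
        if count > self.max_commit_denials:
            raise CommitDenialLimitExceeded(
                "task %s was denied commit %d times (limit %d)"
                % (key, count, self.max_commit_denials)
            )
        return count


if __name__ == "__main__":
    tracker = CommitDenialTracker(max_commit_denials=3)
    try:
        while True:
            # Simulate a task whose commit is denied on every attempt.
            tracker.record_denial(stage_id=0, partition_id=7)
    except CommitDenialLimitExceeded as exc:
        print("aborting job:", exc)
```

With a limit of 3, the first three denials of the same task are retried as before and the fourth raises, so the loop ends. Denials of other tasks are counted separately and do not affect each other.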



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
