Hyukjin Kwon updated SPARK-10607:
---------------------------------
Labels: bulk-closed (was: )
> Scheduler should include defensive measures against infinite loops due to
> task commit denial
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-10607
> URL: https://issues.apache.org/jira/browse/SPARK-10607
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.3.1, 1.4.1, 1.5.0
> Reporter: Josh Rosen
> Priority: Minor
> Labels: bulk-closed
>
> If OutputCommitter.commitTask() repeatedly fails because the
> OutputCommitCoordinator denies the right to commit, then the scheduler may
> get stuck in an infinite task retry loop. This happens because DAGScheduler
> treats CommitDenied failures separately from other failures: they do not
> count towards the maximum number of task failures that triggers a job
> failure. The correct fix is to add an upper bound on the number of times
> that a commit can be denied, as a last-ditch safety net against infinite
> loops.
> See SPARK-10381 for additional context. This is not a high priority issue to
> fix right now, since the fix in SPARK-10381 should prevent this scenario from
> happening in the first place. However, another layer of conservative
> defensive limits / timeouts certainly would not hurt.
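
The proposed safety net can be illustrated with a small standalone sketch. This is not Spark code and not the fix that was or would be merged; it only models the idea from the description above: commit denials stay out of the ordinary failure count, but get a separate upper bound of their own. Every name here (`TaskRetryTracker`, `FailureReason`, `JobAborted`, `max_task_failures`, `max_commit_denials`) and both default limits are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class FailureReason(Enum):
    COMMIT_DENIED = auto()  # the coordinator refused the right to commit
    OTHER = auto()          # any failure that counts toward the normal limit


class JobAborted(Exception):
    """Raised when a task exhausts one of its retry budgets."""


@dataclass
class TaskRetryTracker:
    """Hypothetical per-task bookkeeping with two independent budgets."""

    max_task_failures: int = 4     # assumed limit for ordinary failures
    max_commit_denials: int = 100  # assumed last-ditch bound on denials
    failures: dict = field(default_factory=dict)
    denials: dict = field(default_factory=dict)

    def record_failure(self, task_id: int, reason: FailureReason) -> None:
        """Count one failed attempt; raise JobAborted once a budget is spent."""
        if reason is FailureReason.COMMIT_DENIED:
            # Denials do not touch the ordinary failure count, but they are
            # no longer unbounded: they draw from their own budget.
            count = self.denials.get(task_id, 0) + 1
            self.denials[task_id] = count
            if count >= self.max_commit_denials:
                raise JobAborted(
                    f"task {task_id}: commit denied {count} times"
                )
        else:
            count = self.failures.get(task_id, 0) + 1
            self.failures[task_id] = count
            if count >= self.max_task_failures:
                raise JobAborted(f"task {task_id}: failed {count} times")
```

Without the `max_commit_denials` check, a task whose commit is denied on every attempt would be retried forever, which is the loop described in this issue. Keeping the two counters separate preserves the existing behavior for occasional denials while still guaranteeing that the retry loop terminates.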