liupengcheng created SPARK-26634:
------------------------------------
Summary: OutputCommitCoordinator may allow task of
FetchFailureStage commit again
Key: SPARK-26634
URL: https://issues.apache.org/jira/browse/SPARK-26634
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0, 2.1.0
Reporter: liupengcheng
In our production Spark cluster, we encountered a case where a task of a stage
retried due to FetchFailure was denied permission to commit, even though the
task was the first attempt in that retry stage.
After careful investigation, we found that the canCommit call of
OutputCommitCoordinator allows a task of the FetchFailure stage (with the
same partition number as the new task of the retry stage) to commit, which
results in TaskCommitDenied for all tasks of the retry stage. This is a
correctness bug.
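The failure mode described above can be illustrated with a toy model. This is
only a sketch, not Spark's actual code. It assumes a coordinator that grants
one commit per (stage, partition) on a first-come, first-served basis and does
not check which stage attempt is asking. ToyCommitCoordinator, can_commit, and
all of the ids are hypothetical names chosen for this illustration.

```python
from typing import Dict, Tuple


class ToyCommitCoordinator:
    """Toy model only: one authorized committer per (stage, partition)."""

    def __init__(self) -> None:
        # (stage_id, partition) -> (stage_attempt, task_attempt) that holds the commit
        self._authorized: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def can_commit(self, stage_id: int, stage_attempt: int,
                   partition: int, task_attempt: int) -> bool:
        key = (stage_id, partition)
        requester = (stage_attempt, task_attempt)
        # Assumption for this sketch: the first requester wins, and the stage
        # attempt is never compared against the currently running attempt.
        holder = self._authorized.setdefault(key, requester)
        return holder == requester


coordinator = ToyCommitCoordinator()

# A task left over from the stage attempt that hit FetchFailure asks first.
stale_ok = coordinator.can_commit(stage_id=3, stage_attempt=0,
                                  partition=7, task_attempt=0)

# The first task attempt of the retry stage asks for the same partition.
retry_ok = coordinator.can_commit(stage_id=3, stage_attempt=1,
                                  partition=7, task_attempt=0)

print(stale_ok, retry_ok)
```

In this model the stale task is authorized and the retry-stage task is denied,
which matches the reported symptom. A coordinator that rejected requests from
stage attempts other than the current one would not behave this way.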
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)