Ryan Blue created SPARK-24684:
---------------------------------
Summary: DAGScheduler reports the wrong attempt number to the commit coordinator
Key: SPARK-24684
URL: https://issues.apache.org/jira/browse/SPARK-24684
Project: Spark
Issue Type: Bug
Components: Spark Core, SQL
Affects Versions: 2.1.3, 2.3.2
Reporter: Ryan Blue
SPARK-24552 changed writers to pass the task ID to the output commit
coordinator, so that the coordinator tracks each task uniquely, because attempt
numbers can be reused across stage attempts. However, the DAGScheduler still
passes the attempt number when notifying the coordinator that a task has
finished. As a result, when a task is authorized to commit and then fails due
to OOM or a similar error, the coordinator is notified but doesn't remove the
commit authorization, because the attempt number it receives doesn't match the
task ID it stored. The stale authorization then blocks every later attempt from
committing, which causes infinite task retries.
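
The mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's actual code: the class ToyCommitCoordinator, the function simulate, the task ID 100, and the retry limit are all hypothetical names and values chosen for illustration. The model assumes the coordinator stores whatever identifier the writer passes and releases the authorization only when the failure notification carries the same identifier.

```python
class ToyCommitCoordinator:
    """Toy stand-in for a commit coordinator (hypothetical, not Spark's class)."""

    def __init__(self):
        # (stage, partition) -> identifier of the attempt authorized to commit
        self.authorized = {}

    def can_commit(self, stage, partition, ident):
        # The first requester becomes the holder; later requesters are
        # allowed only if they are that same holder.
        holder = self.authorized.setdefault((stage, partition), ident)
        return holder == ident

    def task_failed(self, stage, partition, ident):
        # Release the authorization only if the reported identifier
        # matches the one that was stored at authorization time.
        if self.authorized.get((stage, partition)) == ident:
            del self.authorized[(stage, partition)]


def simulate(scheduler_reports_task_id, max_retries=5):
    """Return the retry number that commits, or None if no retry ever can."""
    coord = ToyCommitCoordinator()
    stage, partition = 1, 0
    task_id, attempt_number = 100, 0  # made-up identifiers for the first attempt

    # The writer asks for permission using its task ID and is authorized.
    assert coord.can_commit(stage, partition, task_id)

    # The task then fails (for example with an OOM); the scheduler reports it.
    reported = task_id if scheduler_reports_task_id else attempt_number
    coord.task_failed(stage, partition, reported)

    for retry in range(1, max_retries + 1):
        retry_task_id = task_id + retry
        if coord.can_commit(stage, partition, retry_task_id):
            return retry
        # A denied attempt fails as well and is reported the same way.
        reported = retry_task_id if scheduler_reports_task_id else retry
        coord.task_failed(stage, partition, reported)
    return None  # the stale authorization was never released


print(simulate(scheduler_reports_task_id=False))  # scheduler sends attempt number
print(simulate(scheduler_reports_task_id=True))   # scheduler sends task ID
```

In this model the first call never finds a retry that can commit, because the failure notification carries an attempt number while the coordinator stored a task ID. The second call commits on the first retry, because both sides use the same identifier.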