[
https://issues.apache.org/jira/browse/SPARK-24684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue resolved SPARK-24684.
-------------------------------
Resolution: Not A Problem
Closing this. In master, the attempt number is still used, so it looks like I
just backported this incorrectly.
> DAGScheduler reports the wrong attempt number to the commit coordinator
> -----------------------------------------------------------------------
>
> Key: SPARK-24684
> URL: https://issues.apache.org/jira/browse/SPARK-24684
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.1.3, 2.3.2
> Reporter: Ryan Blue
> Priority: Major
>
> SPARK-24552 changes writers to pass the task ID to the output coordinator so
> that the coordinator tracks each task uniquely because attempt numbers can be
> reused across stage attempts. However, the DAGScheduler still passes the
> attempt number when notifying the coordinator that a task has finished. The
> result is that when a task is authorized and then fails due to OOM or a
> similar error, the scheduler is notified but doesn't remove the commit
> authorization because the attempt number doesn't match. This causes infinite
> task retries.
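
The failure mode described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyCommitCoordinator`, `can_commit`, `task_completed`, and `simulate` are hypothetical names invented for illustration and are not Spark's actual classes or methods. It models the affected backport, where writers identify themselves by task ID while the scheduler reports the attempt number on failure.

```python
class ToyCommitCoordinator:
    """Toy stand-in for an output commit coordinator (hypothetical, not Spark code)."""

    def __init__(self):
        # (stage, partition) -> identifier of the task currently authorized to commit
        self.authorized = {}

    def can_commit(self, stage, partition, committer_id):
        # The first requester for a partition gets the authorization;
        # everyone else is denied while it is held.
        holder = self.authorized.setdefault((stage, partition), committer_id)
        return holder == committer_id

    def task_completed(self, stage, partition, reported_id, succeeded):
        # On failure, release the authorization only if the reported
        # identifier matches the one that was authorized.
        key = (stage, partition)
        if not succeeded and self.authorized.get(key) == reported_id:
            del self.authorized[key]


def simulate(report_attempt_number):
    """Return whether a retry can commit after the first attempt fails."""
    coordinator = ToyCommitCoordinator()

    # First attempt (assumed values): task ID 100, attempt number 0.
    task_id, attempt_number = 100, 0
    assert coordinator.can_commit(0, 0, task_id)  # writer passes the task ID

    # The task fails (for example, OOM) and the scheduler reports it.
    reported = attempt_number if report_attempt_number else task_id
    coordinator.task_completed(0, 0, reported, succeeded=False)

    # Retry (assumed values): task ID 101, attempt number 1.
    return coordinator.can_commit(0, 0, 101)


mismatched = simulate(report_attempt_number=True)   # scheduler reports attempt number
matched = simulate(report_attempt_number=False)     # scheduler reports task ID
```

In this model, the mismatched case leaves the stale authorization in place, so every retry is denied the commit, which corresponds to the endless retries in the report. When both sides use the same identifier (as the resolution says master does with the attempt number), the authorization is released and the retry can commit.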