[ https://issues.apache.org/jira/browse/SPARK-53575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tengfei Huang updated SPARK-53575:
----------------------------------
Description:
With https://issues.apache.org/jira/browse/SPARK-51756 we compute a
checksum for shuffle output, which can be used to detect changes in shuffle
output across retries.
When a ShuffleFetchFailed occurs and a checksum mismatch is detected, we need to
retry the entire consumer stages, because the producer stage has generated
different data across attempts.
In this case, retrying only the failed tasks of the consumer stages is not
sufficient, as the non-deterministic producers may also have changed the data
read by other consumer tasks.
was:
Retry the entire consumer stages during ShuffleFetchFailed, in the case that
the producer stage generates different data across different attempts.
In this case, retrying the failed tasks for the consumer stages is not
sufficient, as the data may have been changed for other consumer tasks by the
non-deterministic producers.
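
The retry decision described in the updated description can be sketched
roughly as follows. This is only an illustration under stated assumptions, not
Spark's scheduler code: MapOutput, checksum_mismatch and tasks_to_retry are
hypothetical names, and the checksum is modeled as a plain integer per map
output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapOutput:
    """Hypothetical record of one producer (map) task's shuffle output."""
    map_id: int
    checksum: int  # assumption: one integer checksum per map output


def checksum_mismatch(previous, recomputed):
    """Return True if any recomputed map output differs from its earlier attempt."""
    old = {m.map_id: m.checksum for m in previous}
    return any(m.map_id in old and old[m.map_id] != m.checksum for m in recomputed)


def tasks_to_retry(all_consumer_tasks, failed_tasks, previous, recomputed):
    """Pick the consumer tasks to rerun after a shuffle fetch failure."""
    if checksum_mismatch(previous, recomputed):
        # The producer output changed across attempts, so any consumer task
        # may have read data that no longer matches: rerun the whole stage.
        return set(all_consumer_tasks)
    # The output is unchanged: rerunning only the failed tasks is enough.
    return set(failed_tasks)
```

With unchanged checksums only the failed tasks are selected; a single changed
checksum selects every task of the consumer stage.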
> Retry entire consumer stages for ShuffleFetchFailed when the producer stage
> is non-deterministic
> ------------------------------------------------------------------------------------------------
>
> Key: SPARK-53575
> URL: https://issues.apache.org/jira/browse/SPARK-53575
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.1
> Reporter: Tengfei Huang
> Priority: Major