[ https://issues.apache.org/jira/browse/SPARK-42689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-42689:
------------------------------------

    Assignee:     (was: Apache Spark)

> Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
> --------------------------------------------------------------------------
>
>                 Key: SPARK-42689
>                 URL: https://issues.apache.org/jira/browse/SPARK-42689
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: Mridul Muralidharan
>            Priority: Major
>
> Currently, if there is an executor node loss, we assume the shuffle data on
> that node is also lost. This is not necessarily the case if there is a
> shuffle component managing the shuffle data and reliably maintaining it (for
> example, in a distributed filesystem or in a disaggregated shuffle cluster).
> Downstream projects carry patches to Apache Spark to work around this
> issue; for example, Apache Celeborn has
> [this|https://github.com/apache/incubator-celeborn/blob/main/assets/spark-patch/RSS_RDA_spark3.patch].
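
The behaviour the issue describes can be illustrated with a small, self-contained sketch. It is not Spark's API or the eventual patch: `DriverShuffleComponent`, `storesShuffleDataReliably()`, `MapOutputRegistry` and the other names are hypothetical, chosen only to show how a driver-side component might declare that its shuffle data survives executor loss, so that the driver can skip invalidating map outputs.

```java
import java.util.HashMap;
import java.util.Map;

public class ReliableShuffleSketch {

    // Hypothetical stand-in for a driver-side shuffle component (not Spark's interface).
    interface DriverShuffleComponent {
        // Default mirrors the current assumption: shuffle data lives on the executor's node.
        default boolean storesShuffleDataReliably() {
            return false;
        }
    }

    // Shuffle files on local disk: lost together with the node.
    static final class LocalDiskShuffle implements DriverShuffleComponent {}

    // Shuffle data kept in a remote or disaggregated store: survives executor loss.
    static final class RemoteShuffleService implements DriverShuffleComponent {
        @Override
        public boolean storesShuffleDataReliably() {
            return true;
        }
    }

    // Toy tracker mapping each map output id to the executor that produced it.
    static final class MapOutputRegistry {
        private final Map<Integer, String> outputs = new HashMap<>();
        private final DriverShuffleComponent component;

        MapOutputRegistry(DriverShuffleComponent component) {
            this.component = component;
        }

        void register(int mapId, String executorId) {
            outputs.put(mapId, executorId);
        }

        void onExecutorLost(String executorId) {
            if (component.storesShuffleDataReliably()) {
                return; // data is still readable, so nothing needs recomputing
            }
            // Otherwise forget every output that was written by the lost executor.
            outputs.values().removeIf(executorId::equals);
        }

        int availableOutputs() {
            return outputs.size();
        }
    }

    // Registers three outputs, loses "exec-1", and reports how many outputs remain.
    static int survivingOutputs(DriverShuffleComponent component) {
        MapOutputRegistry registry = new MapOutputRegistry(component);
        registry.register(0, "exec-1");
        registry.register(1, "exec-1");
        registry.register(2, "exec-2");
        registry.onExecutorLost("exec-1");
        return registry.availableOutputs();
    }

    public static void main(String[] args) {
        System.out.println(survivingOutputs(new LocalDiskShuffle()));     // prints 1
        System.out.println(survivingOutputs(new RemoteShuffleService())); // prints 3
    }
}
```

Defaulting the declaration to `false` keeps existing components on today's behaviour; only a component that explicitly opts in changes how executor loss is handled.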