[
https://issues.apache.org/jira/browse/SPARK-54729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-54729:
-----------------------------------
Labels: pull-request-available (was: )
> Proactively replicate shuffle data to FallbackStorage
> -----------------------------------------------------
>
> Key: SPARK-54729
> URL: https://issues.apache.org/jira/browse/SPARK-54729
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 4.2.0
> Reporter: Enrico Minack
> Priority: Major
> Labels: pull-request-available
>
> In a Kubernetes environment, the {{FallbackStorage}} can be used when an
> executor is gracefully decommissioned to migrate its shuffle data. This
> allows for dynamic allocation in Kubernetes.
> Let's add a mode where the shuffle data of a task can be replicated to the
> {{FallbackStorage}} as soon as the task finishes. The shuffle data are still
> served by the executor, while the {{FallbackStorage}} simply holds a
> proactively copied replica of the data.
> This brings the following advantages:
> # *Faster decommissioning:* The decommissioning phase is sped up since all
> shuffle data already exist on the {{FallbackStorage}}. It reduces to merely
> updating the location of the shuffle data to point to the
> {{FallbackStorage}}.
> # *Node failure resiliency:* Shuffle data of executors that did not go
> through the decommissioning phase can be recovered by simply reading from the
> {{FallbackStorage}}.
> There are two modes:
> # *Async copy (best-effort mode):* Shuffle data are asynchronously copied
> *after* a task finishes. No delay is added to the task, as data are copied in
> the background. The replica is very likely to exist, but this is not
> guaranteed.
> # *Sync copy (reliable mode):* Shuffle data are copied *at the end* of the
> task. This delays task completion by the time needed to copy the shuffle
> data. A successful task guarantees that the shuffle data replica exists.
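
To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not Spark code and does not reflect the actual implementation: {{ShuffleReplicator}}, {{on_task_end}} and {{run_task}} are hypothetical names, and a local directory stands in for the {{FallbackStorage}}. It only illustrates that the sync mode copies before the task is reported as finished, while the async mode hands the copy to a background worker.

```python
import shutil
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Optional, Tuple


class ShuffleReplicator:
    """Hypothetical helper that copies a task's shuffle file to a fallback directory."""

    def __init__(self, fallback_dir: Path, sync: bool) -> None:
        self.fallback_dir = fallback_dir
        self.sync = sync
        # Background worker; only used in async (best-effort) mode.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _copy(self, shuffle_file: Path) -> Path:
        self.fallback_dir.mkdir(parents=True, exist_ok=True)
        target = self.fallback_dir / shuffle_file.name
        shutil.copyfile(shuffle_file, target)
        return target

    def on_task_end(self, shuffle_file: Path) -> Optional[Future]:
        if self.sync:
            # Reliable mode: the task is delayed until the replica exists.
            self._copy(shuffle_file)
            return None
        # Best-effort mode: return immediately, copy in the background.
        return self._pool.submit(self._copy, shuffle_file)

    def close(self) -> None:
        # Wait for any pending background copies before shutting down.
        self._pool.shutdown(wait=True)


def run_task(
    task_id: int, local_dir: Path, replicator: ShuffleReplicator
) -> Tuple[Path, Optional[Future]]:
    """Write a fake shuffle file locally, then trigger replication."""
    shuffle_file = local_dir / f"shuffle_{task_id}.data"
    shuffle_file.write_bytes(f"records of task {task_id}".encode())
    return shuffle_file, replicator.on_task_end(shuffle_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "local"
        local.mkdir()
        replicator = ShuffleReplicator(Path(tmp) / "fallback", sync=False)
        _, pending = run_task(0, local, replicator)
        # In async mode the replica is only certain to exist once the copy completed.
        print("replica at:", pending.result())
        replicator.close()
```

In this sketch, a crash between the end of the task and the completion of the background copy would leave no replica in async mode, which is the "no guarantee" caveat described above. In sync mode a failed copy would raise inside {{on_task_end}} and so fail the task.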
--
This message was sent by Atlassian Jira
(v8.20.10#820010)