Enrico Minack created SPARK-54729:
-------------------------------------
Summary: Proactively replicate shuffle data to FallbackStorage
Key: SPARK-54729
URL: https://issues.apache.org/jira/browse/SPARK-54729
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Enrico Minack
In a Kubernetes environment, the {{FallbackStorage}} can be used to migrate an executor's shuffle data when the executor is gracefully decommissioned. This enables dynamic allocation on Kubernetes.
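For reference, shuffle data migration to the {{FallbackStorage}} during decommissioning is enabled with roughly the following configuration today (the bucket path is a placeholder, and exact keys may vary by Spark version):
{code}
spark.decommission.enabled=true
spark.storage.decommission.enabled=true
spark.storage.decommission.shuffleBlocks.enabled=true
spark.storage.decommission.fallbackStorage.path=s3a://my-bucket/spark-fallback/
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
{code}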
Let's add a mode where the shuffle data of a task are replicated to the
{{FallbackStorage}} as soon as the task finishes. The shuffle data are still
served by the executor; the {{FallbackStorage}} merely holds a proactively
copied replica of the data.
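One possible way to expose this feature could be a pair of configuration keys; the names below are purely illustrative and not part of any existing Spark API:
{code}
# hypothetical configuration for the proposed feature
spark.shuffle.proactiveReplication.enabled=true
# "async" (best-effort) or "sync" (reliable), see the two modes below
spark.shuffle.proactiveReplication.mode=async
{code}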
This proactive replication brings the following advantages:
# *Faster decommissioning:* The decommissioning phase is sped up since all
shuffle data already exist on the {{FallbackStorage}}. Decommissioning
simplifies to merely updating the location of the shuffle data to point to the
{{FallbackStorage}}.
# *Node failure resiliency:* Shuffle data of executors that did not go
through the decommissioning phase can be recovered by simply reading them from
the {{FallbackStorage}}.
There are two modes:
# *Async copy (best-effort mode):* Shuffle data are asynchronously copied
*after* a task finishes. No delay is added since the data are copied in the
background. There is a high chance that the replica exists when needed, but no guarantee.
# *Sync copy (reliable mode):* Shuffle data are copied *at the end* of the
task. This delays task completion by the time needed to copy the shuffle
data. A successful task guarantees that the shuffle data replica exists. Both modes are sketched below.
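To make the difference between the two modes concrete, here is a minimal Scala sketch. It assumes the shuffle files of the finished task and the fallback directory are already known; all names ({{ProactiveShuffleReplication}}, {{replicateAsync}}, {{replicateSync}}) are illustrative and not part of Spark's existing {{FallbackStorage}} API:
{code:scala}
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Minimal sketch of the two proposed copy modes; not Spark's actual internals.
object ProactiveShuffleReplication {

  // Background pool used by the best-effort (async) mode.
  private val replicationPool = ExecutionContext.fromExecutorService(
    Executors.newFixedThreadPool(2))

  // Copies a task's shuffle files (data + index) to the fallback location.
  private def copyShuffleFiles(shuffleFiles: Seq[Path], fallbackDir: Path): Unit = {
    Files.createDirectories(fallbackDir)
    shuffleFiles.foreach { file =>
      Files.copy(file, fallbackDir.resolve(file.getFileName.toString),
        StandardCopyOption.REPLACE_EXISTING)
    }
  }

  // Async (best-effort) mode: the task finishes immediately, the replica is
  // written in the background and may not exist yet when it is needed.
  def replicateAsync(shuffleFiles: Seq[Path], fallbackDir: Path): Future[Unit] =
    Future(copyShuffleFiles(shuffleFiles, fallbackDir))(replicationPool)

  // Sync (reliable) mode: runs at the end of the task, so a successful task
  // guarantees the replica exists, at the cost of a longer task runtime.
  def replicateSync(shuffleFiles: Seq[Path], fallbackDir: Path): Unit =
    copyShuffleFiles(shuffleFiles, fallbackDir)
}
{code}
In the async mode the driver would register the replica location only once the background copy succeeds, whereas in the sync mode the replica can be assumed to exist as soon as the task reports success.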