[
https://issues.apache.org/jira/browse/SPARK-54729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enrico Minack updated SPARK-54729:
----------------------------------
Description:
In a Kubernetes environment, the {{FallbackStorage}} can be used when an executor
is gracefully decommissioned to migrate its shuffle data. This allows for
dynamic allocation in Kubernetes.
Let's add a mode where the shuffle data of a task can be replicated to the
{{FallbackStorage}} as soon as the task finishes. The shuffle data are still
served by the executor, while the {{FallbackStorage}} simply holds a
proactively copied replica of the data.
This brings the following advantages:
# *Decommissioning phase speed-up:* The decommissioning phase is sped up
since all data already exist on the {{FallbackStorage}}. The
decommissioning phase simplifies to merely updating the location of the shuffle
data to the {{FallbackStorage}}.
# *Node failure resiliency:* Shuffle data of executors that did not go
through the decommissioning phase can be recovered by simply reading from the
{{FallbackStorage}}.
There are two modes:
# *Async copy (best-effort mode):* Shuffle data are asynchronously copied
*after* a task finishes. No delay is added since data are copied in the
background. There is a high chance that the replica exists, but no guarantee.
# *Sync copy (reliable mode):* Shuffle data are copied *at the end* of the
task. This delays task completion by the time needed to copy the shuffle
data. A successful task guarantees that the shuffle data replica exists.
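Below is a minimal sketch of how such a mode could be enabled alongside the
existing fallback storage configuration. The decommissioning settings and
{{spark.storage.decommission.fallbackStorage.path}} already exist in Spark;
the {{spark.shuffle.proactiveReplication.mode}} key is a hypothetical name
used only for illustration, since no configuration for this feature has been
decided yet.
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Existing settings: enable decommissioning and point shuffle block
// migration at a fallback storage location (e.g. an object store path).
val conf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .set("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
  // Hypothetical setting for the proposed feature: "async" for the
  // best-effort mode, "sync" for the reliable mode.
  .set("spark.shuffle.proactiveReplication.mode", "async")

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}
With the async mode, task latency is unchanged and the replica is copied in
the background; with the sync mode, a task only reports success once its
shuffle output has been written to the {{FallbackStorage}}, so the replica is
guaranteed to exist.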
> Proactively replicate shuffle data to FallbackStorage
> -----------------------------------------------------
>
> Key: SPARK-54729
> URL: https://issues.apache.org/jira/browse/SPARK-54729
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 4.2.0
> Reporter: Enrico Minack
> Priority: Major
> Labels: pull-request-available