[
https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144043#comment-16144043
]
Apache Spark commented on SPARK-21097:
--------------------------------------
User 'brad-kaiser' has created a pull request for this issue:
https://github.com/apache/spark/pull/19041
> Dynamic allocation will preserve cached data
> --------------------------------------------
>
> Key: SPARK-21097
> URL: https://issues.apache.org/jira/browse/SPARK-21097
> Project: Spark
> Issue Type: Improvement
> Components: Block Manager, Scheduler, Spark Core
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Brad
> Attachments: Preserving Cached Data with Dynamic Allocation.pdf
>
>
> We want to use dynamic allocation to distribute resources among many notebook
> users on our Spark clusters. One difficulty is that if a user has cached data
> then we are either prevented from de-allocating any of their executors, or we
> are forced to drop their cached data, which can lead to a bad user experience.
> We propose adding a feature to preserve cached data by copying it to other
> executors before de-allocation. This behavior would be enabled by a simple
> Spark config such as "spark.dynamicAllocation.recoverCachedData". With this
> enabled, when an executor reaches its configured idle timeout, instead of
> killing it on the spot we will stop sending it new tasks, replicate all of
> its RDD blocks onto other executors, and then kill it. If replication runs
> into trouble, such as an error, a timeout, or insufficient space on the
> remaining executors, we will fall back to the original behavior: drop the
> data and kill the executor. A sketch of this flow follows below.
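> As a rough illustration of the intended flow, here is a minimal,
> self-contained Scala sketch. The names (Executor, replicateBlocks,
> onIdleTimeout) and the timeout handling are hypothetical stand-ins rather
> than actual Spark internals, and "spark.dynamicAllocation.recoverCachedData"
> is the proposed flag, not an existing one:
>
>     // Hypothetical sketch of the proposed idle-timeout flow; not Spark code.
>     import scala.concurrent.{Await, Future}
>     import scala.concurrent.duration._
>     import scala.concurrent.ExecutionContext.Implicits.global
>     import scala.util.{Failure, Success, Try}
>
>     final case class Executor(id: String, cachedBlockIds: Seq[String])
>
>     object CachedDataRecoverySketch {
>       // Stand-in for the proposed spark.dynamicAllocation.recoverCachedData.
>       val recoverCachedData = true
>       // Bound on replication time before falling back to drop-and-kill.
>       val replicationTimeout = 30.seconds
>
>       // Hypothetical hook: copy each block to a surviving executor;
>       // a failed Future models an error or insufficient space on peers.
>       def replicateBlocks(e: Executor): Future[Unit] = Future {
>         e.cachedBlockIds.foreach(b => println(s"replicating $b off ${e.id}"))
>       }
>
>       def stopScheduling(e: Executor): Unit = println(s"${e.id}: no new tasks")
>       def dropCachedData(e: Executor): Unit = println(s"${e.id}: blocks dropped")
>       def kill(e: Executor): Unit = println(s"${e.id}: killed")
>
>       // Called when an executor hits its dynamic-allocation idle timeout.
>       def onIdleTimeout(e: Executor): Unit = {
>         if (recoverCachedData && e.cachedBlockIds.nonEmpty) {
>           stopScheduling(e)                        // 1. quiesce the executor
>           Try(Await.result(replicateBlocks(e), replicationTimeout)) match {
>             case Success(_) => kill(e)             // 2. data is safe elsewhere
>             case Failure(_) =>                     // 3. error or timeout:
>               dropCachedData(e)                    //    fall back to the
>               kill(e)                              //    original behavior
>           }
>         } else {
>           kill(e)                                  // original behavior
>         }
>       }
>
>       def main(args: Array[String]): Unit =
>         onIdleTimeout(Executor("exec-1", Seq("rdd_0_0", "rdd_0_1")))
>     }
>
> Quiescing the executor before replicating matters: otherwise new tasks could
> keep creating fresh blocks on it while it is being drained.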
> This feature should allow anyone with notebook users to use their cluster
> resources more efficiently. Also, since it will be completely opt-in, it is
> unlikely to cause problems for other use cases.