[ https://issues.apache.org/jira/browse/SPARK-31208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Holden Karau updated SPARK-31208:
---------------------------------
Affects Version/s: 3.0.0
> Expose the ability for user to cleanup shuffle files
> ----------------------------------------------------
>
> Key: SPARK-31208
> URL: https://issues.apache.org/jira/browse/SPARK-31208
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Holden Karau
> Assignee: Holden Karau
> Priority: Major
>
> Dynamic scaling on Kubernetes (introduced in Spark 3) depends on only
> shutting down executors that hold no shuffle files. However, Spark does not
> aggressively clean up shuffle files (see SPARK-5836) and instead depends on
> JVM GC on the driver to trigger deletes. We already have a mechanism to
> explicitly clean up shuffle files, used by the ALS algorithm, where we create
> a lot of quickly orphaned shuffle files. We should expose this as an advanced
> developer feature so that people can clean up shuffle files more promptly,
> improving dynamic scaling of their jobs on Kubernetes.
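
To make the difference between GC-driven and explicit cleanup concrete, here is a toy model in plain Python. It is only a sketch and not Spark code: `ToyShuffleTracker`, `ToyShuffleHandle`, and the executor names are hypothetical, and `weakref.finalize` merely stands in for cleanup that is triggered by garbage collection on the driver. The point it illustrates is that an executor stays pinned for as long as the driver-side handle is reachable, unless the files are cleaned up explicitly.

```python
import weakref


class ToyShuffleHandle:
    """Hypothetical driver-side object whose lifetime governs GC-driven cleanup."""

    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id


class ToyShuffleTracker:
    """Hypothetical bookkeeping of which executors hold files for each shuffle."""

    def __init__(self):
        # shuffle_id -> set of executor ids holding files for that shuffle
        self.files = {}

    def register(self, handle, executors):
        self.files[handle.shuffle_id] = set(executors)
        # GC-driven path: cleanup runs only once the handle is collected.
        weakref.finalize(handle, self.cleanup, handle.shuffle_id)

    def cleanup(self, shuffle_id):
        # Explicit path: drop the files now; calling it again is a no-op.
        self.files.pop(shuffle_id, None)

    def removable_executors(self, all_executors):
        # Only executors holding no shuffle files are safe to shut down.
        busy = set().union(*self.files.values())
        return sorted(set(all_executors) - busy)


tracker = ToyShuffleTracker()
executors = ["exec-1", "exec-2", "exec-3"]

handle = ToyShuffleHandle(0)
tracker.register(handle, ["exec-1", "exec-2"])

# While the handle is still referenced, two executors remain pinned.
print(tracker.removable_executors(executors))

# Explicit cleanup frees them without waiting for the handle to be collected.
tracker.cleanup(handle.shuffle_id)
print(tracker.removable_executors(executors))
```

In this model the first call reports only `exec-3` as removable, while after the explicit cleanup all three executors are. The real mechanism and its API surface in Spark may differ from this sketch.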
--
This message was sent by Atlassian Jira
(v8.3.4#803005)