[
https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-4834.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.2.1
1.3.0
Issue resolved by pull request 3705
[https://github.com/apache/spark/pull/3705]
> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
> Key: SPARK-4834
> URL: https://issues.apache.org/jira/browse/SPARK-4834
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
> Fix For: 1.3.0, 1.2.1
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jars / files among multiple executors running on
> the same host by using a lock file and a cache file for each file an
> executor needs to download. The problem is that these lock and cache files
> are never deleted.
> On YARN, the app's dir is automatically deleted when the app ends, so no
> files are left behind. But on standalone, there's no such thing as "the app's
> dir"; files will end up in "/tmp" or in whatever place the user configures in
> "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as "hey, just
> call File.deleteOnExit()!" because multiple processes access these files, so
> to preserve the efficiency gains of the original change, the files should
> only be deleted when the application has finished.
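For context, the lock-file/cache-file scheme the description refers to can be sketched roughly as follows. This is a hypothetical, minimal illustration, not Spark's actual code: the class name `SharedDownload`, the method `fetchFile`, and the `<hash>_cache` / `<hash>_lock` naming are all illustrative assumptions. The point it demonstrates is that the cache and lock files outlive every per-executor copy.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.function.Supplier;

// Hypothetical sketch of the lock-file/cache-file scheme; not Spark's real API.
class SharedDownload {
    static File fetchFile(String url, File localDir, Supplier<byte[]> download)
            throws Exception {
        File cached = new File(localDir, url.hashCode() + "_cache");
        File lockFile = new File(localDir, url.hashCode() + "_lock");
        // Cross-process file lock: only one executor downloads; the rest wait.
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileLock lock = raf.getChannel().lock()) {
            if (!cached.exists()) {
                // First executor on this host populates the shared cache.
                Files.write(cached.toPath(), download.get());
            }
        }
        // Each executor copies from the shared cache into its own working copy.
        File dest = new File(localDir, new File(url).getName());
        Files.copy(cached.toPath(), dest.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
        return dest;
        // Note: neither `cached` nor `lockFile` is ever deleted -- the bug
        // described above. On standalone they accumulate in the local dir.
    }
}
```

A cleanup hook cannot simply be `File.deleteOnExit()` in each executor: an executor exiting early would delete files a sibling executor is still relying on, which is why the description says deletion must wait until the whole application has finished.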
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)