[
https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244944#comment-14244944
]
Marcelo Vanzin commented on HIVE-9017:
--------------------------------------
These files are created by Spark when downloading resources for the app (e.g.
application jars). In standalone mode, by default, these files will end up in
/tmp (java.io.tmpdir). The problem is that the app doesn't clean up these
files; in fact, it can't, because they are supposed to be shared in case
multiple executors run on the same host - so one executor cannot unilaterally
decide to delete them.
(That's not entirely true; strictly speaking it could, but then other
executors would have to re-download the file when needed, adding overhead.)
This is not a problem in Yarn mode, since the temp dir is under a Yarn-managed
directory that is deleted when the app shuts down.
So, while I think of a clean way to fix this in Spark, the following can be
done on the Hive side:
- create an app-specific temp directory before launching the Spark app
- set {{spark.local.dir}} to that location
- delete the directory when the client shuts down
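The three steps above could be sketched on the Hive side roughly as follows. This is a minimal illustration, not the actual RSC code; the class and method names are hypothetical, and handing the directory to Spark is assumed to happen through whatever config mechanism the launcher uses (e.g. {{--conf spark.local.dir=...}} on spark-submit).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

/** Hypothetical sketch of managing a per-application spark.local.dir. */
public class SparkLocalDir {

    /** Step 1: create an app-specific temp directory before launching Spark. */
    public static Path create() throws IOException {
        return Files.createTempDirectory("hive-spark-local-");
    }

    /** Step 2: the value to pass to Spark as spark.local.dir. */
    public static String confValue(Path dir) {
        return dir.toAbsolutePath().toString();
    }

    /** Step 3: delete the directory (and its contents) on client shutdown. */
    public static void deleteRecursively(Path dir) throws IOException {
        try (var paths = Files.walk(dir)) {
            // Delete children before parents.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = create();
        // Pass confValue(dir) to the Spark launcher, e.g.
        //   --conf spark.local.dir=<confValue(dir)>
        System.out.println("spark.local.dir=" + confValue(dir));
        // Ensure cleanup even if the client exits abnormally.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                deleteRecursively(dir);
            } catch (IOException ignored) {
                // Best-effort cleanup on shutdown.
            }
        }));
    }
}
```

Because the directory is owned by a single client, deleting it on shutdown cannot interfere with other applications the way deleting shared files in /tmp would.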
> Clean up temp files of RSC [Spark Branch]
> -----------------------------------------
>
> Key: HIVE-9017
> URL: https://issues.apache.org/jira/browse/HIVE-9017
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Rui Li
>
> Currently RSC will leave a lot of temp files in {{/tmp}}, including
> {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc.
> We should clean up these files, or they will exhaust disk space.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)