[
https://issues.apache.org/jira/browse/SPARK-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-5970:
-----------------------------
Priority: Minor (was: Major)
Assignee: Milan Straka
> Temporary directories are not removed (but their content is)
> ------------------------------------------------------------
>
> Key: SPARK-5970
> URL: https://issues.apache.org/jira/browse/SPARK-5970
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.1
> Environment: Linux, 64bit
> spark-1.2.1-bin-hadoop2.4.tgz
> Reporter: Milan Straka
> Assignee: Milan Straka
> Priority: Minor
> Fix For: 1.4.0
>
>
> How to reproduce:
> - extract spark-1.2.1-bin-hadoop2.4.tgz
> - without any further configuration, run bin/pyspark
> - run sc.stop() and close python shell
> Expected results:
> - no temporary directories are left in /tmp
> Actual results:
> - four empty temporary directories are created in /tmp, for example after
> {{ls -d /tmp/spark*}}:
> {code}
> /tmp/spark-1577b13d-4b9a-4e35-bac2-6e84e5605f53
> /tmp/spark-96084e69-77fd-42fb-ab10-e1fc74296fe3
> /tmp/spark-ab2ea237-d875-485e-b16c-5b0ac31bd753
> /tmp/spark-ddeb0363-4760-48a4-a189-81321898b146
> {code}
> The issue is caused by changes in {{util/Utils.scala}}. Consider {{createDirectory}}:
> {code}
> /**
>  * Create a directory inside the given parent directory. The directory is guaranteed to be
>  * newly created, and is not marked for automatic deletion.
>  */
> def createDirectory(root: String, namePrefix: String = "spark"): File = ...
> {code}
> {{createDirectory}} is used in two places. The first is {{createTempDir}}, where the created directory _is_ marked for automatic deletion:
> {code}
> def createTempDir(
>     root: String = System.getProperty("java.io.tmpdir"),
>     namePrefix: String = "spark"): File = {
>   val dir = createDirectory(root, namePrefix)
>   registerShutdownDeleteDir(dir)
>   dir
> }
> {code}
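To make the mechanism concrete, here is a minimal Java sketch (not Spark's actual Scala implementation; the names `registerShutdownDeleteDir` and `deleteRecursively` are borrowed from the report, everything else is illustrative) of what such a shutdown-delete registry does: remember each registered directory and delete it recursively when the JVM exits.

```java
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Set;

public class ShutdownDeleteSketch {
    // Directories registered for deletion at JVM shutdown.
    private static final Set<File> DIRS = new LinkedHashSet<>();

    static {
        // One hook for all registered directories; runs on normal JVM exit.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            for (File dir : DIRS) {
                deleteRecursively(dir);
            }
        }));
    }

    /** Remember a directory so the shutdown hook deletes it later. */
    public static synchronized void registerShutdownDeleteDir(File dir) {
        DIRS.add(dir);
    }

    /** Delete a file, or a directory together with all of its contents. */
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }
}
```

A directory that is created but never passed to `registerShutdownDeleteDir` is simply never seen by the hook, which is exactly the leak described below.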
> Nevertheless, it is also used in {{getOrCreateLocalRootDirs}}, where the created directory is _not_ marked for automatic deletion:
> {code}
> private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
>   if (isRunningInYarnContainer(conf)) {
>     // If we are in yarn mode, systems can have different disk layouts so we must set it
>     // to what Yarn on this system said was available. Note this assumes that Yarn has
>     // created the directories already, and that they are secured so that only the
>     // user has access to them.
>     getYarnLocalDirs(conf).split(",")
>   } else {
>     // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
>     // configuration to point to a secure directory. So create a subdirectory with restricted
>     // permissions under each listed directory.
>     Option(conf.getenv("SPARK_LOCAL_DIRS"))
>       .getOrElse(conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
>       .split(",")
>       .flatMap { root =>
>         try {
>           val rootDir = new File(root)
>           if (rootDir.exists || rootDir.mkdirs()) {
>             Some(createDirectory(root).getAbsolutePath())
>           } else {
>             logError(s"Failed to create dir in $root. Ignoring this directory.")
>             None
>           }
>         } catch {
>           case e: IOException =>
>             logError(s"Failed to create local root dir in $root. Ignoring this directory.")
>             None
>         }
>       }
>       .toArray
>   }
> }
> {code}
> Therefore I think the line
> {code}
> Some(createDirectory(root).getAbsolutePath())
> {code}
> should be replaced by something like (I am not an experienced Scala programmer):
> {code}
> val dir = createDirectory(root)
> registerShutdownDeleteDir(dir)
> Some(dir.getAbsolutePath())
> {code}
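The proposed change can be sketched in isolation. The following Java sketch (illustrative only; `createDirectory`, `registerShutdownDeleteDir`, and `createRegisteredLocalRootDir` are stand-ins for the Spark code discussed above, not its real API) shows the fixed shape: create the directory, register it for shutdown deletion, and only then return its absolute path.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.LinkedHashSet;
import java.util.Set;

public class LocalRootDirSketch {
    // Stand-in for the shutdown-delete registry.
    static final Set<File> REGISTERED = new LinkedHashSet<>();

    /** Stand-in for Utils.createDirectory: make a fresh directory under root. */
    static File createDirectory(File root) throws IOException {
        return Files.createTempDirectory(root.toPath(), "spark-").toFile();
    }

    /** Stand-in for Utils.registerShutdownDeleteDir. */
    static void registerShutdownDeleteDir(File dir) {
        REGISTERED.add(dir);
    }

    /** The fixed variant: register the directory before returning its path. */
    static String createRegisteredLocalRootDir(File root) throws IOException {
        File dir = createDirectory(root);
        registerShutdownDeleteDir(dir); // the step the original code was missing
        return dir.getAbsolutePath();
    }
}
```

The one-line difference from the buggy version is that the created `File` is kept in a local variable long enough to be registered, rather than being converted straight to a path and forgotten.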
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]