Milan Straka created SPARK-5970:
-----------------------------------
Summary: Temporary directories are not removed (but their content is)
Key: SPARK-5970
URL: https://issues.apache.org/jira/browse/SPARK-5970
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.1
Environment: Linux, 64bit
spark-1.2.1-bin-hadoop2.4.tgz
Reporter: Milan Straka
How to reproduce:
- extract spark-1.2.1-bin-hadoop2.4.tgz
- without any further configuration, run bin/pyspark
- run sc.stop() and close python shell
Expected results:
- no temporary directories are left in /tmp
Actual results:
- four empty temporary directories are left in /tmp, for example after {{ls -d /tmp/spark*}}:
{code}
/tmp/spark-1577b13d-4b9a-4e35-bac2-6e84e5605f53
/tmp/spark-96084e69-77fd-42fb-ab10-e1fc74296fe3
/tmp/spark-ab2ea237-d875-485e-b16c-5b0ac31bd753
/tmp/spark-ddeb0363-4760-48a4-a189-81321898b146
{code}
The issue is caused by changes in {{util/Utils.scala}}. Consider {{createDirectory}}:
{code}
/**
 * Create a directory inside the given parent directory. The directory is guaranteed to be
 * newly created, and is not marked for automatic deletion.
 */
def createDirectory(root: String, namePrefix: String = "spark"): File = ...
{code}
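For context, this is roughly what the 1.2.1 body does, paraphrased from memory (the retry limit is an assumption; details may differ from the real {{util/Utils.scala}}): it keeps trying UUID-suffixed names under {{root}} until {{mkdirs()}} succeeds, and never registers the result for deletion.
{code}
import java.io.{File, IOException}
import java.util.UUID

def createDirectory(root: String, namePrefix: String = "spark"): File = {
  val maxAttempts = 10 // assumed limit; the real constant may differ
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(
        s"Failed to create a temp directory (under $root) after $maxAttempts attempts!")
    }
    // Pick a fresh UUID-suffixed name; retry if it already exists or cannot be created.
    dir = new File(root, namePrefix + "-" + UUID.randomUUID.toString)
    if (dir.exists() || !dir.mkdirs()) {
      dir = null
    }
  }
  dir
}
{code}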
{{createDirectory}} is used in two places. The first is {{createTempDir}}, where the resulting directory is marked for automatic deletion:
{code}
def createTempDir(
    root: String = System.getProperty("java.io.tmpdir"),
    namePrefix: String = "spark"): File = {
  val dir = createDirectory(root, namePrefix)
  registerShutdownDeleteDir(dir)
  dir
}
{code}
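For reference, {{registerShutdownDeleteDir}} boils down to adding the path to a set that a JVM shutdown hook later deletes recursively. A minimal sketch, with names and structure assumed from memory (the real {{util/Utils.scala}} may differ in detail):
{code}
import java.io.File
import scala.collection.mutable

// Paths registered here get removed when the JVM exits.
val shutdownDeletePaths = new mutable.HashSet[String]()

def registerShutdownDeleteDir(file: File): Unit = {
  shutdownDeletePaths.synchronized {
    shutdownDeletePaths += file.getAbsolutePath
  }
}

def deleteRecursively(f: File): Unit = {
  Option(f.listFiles).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// The hook is what actually deletes the registered directories at JVM exit.
Runtime.getRuntime.addShutdownHook(new Thread() {
  override def run(): Unit = {
    shutdownDeletePaths.synchronized {
      shutdownDeletePaths.foreach(path => deleteRecursively(new File(path)))
    }
  }
})
{code}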
Nevertheless, it is also used in {{getOrCreateLocalRootDirs}}, where the resulting directory is _not_ marked for automatic deletion:
{code}
private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
  if (isRunningInYarnContainer(conf)) {
    // If we are in yarn mode, systems can have different disk layouts so we must set it
    // to what Yarn on this system said was available. Note this assumes that Yarn has
    // created the directories already, and that they are secured so that only the
    // user has access to them.
    getYarnLocalDirs(conf).split(",")
  } else {
    // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
    // configuration to point to a secure directory. So create a subdirectory with restricted
    // permissions under each listed directory.
    Option(conf.getenv("SPARK_LOCAL_DIRS"))
      .getOrElse(conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
      .split(",")
      .flatMap { root =>
        try {
          val rootDir = new File(root)
          if (rootDir.exists || rootDir.mkdirs()) {
            Some(createDirectory(root).getAbsolutePath())
          } else {
            logError(s"Failed to create dir in $root. Ignoring this directory.")
            None
          }
        } catch {
          case e: IOException =>
            logError(s"Failed to create local root dir in $root. Ignoring this directory.")
            None
        }
      }
      .toArray
  }
}
{code}
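The practical effect can be reproduced outside Spark. A standalone sketch (not Spark code): a shutdown hook cleans up the files _inside_ the local dir, but because the directory itself was never registered for deletion, an empty {{spark-*}} directory survives the JVM exit:
{code}
import java.io.File
import java.nio.file.Files

object LeakDemo {
  def main(args: Array[String]): Unit = {
    // Stands in for a spark-* local root dir produced by createDirectory.
    val root = Files.createTempDirectory("spark-").toFile
    val child = new File(root, "some-block-file")
    child.createNewFile()

    // Only the contents are cleaned up, not the directory itself.
    Runtime.getRuntime.addShutdownHook(new Thread() {
      override def run(): Unit = child.delete()
    })

    println(s"$root will still exist (empty) after this JVM exits")
  }
}
{code}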
Therefore I think the
{code}
Some(createDirectory(root).getAbsolutePath())
{code}
should be replaced by something like (I am not an experienced Scala programmer):
{code}
val dir = createDirectory(root)
registerShutdownDeleteDir(dir)
Some(dir.getAbsolutePath())
{code}
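With that change, the {{spark-*}} local root directories would end up in the same shutdown-hook set as the directories created by {{createTempDir}}, so after {{sc.stop()}} and JVM exit, {{ls -d /tmp/spark*}} should no longer show leftover empty directories.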