Milan Straka created SPARK-5970:
-----------------------------------

             Summary: Temporary directories are not removed (but their content is)
                 Key: SPARK-5970
                 URL: https://issues.apache.org/jira/browse/SPARK-5970
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.2.1
         Environment: Linux, 64bit
spark-1.2.1-bin-hadoop2.4.tgz
            Reporter: Milan Straka


How to reproduce: 
- extract spark-1.2.1-bin-hadoop2.4.tgz
- without any further configuration, run bin/pyspark
- run sc.stop() and close python shell

Expected results:
- no temporary directories are left in /tmp

Actual results:
- four empty temporary directories are left in /tmp, for example after {{ls -d /tmp/spark*}}:
{code}
/tmp/spark-1577b13d-4b9a-4e35-bac2-6e84e5605f53
/tmp/spark-96084e69-77fd-42fb-ab10-e1fc74296fe3
/tmp/spark-ab2ea237-d875-485e-b16c-5b0ac31bd753
/tmp/spark-ddeb0363-4760-48a4-a189-81321898b146
{code}

The issue is caused by changes in {{util/Utils.scala}}. Consider {{createDirectory}}:
{code}
  /**
   * Create a directory inside the given parent directory. The directory is guaranteed to be
   * newly created, and is not marked for automatic deletion.
   */
  def createDirectory(root: String, namePrefix: String = "spark"): File = ...
{code}

{{createDirectory}} is used in two places. The first is {{createTempDir}}, where the created directory is registered for automatic deletion:
{code}
  def createTempDir(
      root: String = System.getProperty("java.io.tmpdir"),
      namePrefix: String = "spark"): File = {
    val dir = createDirectory(root, namePrefix)
    registerShutdownDeleteDir(dir)
    dir
  }
{code}

However, it is also used in {{getOrCreateLocalRootDirs}}, where the directory is _not_ registered for automatic deletion:
{code}
  private[spark] def getOrCreateLocalRootDirs(conf: SparkConf): Array[String] = {
    if (isRunningInYarnContainer(conf)) {
      // If we are in yarn mode, systems can have different disk layouts so we must set it
      // to what Yarn on this system said was available. Note this assumes that Yarn has
      // created the directories already, and that they are secured so that only the
      // user has access to them.
      getYarnLocalDirs(conf).split(",")
    } else {
      // In non-Yarn mode (or for the driver in yarn-client mode), we cannot trust the user
      // configuration to point to a secure directory. So create a subdirectory with restricted
      // permissions under each listed directory.
      Option(conf.getenv("SPARK_LOCAL_DIRS"))
        .getOrElse(conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
        .split(",")
        .flatMap { root =>
          try {
            val rootDir = new File(root)
            if (rootDir.exists || rootDir.mkdirs()) {
              Some(createDirectory(root).getAbsolutePath())
            } else {
              logError(s"Failed to create dir in $root. Ignoring this directory.")
              None
            }
          } catch {
            case e: IOException =>
              logError(s"Failed to create local root dir in $root. Ignoring this directory.")
              None
          }
        }
        .toArray
    }
  }
{code}

Therefore I think
{code}
Some(createDirectory(root).getAbsolutePath())
{code}
should be replaced by something like the following (I am not an experienced Scala programmer):
{code}
val dir = createDirectory(root)
registerShutdownDeleteDir(dir)
Some(dir.getAbsolutePath())
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
