Dongjoon Hyun created SPARK-23457:
-------------------------------------

             Summary: Register task completion listeners first for Parquet
                 Key: SPARK-23457
                 URL: https://issues.apache.org/jira/browse/SPARK-23457
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Dongjoon Hyun


ParquetFileFormat leaks open files in some cases.

{code}
  test("SPARK-23390 Register task completion listeners first in 
ParquetFileFormat") {
    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE.key -> 
s"${Int.MaxValue}") {
      withTempDir { dir =>
        val basePath = dir.getCanonicalPath
        Seq(0).toDF("a").write.format("parquet").save(new Path(basePath, 
"first").toString)
        Seq(1).toDF("a").write.format("parquet").save(new Path(basePath, 
"second").toString)
        val df = spark.read.parquet(
          new Path(basePath, "first").toString,
          new Path(basePath, "second").toString)
        val e = intercept[SparkException] {
          df.collect()
        }
        assert(e.getCause.isInstanceOf[OutOfMemoryError])
      }
    }
  }
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to