Repository: spark
Updated Branches:
refs/heads/master 704af4bd6 -> 17cdabb88
[SPARK-19809][SQL][TEST] NullPointerException on zero-size ORC file
## What changes were proposed in this pull request?
Until 2.2.1, Spark raises `NullPointerException` on zero-size ORC files.
Usually, these zero-size ORC files are generated by 3rd-party apps like Flume.
```scala
scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'")
$ touch /tmp/empty_orc/zero.orc
scala> sql("select * from empty_orc").show
java.lang.RuntimeException: serious problem
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
...
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
```
After [SPARK-22279](https://github.com/apache/spark/pull/19499), which turned on
`spark.sql.hive.convertMetastoreOrc` by default, Apache Spark with the default
configuration no longer hits this bug. Although the Hive 1.2.1 library code path
still has the problem, we should add test coverage for the current behavior to
prevent future regressions.
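To make the failure mode concrete: an ORC reader locates a file's metadata from its tail (the final byte holds the postscript length), so a zero-byte file has no footer to parse. The sketch below is a hypothetical guard, not Spark's or Hive's actual fix: it lists a table directory and keeps only non-empty regular files before they would be handed to a split planner. `SkipEmptyOrcFiles` and `nonEmptyFiles` are illustrative names, and only JDK APIs are used.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper, not part of Spark or Hive: drop zero-byte files before
// planning ORC splits. An ORC file is read from its tail (the final byte holds
// the postscript length), so a 0-byte file has no footer to parse.
object SkipEmptyOrcFiles {

  /** Returns the non-empty regular files directly under `dir`, sorted by name. */
  def nonEmptyFiles(dir: File): Seq[File] = {
    // listFiles() returns null if `dir` is missing or not a readable directory.
    val listed = Option(dir.listFiles()).getOrElse(Array.empty[File])
    listed.toSeq.filter(f => f.isFile && f.length() > 0L).sortBy(_.getName)
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("empty_orc").toFile
    // Equivalent of `touch /tmp/empty_orc/zero.orc`: creates a 0-byte file.
    new File(dir, "zero.orc").createNewFile()
    // A non-empty placeholder; these bytes are NOT a valid ORC file.
    Files.write(new File(dir, "part-0.orc").toPath, Array[Byte](1, 2, 3))

    val kept = nonEmptyFiles(dir).map(_.getName)
    println(kept.mkString(",")) // prints part-0.orc
  }
}
```

This guard treats an empty file as contributing no rows, which matches what the new test asserts (`checkAnswer(..., Seq.empty)`); a real reader might instead want to distinguish empty files from corrupt ones.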
## How was this patch tested?
Pass a newly added test case.
Author: Dongjoon Hyun
Closes #19948 from dongjoon-hyun/SPARK-19809-EMPTY-FILE.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/17cdabb8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/17cdabb8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/17cdabb8
Branch: refs/heads/master
Commit: 17cdabb88761e67ca555299109f89afdf02a4280
Parents: 704af4b
Author: Dongjoon Hyun
Authored: Wed Dec 13 07:42:24 2017 +0900
Committer: hyukjinkwon
Committed: Wed Dec 13 07:42:24 2017 +0900
----------------------------------------------------------------------
.../spark/sql/hive/execution/SQLQuerySuite.scala | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/17cdabb8/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index f2562c3..93c91d3 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2172,4 +2172,21 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
}
+
+  test("SPARK-19809 NullPointerException on zero-size ORC file") {
+    Seq("native", "hive").foreach { orcImpl =>
+      withSQLConf(SQLConf.ORC_IMPLEMENTATION.key -> orcImpl) {
+        withTempPath { dir =>
+          withTable("spark_19809") {
+            sql(s"CREATE TABLE spark_19809(a int) STORED AS ORC LOCATION '$dir'")
+            Files.touch(new File(s"${dir.getCanonicalPath}", "zero.orc"))
+
+            withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { // default since 2.3.0
+              checkAnswer(sql("SELECT * FROM spark_19809"), Seq.empty)
+            }
+          }
+        }
+      }
+    }
+  }
}