This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new ba9e12d [SPARK-26745][SQL][TESTS] JsonSuite test case: empty line -> 0 record count
ba9e12d is described below
commit ba9e12d55d043d6331df59a3829d40e41a9e2171
Author: Branden Smith <[email protected]>
AuthorDate: Wed Feb 6 13:55:19 2019 +0800
[SPARK-26745][SQL][TESTS] JsonSuite test case: empty line -> 0 record count
This PR consists of the `test` components of #23665 only, minus the
associated patch from that PR.
It adds a new unit test to `JsonSuite` which verifies that the `count()`
returned from a `DataFrame` loaded from JSON containing empty lines does not
include those empty lines in the record count. The test runs `count` prior to
otherwise reading data from the `DataFrame`, so as to catch future cases where
a pre-parsing optimization might result in `count` results inconsistent with
existing behavior.
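As a rough illustration of the behavior the test pins down, the sketch below models the expected count in plain Scala, without Spark. `EmptyLineCount` and `countRecords` are hypothetical names introduced only for this example; they are not Spark APIs, and the sketch assumes that a line holds a record exactly when it is non-blank after trimming.

```scala
// Hypothetical helper, not part of Spark: it only models the expected
// semantics, namely that blank lines contribute no records to the count.
object EmptyLineCount {

  /** Counts line-delimited records, skipping empty and whitespace-only lines. */
  def countRecords(lines: Seq[String]): Long =
    lines.count(line => line.trim.nonEmpty).toLong

  def main(args: Array[String]): Unit = {
    // Same shape as the test input: two JSON records, one empty line,
    // and one line containing only spaces and a tab.
    val input = Seq("""{ "a" : 1 }""", "", """ { "a" : 2 }""", " \t ")
    println(countRecords(input))
  }
}
```

A counting shortcut that skips parsing and simply counts input lines would report 4 for this input rather than 2, which is the kind of inconsistency the new test is meant to catch.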
This PR is intended to be deployed alongside #23667; `master` currently
causes the test to fail, as described in
[SPARK-26745](https://issues.apache.org/jira/browse/SPARK-26745).
Tested manually and by running the existing `JsonSuite` unit tests.
Closes #23674 from sumitsu/json_emptyline_count_test.
Authored-by: Branden Smith <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 63bced9375ec1ec6ded220d768cd746050861a09)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../spark/sql/execution/datasources/json/JsonSuite.scala | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
index 3e4cc8f..5ca430a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -2515,4 +2515,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     checkCount(2)
     countForMalformedJSON(0, Seq(""))
   }
+
+  test("SPARK-26745: count() for non-multiline input with empty lines") {
+    withTempPath { tempPath =>
+      val path = tempPath.getCanonicalPath
+      Seq("""{ "a" : 1 }""", "", """ { "a" : 2 }""", " \t ")
+        .toDS()
+        .repartition(1)
+        .write
+        .text(path)
+      assert(spark.read.json(path).count() === 2)
+    }
+  }
 }