This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new ba9e12d  [SPARK-26745][SQL][TESTS] JsonSuite test case: empty line -> 0 record count
ba9e12d is described below

commit ba9e12d55d043d6331df59a3829d40e41a9e2171
Author: Branden Smith <[email protected]>
AuthorDate: Wed Feb 6 13:55:19 2019 +0800

    [SPARK-26745][SQL][TESTS] JsonSuite test case: empty line -> 0 record count
    
    This PR consists of the `test` components of #23665 only, minus the associated patch from that PR.
    
    It adds a new unit test to `JsonSuite` which verifies that calling `count()` on a `DataFrame` loaded from JSON containing empty lines does not count those empty lines as records. The test calls `count()` before reading any other data from the `DataFrame`, so that it catches future cases where a pre-parsing optimization could make `count()` results inconsistent with existing behavior.
    
    This PR is intended to be deployed alongside #23667; the test currently fails on `master`, as described in [SPARK-26745](https://issues.apache.org/jira/browse/SPARK-26745).
    
    Tested manually and with the existing `JsonSuite` unit tests.
    
    Closes #23674 from sumitsu/json_emptyline_count_test.
    
    Authored-by: Branden Smith <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
    (cherry picked from commit 63bced9375ec1ec6ded220d768cd746050861a09)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../spark/sql/execution/datasources/json/JsonSuite.scala     | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
index 3e4cc8f..5ca430a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
@@ -2515,4 +2515,16 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     checkCount(2)
     countForMalformedJSON(0, Seq(""))
   }
+
+  test("SPARK-26745: count() for non-multiline input with empty lines") {
+    withTempPath { tempPath =>
+      val path = tempPath.getCanonicalPath
+      Seq("""{ "a" : 1 }""", "", """     { "a" : 2 }""", " \t ")
+        .toDS()
+        .repartition(1)
+        .write
+        .text(path)
+      assert(spark.read.json(path).count() === 2)
+    }
+  }
 }
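The counting rule that the new test pins down can be illustrated without Spark. The object below is a hypothetical, dependency-free sketch (the names `JsonLineCount`, `countRecords`, and `countRawLines` are illustrative and not part of Spark, and it does not reproduce Spark's JSON reader). It assumes the semantics the test asserts for non-multiline input: a line that is empty or contains only whitespace is not a record. For contrast, it also shows a naive raw-line count of the kind a pre-parsing shortcut might produce.

```scala
// Hypothetical illustration of the SPARK-26745 counting rule.
// This is NOT Spark's implementation; it only models the expected result.
object JsonLineCount {

  // Models the count the test expects from spark.read.json(path).count()
  // for line-delimited input: empty or whitespace-only lines are skipped.
  // String.trim strips all characters <= U+0020, including tabs.
  def countRecords(lines: Seq[String]): Long =
    lines.count(line => line.trim.nonEmpty).toLong

  // A naive shortcut that counts every line without inspecting its content,
  // the kind of pre-parsing optimization the test is designed to catch.
  def countRawLines(lines: Seq[String]): Long =
    lines.size.toLong

  def main(args: Array[String]): Unit = {
    // The same four lines the test writes to its temporary text file.
    val lines = Seq("""{ "a" : 1 }""", "", """     { "a" : 2 }""", " \t ")
    println(s"records: ${countRecords(lines)}")    // prints "records: 2"
    println(s"raw lines: ${countRawLines(lines)}") // prints "raw lines: 4"
  }
}
```

For the test's input, the record count is 2 while the raw line count is 4; a `count()` that skipped parsing and counted lines directly would return the larger number, which is why the test calls `count()` before any other read.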

