(spark) branch branch-3.5 updated: [SPARK-47125][SQL] Return null if Univocity never triggers parsing

gurwls223 Wed, 21 Feb 2024 19:15:29 -0800

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new e81df1f39e8f [SPARK-47125][SQL] Return null if Univocity never triggers parsing
e81df1f39e8f is described below

commit e81df1f39e8fd2d1babd5cadd4d4b76e2df95791
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Thu Feb 22 12:13:24 2024 +0900

    [SPARK-47125][SQL] Return null if Univocity never triggers parsing
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to guard against `tokenizer.getContext` returning `null`. This is 
similar to https://github.com/apache/spark/pull/28029. In the univocity library, 
`getContext` can seemingly return null if `beginParsing` is not invoked 
(https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/AbstractParser.java#L53).
 This can happen when `parseLine` is not invoked at 
https://github.com/apache/spark/blob/e081f06ea401a2b6b8c214a36126583d35eaf55f/ 
[...]
    
    ### Why are the changes needed?
    
    To fix a bug: `getCurrentInput` could throw an NPE when `tokenizer.getContext` returned `null`.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. In a very rare case, when `CsvToStructs` is used as the sole predicate 
against an empty row, it might trigger an NPE. This PR fixes it.
    
    ### How was this patch tested?
    
    Manually tested; a test case will be added in a separate PR. We should 
backport this to all branches.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #45210 from HyukjinKwon/SPARK-47125.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala   | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
index 804c5d358ad6..f0663ddd69b1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
@@ -139,6 +139,7 @@ class UnivocityParser(
 
   // Retrieve the raw record string.
   private def getCurrentInput: UTF8String = {
+    if (tokenizer.getContext == null) return null
     val currentContent = tokenizer.getContext.currentParsedContent()
     if (currentContent == null) null else UTF8String.fromString(currentContent.stripLineEnd)
   }
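
The guard in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the Spark implementation: `FakeTokenizer` and `FakeContext` are hypothetical stand-ins for the univocity parser and its context, the function returns a plain `String` rather than `UTF8String`, and it uses `Option` where the patch uses an early `return null`. It only assumes the behaviour described in the commit message, namely that the context stays null until a line has been parsed.

```scala
// Hypothetical stand-in for the univocity parsing context (illustration only).
final class FakeContext(content: String) {
  def currentParsedContent(): String = content
}

// Hypothetical stand-in for the tokenizer: the context is null until parseLine runs.
final class FakeTokenizer {
  private var context: FakeContext = null
  def getContext: FakeContext = context
  def parseLine(line: String): Unit = { context = new FakeContext(line) }
}

object NullContextSketch {
  // Same null-safety as the patched getCurrentInput: a null context or null
  // content yields null; otherwise the trailing line ending is stripped.
  def currentInput(tokenizer: FakeTokenizer): String =
    Option(tokenizer.getContext)
      .flatMap(ctx => Option(ctx.currentParsedContent()))
      .map(_.stripLineEnd)
      .orNull

  def main(args: Array[String]): Unit = {
    val tokenizer = new FakeTokenizer
    // No line parsed yet: the context is null, and no NPE is thrown.
    println(currentInput(tokenizer))
    tokenizer.parseLine("a,b,c\n")
    // After parsing, the raw record is returned without its line ending.
    println(currentInput(tokenizer))
  }
}
```

Wrapping each nullable value in `Option` makes both null checks explicit in one expression; the patch itself keeps the existing imperative style and adds a single early return.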


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
