This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 3192c8c68423 [SPARK-47125][SQL] Return null if Univocity never triggers parsing
3192c8c68423 is described below
commit 3192c8c68423fb2d5f73da238ba60c3d1fb3559f
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Thu Feb 22 12:13:24 2024 +0900
[SPARK-47125][SQL] Return null if Univocity never triggers parsing
This PR proposes to guard against `tokenizer.getContext` returning `null`. This is
similar to https://github.com/apache/spark/pull/28029. It appears that, in the univocity
library, `getContext` can return `null` if `beginParsing` has not been invoked
(https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/AbstractParser.java#L53).
This can happen when `parseLine` is not invoked at
https://github.com/apache/spark/blob/e081f06ea401a2b6b8c214a36126583d35eaf55f/
[...]
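To illustrate the failure mode described above, here is a minimal, self-contained Scala sketch. `FakeTokenizer` and `FakeContext` are hypothetical stand-ins, not univocity classes: they only assume, as stated above, that the context stays `null` until `beginParsing` is called. `String` replaces Spark's `UTF8String` so the sketch has no Spark dependency.

```scala
// Hypothetical stand-ins for univocity's parsing context and parser. They
// are NOT the real univocity API; they only mimic the behavior described
// above: the context stays null until beginParsing has been called.
final class FakeContext(content: String) {
  def currentParsedContent(): String = content
}

final class FakeTokenizer {
  private var context: FakeContext = null

  // After this call, getContext returns a non-null context.
  def beginParsing(input: String): Unit = {
    context = new FakeContext(input)
  }

  def getContext: FakeContext = context
}

object GetCurrentInputSketch {
  // Mirrors the patched getCurrentInput, but returns String instead of
  // Spark's UTF8String so the sketch has no Spark dependency.
  def getCurrentInput(tokenizer: FakeTokenizer): String = {
    if (tokenizer.getContext == null) return null
    val currentContent = tokenizer.getContext.currentParsedContent()
    if (currentContent == null) null else currentContent.stripLineEnd
  }

  // Pre-patch shape: dereferences getContext unconditionally.
  def getCurrentInputUnguarded(tokenizer: FakeTokenizer): String = {
    val currentContent = tokenizer.getContext.currentParsedContent()
    if (currentContent == null) null else currentContent.stripLineEnd
  }

  def main(args: Array[String]): Unit = {
    val neverParsed = new FakeTokenizer
    println(getCurrentInput(neverParsed)) // prints: null

    val parsed = new FakeTokenizer
    parsed.beginParsing("a,b\n")
    println(getCurrentInput(parsed)) // prints: a,b

    try getCurrentInputUnguarded(neverParsed)
    catch {
      case _: NullPointerException =>
        println("unguarded version threw NullPointerException")
    }
  }
}
```

In this sketch, the guarded version turns the "parsing never started" case into a `null` result, while the unguarded version throws a `NullPointerException` on the same input.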
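Why are the changes needed?
To fix a bug.
Does this PR introduce any user-facing change?
Yes. In a very rare case, when `CsvToStructs` is used as the sole predicate
against an empty row, it might throw a NullPointerException (NPE). This PR fixes it.
How was this patch tested?
Manually tested; a test case will be added in a separate PR. We should
backport this to all branches.
Was this patch authored or co-authored using generative AI tooling?
No.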
Closes #45210 from HyukjinKwon/SPARK-47125.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit a87015efb5cf36103bc4eb82ae8613874e2eb408)
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
index 59b2857f6b60..ba2ef14e4fad 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
@@ -139,6 +139,7 @@ class UnivocityParser(
   // Retrieve the raw record string.
   private def getCurrentInput: UTF8String = {
+    if (tokenizer.getContext == null) return null
     val currentContent = tokenizer.getContext.currentParsedContent()
     if (currentContent == null) null else UTF8String.fromString(currentContent.stripLineEnd)
   }
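The patch uses an early `return`, which keeps the fix a one-line addition. For comparison, an equivalent formulation could chain `Option` calls so that a missing context and missing content are handled in one expression. This is an illustrative alternative, not what the commit does; `Tokenizer` and `Context` below are hypothetical minimal traits rather than univocity types, and `String` again stands in for `UTF8String`.

```scala
object OptionStyleSketch {
  // Hypothetical stand-ins for univocity's parser and parsing context;
  // NOT the real univocity API.
  trait Context { def currentParsedContent(): String }
  trait Tokenizer { def getContext: Context }

  // Option(x) is None when x is null, so a missing context and missing
  // content both collapse to None; orNull turns None back into null,
  // matching the contract of the patched getCurrentInput.
  def getCurrentInput(tokenizer: Tokenizer): String =
    Option(tokenizer.getContext)
      .flatMap(ctx => Option(ctx.currentParsedContent()))
      .map(_.stripLineEnd)
      .orNull

  def main(args: Array[String]): Unit = {
    val neverParsed = new Tokenizer { def getContext: Context = null }
    val parsed = new Tokenizer {
      def getContext: Context = new Context {
        def currentParsedContent(): String = "x,y\n"
      }
    }
    println(getCurrentInput(neverParsed)) // prints: null
    println(getCurrentInput(parsed))      // prints: x,y
  }
}
```

The `Option` chain allocates small wrapper objects per call, which may matter on a per-record hot path; the explicit null check in the patch avoids that and keeps the diff minimal.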