This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 467a293bd ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown
doesn't get evaluated correctly
467a293bd is described below
commit 467a293bdc8b78a6392bd097cd593bbc2224fdae
Author: Jay Han <[email protected]>
AuthorDate: Tue May 20 17:19:15 2025 -0700
ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown doesn't get
evaluated correctly
### What changes were proposed in this pull request?
When all values of column `col_0` within a row group are `NULL` and the
predicate `col_0 <=> 'xxx'` is pushed down, the `evaluatePredicateProto`
function returns `TruthValue.NULL`. In this case, the result can be determined
directly from the literal value: if the literal is `NULL`, return
`TruthValue.YES`; otherwise, return `TruthValue.NO`.
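
The rule above can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the ORC code itself: the class name `AllNullPushdownSketch`, the method `evaluateAllNull`, and the reduced enums are hypothetical stand-ins for `SearchArgument.TruthValue` and `PredicateLeaf.Operator`.

```java
// Hypothetical stand-ins for illustration only; the real types are
// SearchArgument.TruthValue and PredicateLeaf.Operator.
final class AllNullPushdownSketch {
  enum TruthValue { YES, NO, NULL }
  enum Operator { IS_NULL, NULL_SAFE_EQUALS, EQUALS }

  // Result for a row group whose column statistics report no non-null values.
  static TruthValue evaluateAllNull(Operator op, Object literal) {
    switch (op) {
      case IS_NULL:
        // Every value is null, so IS NULL holds for every row.
        return TruthValue.YES;
      case NULL_SAFE_EQUALS:
        // NULL <=> NULL is true; NULL <=> non-null literal is false.
        return literal == null ? TruthValue.YES : TruthValue.NO;
      default:
        // Other comparisons against null are unknown.
        return TruthValue.NULL;
    }
  }

  public static void main(String[] args) {
    System.out.println(evaluateAllNull(Operator.NULL_SAFE_EQUALS, "xxx")); // NO
    System.out.println(evaluateAllNull(Operator.NULL_SAFE_EQUALS, null));  // YES
    System.out.println(evaluateAllNull(Operator.EQUALS, "xxx"));           // NULL
  }
}
```

The key point is that a null-safe comparison never yields an unknown result, so the all-null case can always be answered with `YES` or `NO`.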
### Why are the changes needed?
See
[SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When the NULL_SAFE_EQUALS predicate is pushed down and all values of the
column are `NULL`, `evaluatePredicateProto` returns `TruthValue.NULL`. Its
`isNeeded` returns false, so `SargApplier.pickRowGroups` skips the whole row
group, which is incorrect.
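
To make the effect on row group selection concrete, here is a small sketch under stated assumptions. The enum below is a hypothetical three-value simplification of `SearchArgument.TruthValue`; its `isNeeded` and `not` methods only approximate the real behavior, and the negated predicate is an illustrative case, not one taken from the linked issue.

```java
// Hypothetical three-value simplification for illustration only.
final class RowGroupSelectionSketch {
  enum TruthValue {
    YES, NO, NULL;

    // Assumed simplification: a row group is read only if the
    // predicate can be true for some row in it.
    boolean isNeeded() {
      return this == YES;
    }

    // Three-valued NOT: an unknown result stays unknown.
    TruthValue not() {
      switch (this) {
        case YES: return NO;
        case NO: return YES;
        default: return NULL;
      }
    }
  }

  public static void main(String[] args) {
    // All-null column, predicate NOT (col_0 <=> 'xxx'): true for every row.
    TruthValue before = TruthValue.NULL; // leaf result before this change
    TruthValue after = TruthValue.NO;    // leaf result after this change
    System.out.println(before.not().isNeeded()); // false: row group skipped
    System.out.println(after.not().isNeeded());  // true: row group read
  }
}
```

In this model the unknown result survives negation and the row group is dropped, while the definite `NO` negates to `YES` and the row group is kept.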
### How was this patch tested?
The existing unit test `TestOrcTimezonePPD.testTimestampAllNulls` covers this
case; its expected value is updated from `TruthValue.NULL` to `TruthValue.NO`.
### Was this patch authored or co-authored using generative AI tooling?
Co-authored using generative AI tooling.
Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.
Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java | 7 +++++++
java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java b/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
index c9256964e..5bd980925 100644
--- a/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
+++ b/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
@@ -763,6 +763,13 @@ public class RecordReaderImpl implements RecordReader {
if (!range.hasValues()) {
if (predicate.getOperator() == PredicateLeaf.Operator.IS_NULL) {
return TruthValue.YES;
+ } else if (predicate.getOperator() == PredicateLeaf.Operator.NULL_SAFE_EQUALS) {
+ Object literal = predicate.getLiteral();
+ if (literal == null) {
+ return TruthValue.YES;
+ } else {
+ return TruthValue.NO;
+ }
} else {
return TruthValue.NULL;
}
diff --git a/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java b/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
index f21ef810c..593e0a964 100644
--- a/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
+++ b/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
@@ -387,7 +387,7 @@ public class TestOrcTimezonePPD {
PredicateLeaf pred = createPredicateLeaf(
PredicateLeaf.Operator.NULL_SAFE_EQUALS, PredicateLeaf.Type.TIMESTAMP, "x",
Timestamp.valueOf("2007-08-01 00:00:00.0"), null);
- assertEquals(SearchArgument.TruthValue.NULL, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
+ assertEquals(SearchArgument.TruthValue.NO, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
pred = createPredicateLeaf(PredicateLeaf.Operator.IS_NULL,
PredicateLeaf.Type.TIMESTAMP, "x", null, null);
assertEquals(SearchArgument.TruthValue.YES, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));