This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-1.9
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-1.9 by this push:
     new a8b6a6179 ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown doesn't get evaluated correctly
a8b6a6179 is described below

commit a8b6a61799217acb0f19909b7baf2feddcf7fb2f
Author: Jay Han <[email protected]>
AuthorDate: Tue May 20 17:19:15 2025 -0700

    ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown doesn't get evaluated correctly
    
    ### What changes were proposed in this pull request?
    
    When all values in column `col_0` within a row group are `NULL` and we apply
    the pushdown predicate `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function
    returns `TruthValue.NULL`. In this case, we can determine the result directly
    from the literal value: if the literal is `NULL`, return `TruthValue.YES`;
    otherwise, return `TruthValue.NO`.
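    
    A minimal Java sketch of the rule described above, for illustration only.
    `TruthValue`, `Operator`, and `evaluateAllNull` are hypothetical stand-ins,
    not the `org.apache.orc` classes; the method models only the branch taken
    when a row group's statistics report no non-null values.
    
    ```java
    // Hypothetical, simplified model of the all-null branch added by this patch.
    // TruthValue and Operator are stand-ins, not org.apache.orc classes.
    class AllNullNullSafeEquals {

      enum TruthValue { YES, NO, NULL }

      enum Operator { EQUALS, NULL_SAFE_EQUALS, IS_NULL, LESS_THAN }

      /** Result for a row group whose column holds only NULL values. */
      static TruthValue evaluateAllNull(Operator op, Object literal) {
        if (op == Operator.IS_NULL) {
          return TruthValue.YES;                 // every row is NULL
        } else if (op == Operator.NULL_SAFE_EQUALS) {
          // NULL <=> NULL is true; NULL <=> non-null literal is false.
          return literal == null ? TruthValue.YES : TruthValue.NO;
        } else {
          return TruthValue.NULL;                // e.g. NULL = 'xxx' is NULL
        }
      }

      public static void main(String[] args) {
        System.out.println(evaluateAllNull(Operator.NULL_SAFE_EQUALS, "xxx")); // NO
        System.out.println(evaluateAllNull(Operator.NULL_SAFE_EQUALS, null));  // YES
        System.out.println(evaluateAllNull(Operator.EQUALS, "xxx"));           // NULL
      }
    }
    ```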
    
    ### Why are the changes needed?
    
    See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
    When we push down a NULL_SAFE_EQUALS predicate and all values of the column
    are `NULL`, `evaluatePredicateProto` returns `TruthValue.NULL`. Because its
    `isNeeded` returns false, `SargApplier.pickRowGroups` skips the whole row
    group, which is incorrect.
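    
    A hypothetical sketch of how a per-row-group `TruthValue` becomes a keep/skip
    decision. The `isNeeded` body assumes ORC's mapping (`NO`, `NULL`, and
    `NO_NULL` are not needed), and `pickRowGroups` here is a simplified stand-in,
    not the real `SargApplier` code. It contrasts the pre-patch and post-patch
    results for `col_0 <=> NULL` on an all-null row group, following the diff below.
    
    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch: turning one TruthValue per row group into keep/skip.
    // Names mirror ORC, but this is not ORC's SargApplier implementation.
    class RowGroupSkipSketch {

      enum TruthValue {
        YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL;

        /** Assumed to mirror ORC: NO, NULL and NO_NULL mean "no row can match". */
        boolean isNeeded() {
          switch (this) {
            case NO:
            case NULL:
            case NO_NULL:
              return false;
            default:
              return true;
          }
        }
      }

      /** Keep a row group only if its predicate result says rows may match. */
      static boolean[] pickRowGroups(List<TruthValue> perGroup) {
        boolean[] keep = new boolean[perGroup.size()];
        for (int i = 0; i < keep.length; i++) {
          keep[i] = perGroup.get(i).isNeeded();
        }
        return keep;
      }

      public static void main(String[] args) {
        // One all-null row group evaluated against `col_0 <=> NULL`.
        boolean[] before = pickRowGroups(List.of(TruthValue.NULL)); // pre-patch
        boolean[] after = pickRowGroups(List.of(TruthValue.YES));   // post-patch
        System.out.println(Arrays.toString(before)); // [false]: rows wrongly skipped
        System.out.println(Arrays.toString(after));  // [true]: row group is read
      }
    }
    ```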
    
    ### How was this patch tested?
    
    The existing unit test `TestOrcTimezonePPD.testTimestampAllNulls` covers this
    case; its expected result is updated from `TruthValue.NULL` to `TruthValue.NO`.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Co-authored using generative AI tooling.
    
    Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.
    
    Authored-by: Jay Han <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 467a293bdc8b78a6392bd097cd593bbc2224fdae)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java | 7 +++++++
 java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java    | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java b/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
index 88b44972e..c42b700d5 100644
--- a/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
+++ b/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java
@@ -766,6 +766,13 @@ public class RecordReaderImpl implements RecordReader {
     if (!range.hasValues()) {
       if (predicate.getOperator() == PredicateLeaf.Operator.IS_NULL) {
         return TruthValue.YES;
+      } else if (predicate.getOperator() == PredicateLeaf.Operator.NULL_SAFE_EQUALS) {
+        Object literal = predicate.getLiteral();
+        if (literal == null) {
+          return TruthValue.YES;
+        } else {
+          return TruthValue.NO;
+        }
       } else {
         return TruthValue.NULL;
       }
diff --git a/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java b/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
index f21ef810c..593e0a964 100644
--- a/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
+++ b/java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java
@@ -387,7 +387,7 @@ public class TestOrcTimezonePPD {
     PredicateLeaf pred = createPredicateLeaf(
      PredicateLeaf.Operator.NULL_SAFE_EQUALS, PredicateLeaf.Type.TIMESTAMP, "x",
       Timestamp.valueOf("2007-08-01 00:00:00.0"), null);
-    assertEquals(SearchArgument.TruthValue.NULL, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
+    assertEquals(SearchArgument.TruthValue.NO, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
 
     pred = createPredicateLeaf(PredicateLeaf.Operator.IS_NULL, PredicateLeaf.Type.TIMESTAMP, "x", null, null);
     assertEquals(SearchArgument.TruthValue.YES, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
