viirya commented on a change in pull request #1595:
URL: https://github.com/apache/arrow-datafusion/pull/1595#discussion_r787104696



##########
File path: datafusion/src/physical_plan/file_format/parquet.rs
##########
@@ -757,10 +784,8 @@ mod tests {
             .enumerate()
             .map(|(i, g)| row_group_predicate(g, i))
             .collect::<Vec<_>>();
-        // no row group is filtered out because the predicate expression can't be evaluated
-        // when a null array is generated for a statistics column,
-        // because the null values propagate to the end result, making the predicate result undefined
-        assert_eq!(row_group_filter, vec![true, true]);
+        // First row group was filtered out because it contains no null value on "c2".
+        assert_eq!(row_group_filter, vec![false, true]);

Review comment:
       I am not sure about the expression semantics in DataFusion. In Spark,
a null check is expressed with the `IsNull` predicate. Here I kept the
original expression, `bool = NULL`.
   
   I see there is also an `IsNull` predicate expression, but I don't see
`IsNull` handled in predicate pushdown. I don't know whether this is
intentional (i.e. using `=` to do null predicate pushdown) or a bug.
   
   I can fix it if you agree that `IsNull` is the correct way to handle the
null predicate here.
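
   To make the distinction concrete, here is a minimal standalone sketch of
the two semantics in question. It is not DataFusion code: `ColumnStats`,
`sql_eq`, `sql_is_null`, and `keep_for_is_null` are hypothetical names, and
the sketch assumes that the only statistic consulted is a per-row-group null
count for the column.

```rust
/// Hypothetical per-row-group statistics for one column.
/// This is an illustration only, not DataFusion's or parquet's real type.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// Number of null values the column has in this row group.
    null_count: u64,
}

/// SQL-style `a = b` under three-valued logic:
/// if either operand is NULL (`None`), the result is NULL (`None`).
fn sql_eq(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// SQL-style `a IS NULL`: always yields a definite true or false.
fn sql_is_null(a: Option<bool>) -> bool {
    a.is_none()
}

/// Pruning decision for a `col IS NULL` predicate (assumed rule):
/// keep the row group only if the statistics say it contains at least one null.
fn keep_for_is_null(stats: &ColumnStats) -> bool {
    stats.null_count > 0
}
```

   Under these assumptions, `sql_eq(Some(true), None)` is `None` rather than
a boolean, so it cannot be used on its own to decide whether a row group
matches, while `keep_for_is_null` gives a definite answer from the null count
alone.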





