houqp opened a new issue #1591:
URL: https://github.com/apache/arrow-datafusion/issues/1591


   **Describe the bug**
   
   Evaluating a `col = null` expression against the statistics array throws a 
runtime error, which results in an incorrect `true` result when the stats have 
the null count set to 0.
   
   The other problem is that the `col = null` expression is converted into the 
predicate expression `col_min <= NULL AND NULL <= col_max`. I believe we should 
handle null as a special case and return an expression that checks the null 
count column instead.
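   
   To illustrate the difference, here is a minimal, standalone sketch. It does 
not use DataFusion's actual pruning code: `RowGroupStats`, `min_max_predicate` 
and `null_count_predicate` are hypothetical names invented for this example, and 
`Option` stands in for SQL NULL. It assumes standard SQL three-valued logic, 
where any comparison with NULL yields NULL.
   
```rust
/// Hypothetical per-row-group statistics for one Int32 column
/// (illustration only, not a DataFusion type).
struct RowGroupStats {
    min: i32,
    max: i32,
    null_count: u64,
}

/// SQL-style `<=` where `None` stands for NULL:
/// any NULL operand makes the result NULL.
fn lt_eq(a: Option<i32>, b: Option<i32>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x <= y),
        _ => None,
    }
}

/// SQL-style three-valued AND.
fn and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// The rewrite described above: `col_min <= lit AND lit <= col_max`.
/// With a NULL literal this can never produce a definite answer.
fn min_max_predicate(stats: &RowGroupStats, lit: Option<i32>) -> Option<bool> {
    and(lt_eq(Some(stats.min), lit), lt_eq(lit, Some(stats.max)))
}

/// The special case proposed in this issue: for a NULL literal,
/// keep the row group only if it contains at least one null.
fn null_count_predicate(stats: &RowGroupStats) -> bool {
    stats.null_count > 0
}
```
   
   For a row group with `min = 1`, `max = 10` and `null_count = 0`, 
`min_max_predicate` with a NULL literal yields NULL (`None`), which says nothing 
about whether the row group can be skipped, while `null_count_predicate` returns 
`false`, so the row group can be pruned.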
   
   **To Reproduce**
   
   See our test cases at: 
https://github.com/apache/arrow-datafusion/blob/f027e5f4d9a44ad9cc879c133abc913f78fa76f0/datafusion/src/physical_plan/file_format/parquet.rs#L722-L763
   
   The test case asserts that the results for both row groups are `true`, 
whereas both should be `false` because both row groups have the null count set 
to 0.
   
   **Expected behavior**
   
   A `col = null` predicate should be evaluated for each row group by taking 
the row group's null count stats into account.
   
   



