shardulm94 opened a new pull request #1536:
URL: https://github.com/apache/iceberg/pull/1536


   ORC uses SQL semantics for Search Arguments, so an expression like `col != 
1` will exclude rows where col is NULL along with rows where `col = 1`. In 
contrast, Iceberg's Expressions will keep rows with NULL values, so the 
equivalent ORC Search Argument for an Iceberg Expression `col != x` is `col IS 
NULL OR col != x`.
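
To make the difference concrete, here is a minimal, self-contained sketch in plain Java. It does not use the ORC or Iceberg APIs; the class `NotEqSemantics` and its method names are hypothetical and only model the two semantics described above, with SQL's UNKNOWN result represented as a `null` `Boolean`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class NotEqSemantics {
    // SQL three-valued logic: comparing NULL with anything yields UNKNOWN,
    // modeled here as a null Boolean.
    public static Boolean sqlNotEq(Long col, long x) {
        if (col == null) {
            return null; // UNKNOWN
        }
        return col != x;
    }

    // A SQL-style filter (like an ORC Search Argument) keeps a row only when
    // the predicate is TRUE; UNKNOWN rows are dropped.
    public static boolean sqlKeeps(Long col, long x) {
        return Boolean.TRUE.equals(sqlNotEq(col, x));
    }

    // Iceberg-style `col != x`: rows with NULL values are kept.
    public static boolean icebergKeeps(Long col, long x) {
        return col == null || col != x;
    }

    // The rewritten SQL-style predicate: `col IS NULL OR col != x`.
    public static boolean rewrittenSqlKeeps(Long col, long x) {
        return col == null || sqlKeeps(col, x);
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<Long> rows = Arrays.asList(1L, 2L, null);
        long x = 1L;

        System.out.println("SQL col != 1:                " + rows.stream()
            .filter(v -> sqlKeeps(v, x)).collect(Collectors.toList()));
        System.out.println("Iceberg col != 1:            " + rows.stream()
            .filter(v -> icebergKeeps(v, x)).collect(Collectors.toList()));
        System.out.println("SQL col IS NULL OR col != 1: " + rows.stream()
            .filter(v -> rewrittenSqlKeeps(v, x)).collect(Collectors.toList()));
    }
}
```

For the rows `1`, `2`, and NULL, the plain SQL-style predicate keeps only `2`, while the Iceberg-style predicate and the rewritten SQL-style predicate both keep `2` and NULL.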
   
   This PR fixes the ORC pushdown returning fewer rows than Iceberg expects. 
https://github.com/apache/iceberg/blob/78e80d2a35c4c93e68776c31c24cc8ccb06fed4b/data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java#L741-L742
 mentions that this might be a bug in ORC, but in fact it's just a case of 
mismatched semantics. When converting an Iceberg expression to an ORC Search 
Argument, we now account for this semantic difference.
   
   The wider question of SQL compatibility for Iceberg expressions is 
discussed on [the dev 
list](https://lists.apache.org/thread.html/r1f02485a6c007715939f9ee9aa33344b8cfc90e5f72fa68e5d989056%40%3Cdev.iceberg.apache.org%3E).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


