shardulm94 opened a new pull request #1536: URL: https://github.com/apache/iceberg/pull/1536
ORC applies SQL semantics to Search Arguments, so an expression like `col != 1` excludes rows where `col` is NULL as well as rows where `col = 1`. In contrast, Iceberg's Expressions keep rows with NULL values, so the equivalent ORC Search Argument for the Iceberg Expression `col != x` is `col IS NULL OR col != x`. This PR fixes the ORC pushdown returning fewer rows than Iceberg expects.

https://github.com/apache/iceberg/blob/78e80d2a35c4c93e68776c31c24cc8ccb06fed4b/data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java#L741-L742 mentions that this might be a bug in ORC, but in fact it's just a case of mismatched semantics. When converting an Iceberg expression to ORC Search Arguments, we now account for this semantic difference.

The wider question of SQL compatibility for Iceberg expressions is discussed on [the dev list](https://lists.apache.org/thread.html/r1f02485a6c007715939f9ee9aa33344b8cfc90e5f72fa68e5d989056%40%3Cdev.iceberg.apache.org%3E).
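To make the mismatch concrete, here is a minimal, library-free sketch in plain Java. It does not call the Iceberg or ORC APIs; the class and method names are hypothetical and only model the two semantics described above, with SQL's UNKNOWN result represented as a `null` `Boolean`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the two semantics; NOT the Iceberg or ORC API.
final class NotEqSemanticsSketch {

  // SQL three-valued logic: comparing NULL with anything yields UNKNOWN (modeled as null).
  static Boolean sqlNotEq(Integer col, int x) {
    return col == null ? null : Boolean.valueOf(col.intValue() != x);
  }

  // ORC-style filter for `col != x`: a row is kept only when the predicate is TRUE.
  static boolean orcNotEq(Integer col, int x) {
    return Boolean.TRUE.equals(sqlNotEq(col, x));
  }

  // Iceberg-style `col != x`: a NULL value counts as "not equal" and the row is kept.
  static boolean icebergNotEq(Integer col, int x) {
    return col == null || col.intValue() != x;
  }

  // ORC-side rewrite described in this PR: `col IS NULL OR col != x`.
  static boolean orcRewrittenNotEq(Integer col, int x) {
    return col == null || orcNotEq(col, x);
  }

  // Returns the rows for which the given predicate holds.
  static List<Integer> keep(List<Integer> rows, Predicate<Integer> predicate) {
    List<Integer> kept = new ArrayList<>();
    for (Integer row : rows) {
      if (predicate.test(row)) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Integer> rows = Arrays.asList(1, 2, null);
    System.out.println("ORC col != 1:                " + keep(rows, r -> orcNotEq(r, 1)));
    System.out.println("Iceberg col != 1:            " + keep(rows, r -> icebergNotEq(r, 1)));
    System.out.println("ORC col IS NULL OR col != 1: " + keep(rows, r -> orcRewrittenNotEq(r, 1)));
  }
}
```

With rows `1`, `2`, and NULL, the plain SQL-style predicate drops the NULL row, while the rewritten predicate keeps the same rows as the Iceberg-style one.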
