aokolnychyi opened a new pull request #1664: URL: https://github.com/apache/iceberg/pull/1664
This PR optimizes the evaluation of IN predicates on dictionary-encoded columns in Parquet. The previous solution called `isEmpty` on the result of `Sets$intersection`, which in turn delegates to `Collections$disjoint(set2, set1)`. The latter checks whether its first argument is a set. If it is, it simply iterates over the second argument, ignoring the fact that the second argument may also be a set and may be even larger. As a result, we iterated through all dictionary values to evaluate IN predicates.
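To illustrate the idea, here is a minimal sketch of a disjointness check that always iterates over the smaller of the two sets and probes the larger one. It is not the code from this PR: the class name `DisjointCheck`, the method `isDisjoint`, and the parameter names are hypothetical, and the sketch assumes both sets use the same notion of equality and offer a cheap `contains`.

```java
import java.util.Set;

// Hypothetical helper, not taken from the Iceberg code base.
final class DisjointCheck {
  private DisjointCheck() {}

  // Returns true when the two sets share no element.
  // Assumes both sets agree on element equality and have a cheap contains().
  static <T> boolean isDisjoint(Set<T> dictionary, Set<T> literals) {
    // Iterate over the smaller set so the loop runs min(|dictionary|, |literals|) times.
    Set<T> smaller = dictionary.size() <= literals.size() ? dictionary : literals;
    Set<T> larger = smaller == dictionary ? literals : dictionary;

    for (T value : smaller) {
      if (larger.contains(value)) {
        return false; // found a common element, so the sets intersect
      }
    }
    return true;
  }
}
```

With a large dictionary and a short IN list, the loop above runs once per literal rather than once per dictionary value, which is the cost difference described above.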
