kbendick opened a new pull request #2062:
URL: https://github.com/apache/iceberg/pull/2062
Adds support for a `NOT_STARTS_WITH` operator and closes https://github.com/apache/iceberg/issues/1952.

This also ensures that the predicate is pushed down when evaluating Parquet dictionaries as well as Parquet row groups. It also ensures that Spark pushes this filter down. That is particularly important for queries that filter on string partition columns, because it lets whole partitions be pruned. This applies with an identity partition spec, and with a truncate partition spec when the notStartsWith prefix is no longer than the truncation width.

I've added quite a number of tests. Admittedly, many of them were added to aid my own understanding of the codebase so that I can contribute better in the future. Please feel free to suggest any that should be removed to save CI running time and cut down on potential code rot. I also added a few tests around `startsWith`, which I'd be happy to factor out into their own PR.

I'm adding some comments to explain my reasoning for the changes.

cc @shardulm94 @RussellSpitzer @rdblue
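The skipping conditions for a `NOT_STARTS_WITH` predicate can be sketched in plain Java. This is a minimal, hypothetical illustration and not the code in this PR: the class `NotStartsWithPruning` and its methods are made-up names. It assumes that a null value satisfies `notStartsWith`, so any possible null means rows might match and nothing can be skipped. It also assumes that string truncation counts Unicode code points.

```java
import java.util.Set;

/**
 * Hypothetical helpers (not Iceberg classes) sketching when a NOT_STARTS_WITH
 * predicate lets a reader skip data. Assumption: null values satisfy
 * notStartsWith, so a possible null always means "rows might match".
 */
public final class NotStartsWithPruning {

  private NotStartsWithPruning() {}

  /**
   * Row-group statistics: if valid lower and upper bounds of the non-null values
   * both start with the prefix, every value between them (lexicographically)
   * also does, so no row can satisfy notStartsWith.
   */
  static boolean canSkipByBounds(String lower, String upper, boolean mayContainNulls, String prefix) {
    if (mayContainNulls || lower == null || upper == null) {
      return false; // possible nulls or missing stats: rows might match
    }
    return lower.startsWith(prefix) && upper.startsWith(prefix);
  }

  /**
   * Dictionary: for a fully dictionary-encoded column chunk, the dictionary holds
   * every distinct non-null value, so the row group can be skipped when all of
   * them start with the prefix.
   */
  static boolean canSkipByDictionary(Set<String> dictionary, boolean mayContainNulls, String prefix) {
    if (mayContainNulls) {
      return false;
    }
    for (String value : dictionary) {
      if (!value.startsWith(prefix)) {
        return false; // this value satisfies notStartsWith
      }
    }
    return true;
  }

  /**
   * truncate[width] partition: every row either equals the partition value or
   * starts with it. If the prefix is no longer than width (in code points), all
   * rows start with the prefix exactly when the partition value does.
   */
  static boolean canPruneTruncatePartition(String partitionValue, int width, String prefix) {
    if (partitionValue == null) {
      return false; // null partition: nulls satisfy notStartsWith
    }
    int prefixLength = prefix.codePointCount(0, prefix.length());
    if (prefixLength > width) {
      return false; // partition value is too short to decide
    }
    return partitionValue.startsWith(prefix);
  }

  public static void main(String[] args) {
    System.out.println(canSkipByBounds("abc1", "abc9", false, "abc"));           // true
    System.out.println(canSkipByDictionary(Set.of("abc", "abd"), false, "abc")); // false
    System.out.println(canPruneTruncatePartition("abcd", 4, "abc"));             // true
    System.out.println(canPruneTruncatePartition("ab", 2, "abc"));               // false
  }
}
```

An identity partition is the limiting case of the truncate check with an unbounded width: a non-null partition value can be pruned whenever it starts with the prefix.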
