kbendick edited a comment on pull request #2081: URL: https://github.com/apache/iceberg/pull/2081#issuecomment-759369189
> I see [Anton] added the check for `length > 0`, was there any particular reason for that? Do you expect `length == 0` to be handled before the pushdown? As for whether or not length == 0 should be handled before the pushdown, it should be if I understand correctly. By the time that this function is invoked in the places I listed elsewhere, all expressions should have been rewritten to remove NOT nodes, and then been bound (both the term to the correct id or predicate id as well as the literal used for comparison in the literal expressions). Given that this is always called either in metrics evaluation or row value evaluation against a predicate - typically START_WITH or soon to be NOT_STARTS_WITH - the expressions being evaluated are all `BoundLiteralPredicates` and the input of an empty predicate in that situation (e.g. `data not like '%'`) is not necessarily an invalid expression (nor are empty rows such as `title = ''` showing up in a row that must be evaluated once the `title = 'apple%'` predicate needs to be evaluated at the row level, which is where the precondition causes problems and which is what's most likely causing the user's error). Though for that particular query, it does get rewritten to be `data != ''` or both `title is not null and title like 'apple%'` as well as `title != 'apple'` in strict metrics evaluation at one point via parsing, binding, and rewriting. I'm not as closely familiar with the `min` and `max` functions, but my understanding after spending a decent period of time working with it when implementing NOT_STARTS_WITH, is that they too should be under these same conditions and only applied to parquet metadata after expressions have also become rewritten into some kind of `BoundPredicate` that can be applied on parquet `Binary` type lower and upper bounds metrics. Please correct me if I am mistaken 🙂 . ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
