kbendick edited a comment on pull request #2081:
URL: https://github.com/apache/iceberg/pull/2081#issuecomment-759369189


   > I see [Anton] added the check for `length > 0`, was there any particular 
reason for that? Do you expect `length == 0` to be handled before the pushdown?
   
   As for whether or not length == 0 should be handled before the pushdown, it 
should be if I understand correctly. By the time that this function is invoked 
in the places I listed elsewhere, all expressions should have been rewritten to 
remove NOT nodes, and then been bound (both the term to the correct id or 
predicate id as well as the literal used for comparison in the literal 
expressions).
   
   Given that this is always called either in metrics evaluation or row value 
evaluation against a predicate - typically START_WITH or soon to be 
NOT_STARTS_WITH - the expressions being evaluated are all 
`BoundLiteralPredicates` and the input of an empty predicate in that situation 
(e.g. `data not like '%'`) is not necessarily an invalid expression (nor are 
empty rows such as `title = ''` showing up in a row that must be evaluated once 
the `title = 'apple%'` predicate needs to be evaluated at the row level, which 
is where the precondition causes problems and which is what's most likely 
causing the user's error).
   
   Though for that particular query, it does get rewritten to be `data != ''` 
or both `title is not null and title like 'apple%'` as well as `title != 
'apple'` in strict metrics evaluation at one point via parsing, binding, and 
rewriting.
   
   I'm not as closely familiar with the `min` and `max` functions, but my 
understanding after spending a decent period of time working with it when 
implementing NOT_STARTS_WITH,  is that they too should be under these same 
conditions and only applied to parquet metadata after expressions have also 
become rewritten into some kind of `BoundPredicate` that can be applied on 
parquet `Binary` type lower and upper bounds metrics.
   
   Please correct me if I am mistaken 🙂 .
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to