[GitHub] [iceberg] kbendick commented on pull request #2081: Allow binary truncation length to be zero to handle evaluators that encounter empty string values

GitBox Wed, 13 Jan 2021 01:30:36 -0800


kbendick commented on pull request #2081:
URL: https://github.com/apache/iceberg/pull/2081#issuecomment-759323889



   > This looks fine to me, given the tests all passed. But I wonder why empty 
string was not handled and added for testing from the very beginning, I may 
lack some context here. I see @aokolnychyi added the check for `length > 0`, 
was there any particular reason for that? Do you expect `length == 0` to be 
handled before the pushdown?
   
   Yes I would also be very interested to hear if there's any reason to not 
allow for the truncation function to alter the precondition. I'm open to there 
being something I didn't catch.
   
   For reference, I checked and the only place that `truncateBinary` is used is 
in metrics evaluation - namely for `startsWith` (and soon to be notStartsWith).
   
   To be exact, in `InclusiveMetricsEvaluator`,  `ManifestEvaluator`, and 
`ParquetMetricsRowGroupFilter` (all of them in the visitor and exclusively with 
startsWith - which is where the bug occurs presumably due to a row with an 
empty string). It's also used in updating the lower bounds and upper bounds for 
min/max parquet metrics for `FIXED` and `BINARY` types (in `ParquetUtils`). So 
it's only used with metrics and literals and it's overall relatively well 
tested. I can add more tests as a follow up on the binary and fixed types if we 
feel it necessary, but overall those are all decently well covered in the tests 
- especially after I add my tests from the `NOT_STARTS_WITH` PR, which covers 
it further: https://github.com/apache/iceberg/pull/2062.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] kbendick commented on pull request #2081: Allow binary truncation length to be zero to handle evaluators that encounter empty string values

Reply via email to