kbendick opened a new pull request #2081:
URL: https://github.com/apache/iceberg/pull/2081


   This PR closes https://github.com/apache/iceberg/issues/2065
   
   This changes the `BinaryUtil.truncateBinary` function to accept a
   truncation length of zero. This is necessary for handling scans where a
   STARTS_WITH filter (or any filter that uses `truncateBinary`) is applied
   to a table in which a (possibly non-partition) column contains an empty
   string value.
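
   For illustration only, here is a minimal, self-contained sketch of the
   behavior described above. It is not the actual Iceberg source: the class
   name `TruncateSketch` is hypothetical, and the assumption that the
   evaluators truncate both the column bound and the filter prefix to the
   shorter of the two lengths is mine. Under that assumption, an empty-string
   bound produces a truncation length of zero, which the function needs to
   tolerate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for BinaryUtil; not the real Iceberg class.
final class TruncateSketch {

  // Returns the first `length` bytes of `input`, allowing length == 0.
  static ByteBuffer truncateBinary(ByteBuffer input, int length) {
    if (length < 0) {
      throw new IllegalArgumentException("Truncate length should be non-negative");
    }
    if (length >= input.remaining()) {
      return input; // nothing to cut off, including the empty-input case
    }
    byte[] array = new byte[length];
    input.duplicate().get(array); // duplicate() leaves the caller's position untouched
    return ByteBuffer.wrap(array);
  }

  public static void main(String[] args) {
    // Assumed scenario: the column's lower bound is the empty string.
    ByteBuffer lowerBound = ByteBuffer.wrap(new byte[0]);
    ByteBuffer prefix = ByteBuffer.wrap("ice".getBytes(StandardCharsets.UTF_8));

    // Comparing on the shorter length gives zero here.
    int length = Math.min(prefix.remaining(), lowerBound.remaining());

    System.out.println(truncateBinary(lowerBound, length).remaining());
    System.out.println(truncateBinary(prefix, length).remaining());
  }
}
```

   With a strictly positive length check, both calls in `main` would throw
   instead of returning empty buffers.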
   
   It would be more efficient to return early in each of the metrics
   evaluators when the length is zero, but I did not want to introduce too
   many changes that would conflict with
   https://github.com/apache/iceberg/pull/2062. If we'd like, we can go in
   and update each of those functions individually, but this is the smallest
   change to the non-test code that fixes the issue now.
   
   I chose to update the `TestFilteredScan` Spark test suites so that one
   record is an empty string. This covers unpartitioned tables, filtered
   tables that are partitioned on a column containing the empty string, and
   tables that are partitioned on columns that do not contain the empty
   string value (which was the case in the original ticket).
   
   I also added a test that uses the Iceberg API directly.
   
   cc @changquanyou @aokolnychyi @RussellSpitzer




