kbendick opened a new pull request #2081: URL: https://github.com/apache/iceberg/pull/2081
This PR closes https://github.com/apache/iceberg/issues/2065.

It allows `BinaryUtil.truncateBinary` to accept a truncation length of zero. This is needed for scans that apply a STARTS_WITH filter (or any other filter that calls `truncateBinary`) when a column, possibly a non-partition column, contains an empty string value.

It would be more efficient for each of the metrics evaluators to return early when the length is zero, but I did not want to introduce changes that conflict with this PR: https://github.com/apache/iceberg/pull/2062. If we'd like, we can update each of those functions individually later; this PR is the minimal change to non-test code that fixes the issue now.

For tests, I updated the `TestFilteredScan` Spark test suites so that one record has an empty string value. This covers unpartitioned tables, tables partitioned on a column that contains the empty string, and tables partitioned on columns that do not contain the empty string (the case in the original ticket). I also added a test that uses the Iceberg API directly.

cc @changquanyou @aokolnychyi @RussellSpitzer
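To illustrate why a zero length shows up, here is a minimal sketch. It is hypothetical and is not Iceberg's `BinaryUtil` or its metrics evaluators: the class `TruncateSketch` and the methods `truncateBinary`, `compareUnsigned`, `mightStartWith` and `utf8` are names made up for this example. It assumes an evaluator truncates a file's lower/upper bound to `min(prefix length, bound length)` before comparing it with the STARTS_WITH prefix. Under that assumption, a file whose lower bound is the empty string yields a truncation length of zero, which a helper that only accepts positive lengths would reject.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not Iceberg source): a truncation helper that accepts length 0,
 * plus a simplified STARTS_WITH check against a file's lower/upper column bounds.
 */
final class TruncateSketch {
  private TruncateSketch() {}

  /** Returns the first {@code length} bytes of {@code input}; length 0 yields an empty buffer. */
  static ByteBuffer truncateBinary(ByteBuffer input, int length) {
    // Zero is accepted; negative or over-long lengths are still rejected.
    if (length < 0 || length > input.remaining()) {
      throw new IllegalArgumentException("Invalid truncate length: " + length);
    }
    byte[] prefix = new byte[length];
    input.duplicate().get(prefix); // duplicate() leaves the caller's position unchanged
    return ByteBuffer.wrap(prefix);
  }

  /** Lexicographic comparison treating bytes as unsigned (UTF-8 byte order). */
  static int compareUnsigned(ByteBuffer a, ByteBuffer b) {
    int n = Math.min(a.remaining(), b.remaining());
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a.get(a.position() + i) & 0xFF, b.get(b.position() + i) & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.remaining(), b.remaining());
  }

  /** Returns false only when no value in [lower, upper] can start with {@code prefix}. */
  static boolean mightStartWith(ByteBuffer lower, ByteBuffer upper, ByteBuffer prefix) {
    int len = prefix.remaining();
    if (len == 0) {
      // Optional shortcut: every value starts with "". Without it, the checks below
      // still return true, because both bounds are truncated to an empty buffer.
      return true;
    }
    int lowerLen = Math.min(len, lower.remaining()); // 0 when the lower bound is ""
    if (compareUnsigned(truncateBinary(lower, lowerLen), prefix) > 0) {
      return false; // every value sorts after all strings with this prefix
    }
    int upperLen = Math.min(len, upper.remaining());
    return compareUnsigned(truncateBinary(upper, upperLen), prefix) >= 0;
  }

  static ByteBuffer utf8(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }
}
```

For example, `mightStartWith(utf8(""), utf8("abc"), utf8("ab"))` calls `truncateBinary` on the empty lower bound with length 0 and returns `true` instead of failing. The early return for an empty prefix mirrors the optimization mentioned above. It is only a shortcut: once zero-length truncation is allowed, the general comparison path gives the same answer.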
