bodduv commented on PR #14500: URL: https://github.com/apache/iceberg/pull/14500#issuecomment-3541401314
Thank you for the comment @pvary

> * We have a table with an UUID column
> * We inserted 2 rows to the table with UUID_MIN and UUID_MAX with Java Iceberg 1.10.0, and calculated column stats (min=UUID_MAX, max=UUID_MIN)

It matters how a query engine prepares the min and max values for UUID columns before handing them over to be written into manifest files and manifest lists. Some engines could use the min and max values as prepared by Parquet Java (which is RFC compliant) during writes.

> * We run a query which filter on UUID_MIDDLE.
>
> * I expect that the metadata filtering will return the new file (UUID_MAX < UUID_MIDDLE < UUID_MIN), and we will find the row
>
> Am I correct, that after the upgrade the metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX) - filtered out by the wrong min value?

Yes. If the min and max metrics persisted in the manifest file and manifest list were constructed using the faulty, non-RFC-compliant UUID comparison, then after upgrading we would not be able to read the new file back with such a filter on the UUID column. What is even more problematic (and evident in my testing) is that even an equality filter `uuid_col = ...` will leave out records that are supposed to be returned.

Note that a full table scan will still read the new file. A remedy would be to migrate the table (with a full table scan) and rewrite the metrics accurately.

Note: this issue exists only in the Java implementation of the spec. The Go, Rust, and Cpp implementations are RFC compliant, which makes the bug more severe: if the same table is read with a filter using the Go implementation, it produces correct records, but different ones than the Java implementation returns.
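To make the scenario above concrete, here is a minimal Python sketch of the two orderings. It is not Iceberg code. It assumes the non-compliant ordering is a signed comparison of the two 64-bit halves (which is how `java.util.UUID.compareTo` behaves) and that the RFC-compliant ordering is an unsigned byte-wise comparison. The helper names `signed_key`, `rfc_key`, and `may_contain`, and the concrete value picked for `UUID_MIDDLE`, are hypothetical and chosen only for illustration.

```python
import uuid


def rfc_key(u: uuid.UUID) -> bytes:
    # Assumed RFC-style order: compare the 16 big-endian bytes as unsigned values.
    return u.bytes


def _signed64(x: int) -> int:
    # Reinterpret an unsigned 64-bit integer as a signed (two's complement) one.
    return x - (1 << 64) if x >= (1 << 63) else x


def signed_key(u: uuid.UUID) -> tuple:
    # Assumed faulty order: compare the high and low halves as signed 64-bit longs.
    hi = u.int >> 64
    lo = u.int & ((1 << 64) - 1)
    return (_signed64(hi), _signed64(lo))


def may_contain(value, lower, upper, key) -> bool:
    # Simplified metadata filter: keep the file only if lower <= value <= upper.
    return key(lower) <= key(value) <= key(upper)


UUID_MIN = uuid.UUID("00000000-0000-0000-0000-000000000000")
UUID_MAX = uuid.UUID("ffffffff-ffff-ffff-ffff-ffffffffffff")
# Hypothetical value that sorts between UUID_MAX and UUID_MIN under the signed order.
UUID_MIDDLE = uuid.UUID("ffffffff-ffff-ffff-0000-000000000000")

rows = [UUID_MIN, UUID_MAX]

# Column stats as a writer using the signed order would compute them.
lower = min(rows, key=signed_key)  # UUID_MAX
upper = max(rows, key=signed_key)  # UUID_MIN

if __name__ == "__main__":
    print("old reader keeps file for UUID_MIDDLE:",
          may_contain(UUID_MIDDLE, lower, upper, signed_key))
    print("new reader keeps file for UUID_MIDDLE:",
          may_contain(UUID_MIDDLE, lower, upper, rfc_key))
    print("new reader keeps file for uuid_col = UUID_MIN:",
          may_contain(UUID_MIN, lower, upper, rfc_key))
```

Under these assumptions, the stored bounds come out as min=UUID_MAX and max=UUID_MIN. A reader using the signed order keeps the file, while a reader using the unsigned order skips it, even for an equality filter on a value that is actually present in the file.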
