bodduv commented on PR #14500: URL: https://github.com/apache/iceberg/pull/14500#issuecomment-3541833750
> This is a serious behavioral change which could effect correctness.

It is. But I also think this is a serious data correctness bug in the Java implementation of the Iceberg spec.

> I agree that we should find a way to move forward, but the current RFC incompatible solution works for java only implementation and this change would break them. We should need to find a solution which allows fixing this issue without effecting correctness.

I should clarify regarding ^this. We do __NOT__ need a solution for implementations other than Java, as other implementations are not affected by this UUID comparison bug.

Let me clarify: suppose one uses the Go implementation of the spec to create an Iceberg table with a UUID column, just like above. In this case, min=UUID_MIN and max=UUID_MAX, compliant with the RFC. There are no surprises when filtering on UUID_MIDDLE; the new file should be read correctly.

Any query engine that is written in Java or uses the Iceberg Java implementation __should__ revisit UUID comparisons in its entire stack.

There is another approach: disabling all manifest entry filtering (data file filtering) and manifest file filtering (partition pruning) so as not to trigger any UUID comparisons (via Iceberg Java APIs). I believe this is the approach Trino currently employs, and we employ it as well, although it comes with significant performance implications. But query engines must approach this from a data correctness POV.

> I would try to resurrect the thread with a summary (short/easily understandable problem statement), and with a focused more detailed description.
>
> Also, I would add this to the next community sync topics.

Thank you @pvary for the effort and for taking a closer look into this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
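To make the comparison issue discussed in the comment above concrete, here is a minimal Java sketch. It relies on the fact that `java.util.UUID.compareTo` compares the two 64-bit halves as signed longs, while a byte-wise (RFC-style) ordering treats them as unsigned. The three UUID values, the class name `UuidOrderDemo`, and the helpers `compareUnsigned` and `mightContain` are illustrative assumptions, not values from the PR or Iceberg APIs; `mightContain` only mimics the kind of min/max bounds check a stats-based file filter might perform.

```java
import java.util.Comparator;
import java.util.UUID;

public class UuidOrderDemo {

    // Hypothetical helper: orders UUIDs as 128-bit unsigned big-endian values,
    // which matches a byte-wise lexicographic comparison.
    public static int compareUnsigned(UUID a, UUID b) {
        int cmp = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        if (cmp != 0) {
            return cmp;
        }
        return Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    // Hypothetical stand-in for a min/max bounds check: returns true when
    // value falls inside [min, max] under the given ordering.
    public static boolean mightContain(UUID min, UUID max, UUID value, Comparator<UUID> order) {
        return order.compare(min, value) <= 0 && order.compare(value, max) <= 0;
    }

    public static void main(String[] args) {
        // Illustrative values only; bounds assumed to be written in byte-wise order.
        UUID min = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID middle = UUID.fromString("80000000-0000-0000-0000-000000000000");
        UUID max = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

        // Signed comparison (UUID.compareTo): the high bit makes "middle" negative,
        // so it sorts below "min" and the range check fails.
        System.out.println(mightContain(min, max, middle, UUID::compareTo)); // prints "false"

        // Unsigned comparison: "middle" lies between "min" and "max".
        System.out.println(mightContain(min, max, middle, UuidOrderDemo::compareUnsigned)); // prints "true"
    }
}
```

Under these assumptions, a reader that applies the signed ordering to bounds written in byte-wise order would wrongly conclude that the file cannot contain the middle value, which is the kind of silent data skipping the comment is concerned with.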
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
