bodduv commented on PR #14500:
URL: https://github.com/apache/iceberg/pull/14500#issuecomment-3541833750

   > This is a serious behavioral change which could effect correctness.
   
   It is. But I also think this is a serious data correctness bug in the Java 
implementation of the Iceberg spec.
   
   > I agree that we should find a way to move forward, but the current RFC 
incompatible solution works for java only implementation and this change would 
break them. We should need to find a solution which allows fixing this issue 
without effecting correctness.
   
   I should clarify regarding ^this. We do __NOT__ need a solution for 
implementations other than Java, as other implementations are not affected by 
this UUID comparison bug. For example, suppose one uses the Go implementation 
of the spec to create an Iceberg table with a UUID column just like above. In 
that case, min=UUID_MIN and max=UUID_MAX, compliant with the RFC. There are no 
surprises when filtering on UUID_MIDDLE: the new file should be read correctly.
   
   Any query engine that is written in Java or uses the Iceberg Java 
implementation __should__ revisit UUID comparisons in their entire stack.
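
   To illustrate the kind of mismatch at stake, here is a minimal sketch. It 
assumes the discrepancy comes from signed comparison, which is how 
`java.util.UUID.compareTo` orders values, versus the unsigned, 
most-significant-first ordering described by the RFC. The class 
`UuidOrderSketch` and the helper `compareUnsigned` are hypothetical names used 
only for this example; they are not part of Iceberg or of this PR.

```java
import java.util.UUID;

// Hypothetical sketch, not Iceberg code: contrasts java.util.UUID.compareTo
// (signed comparison of the two 64-bit halves) with an unsigned ordering.
class UuidOrderSketch {
    // Assumed RFC-style ordering: treat both halves as unsigned,
    // comparing the most significant half first.
    static int compareUnsigned(UUID a, UUID b) {
        int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return c != 0
            ? c
            : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID low = UUID.fromString("00000000-0000-0000-0000-000000000001");
        // The top bit is set, so the most significant half is negative as a signed long.
        UUID high = UUID.fromString("80000000-0000-0000-0000-000000000000");

        // Signed comparison places `low` after `high`.
        System.out.println("compareTo says low > high: " + (low.compareTo(high) > 0));
        // Unsigned comparison places `low` before `high`.
        System.out.println("unsigned says low < high: " + (compareUnsigned(low, high) < 0));
    }
}
```

   When min/max bounds are written under one ordering and filters are 
evaluated under the other, a value can fall outside the bounds according to the 
reader even though it lies inside them according to the writer.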
   
   There is another approach: disabling all manifest entry filtering (data 
file filtering) and manifest file filtering (partition pruning) so as not to 
trigger any UUID comparisons (via Iceberg Java APIs). I believe this is the 
approach Trino currently employs, and the one we employ as well, although it 
comes with significant performance implications. But query engines must approach 
this from a data correctness POV.
   
   > I would try to resurrect the thread with a summary (short/easily 
understandable problem statement), and with a focused more detailed description.
   > 
   > Also, I would add this to the next community sync topics.
   
   Thank you @pvary for the effort and for taking a closer look at this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

