findepi commented on pull request #2891:
URL: https://github.com/apache/iceberg/pull/2891#issuecomment-889685056


   > Is this specific to Java? Are -NaN values ordered in other languages?
   
   @rdblue I do not know. Since the spec points at Java sorting as the 
'reference', I focused on that.
   
   > This brings up the question of how different NaN representations should be 
handled in Iceberg. Should writers canonicalize them? 
   
   @electrum this is a good question, and I was thinking about it too.
   It seems that, from Trino's perspective, it doesn't matter much, because we 
treat all NaN values as indistinguishable. The canonicalization is applied at 
comparison time, in the engine, so storage is not required to canonicalize. Of 
course, it would be better to have writers canonicalize, but I am concerned we 
will never be able to assume that at read time, because of pre-existing data.
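
For illustration, here is a minimal Java sketch of comparison-time canonicalization. It is not Trino or Iceberg code; the class name `NanComparisonSketch` and the helper `canonicalize` are hypothetical. It relies only on the documented behavior of `Double.compare`, which treats every NaN as equal to every other NaN and greater than positive infinity, regardless of the sign bit or payload.

```java
public class NanComparisonSketch {
    // Hypothetical helper: collapse any NaN bit pattern to Java's canonical NaN.
    static double canonicalize(double value) {
        return Double.isNaN(value) ? Double.NaN : value;
    }

    public static void main(String[] args) {
        // A NaN with the sign bit set, sometimes written as "-NaN".
        double negativeNan = Double.longBitsToDouble(0xfff8000000000000L);

        // All NaNs compare equal to each other under Double.compare.
        System.out.println(Double.compare(negativeNan, Double.NaN)); // 0

        // Any NaN, including "-NaN", sorts after +Infinity.
        System.out.println(Double.compare(negativeNan, Double.POSITIVE_INFINITY)); // 1

        // After canonicalization the stored bits are the canonical NaN pattern.
        long bits = Double.doubleToLongBits(canonicalize(negativeNan));
        System.out.println(Long.toHexString(bits)); // 7ff8000000000000
    }
}
```

Because the comparison itself ignores the NaN bit pattern, a reader that compares this way does not depend on writers having canonicalized the stored values.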
   
   However, even if we follow this path, we may still want to define how NaNs 
interact with `distinct_counts` in the manifest.
   Or we could ignore `distinct_counts` whenever `nan_value_counts > 0`.
   (I don't know yet whether this is important. We may or may not use 
`distinct_counts`.)
   
   > What do ORC and Parquet do for non-canonical values?
   
   @electrum do you mean the reference writer implementations? I don't know.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


