yyanyy commented on issue #2492:
URL: https://github.com/apache/iceberg/issues/2492#issuecomment-822842065


   Sorry to make sure I understand, are you suggesting if both upper/lower 
bounds are NaN, the column will only contain NaN so that we can return false? I 
think it's possible that a column may contain a mix of NaN and normal values 
but still report min/max as both NaN, which is what ORC files metric collection 
does today: 
https://github.com/apache/iceberg/blob/master/core/src/test/java/org/apache/iceberg/TestMetrics.java#L373-L374
   
   But after thinking about it for a bit, I think it may not worth adding this 
logic to avoid including unnecessary files, as 1) the chance might be low if 
people to do their own implementation on it since implementing manifest list 
may be non-trivial, 2) it's not a new thing we introduced, and 3) we start to 
mention in spec that upper/lower shouldn't include NaN: 
https://iceberg.apache.org/spec/#snapshots:
   
   ```
   510 lower_bound
   ...
   Lower bound for the non-null, non-NaN values in the partition field, or null 
if all values are null or NaN [2]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to