yyanyy commented on issue #2492:
URL: https://github.com/apache/iceberg/issues/2492#issuecomment-822812288


   @RussellSpitzer Thank you for the quick fix! Regarding handling NaN in 
min/max, I think you are right that they seem fine today if they are created by 
iceberg library: 
   - `compare(NaN, xxx) = 1`; `compare(xxx, NaN) = -1`
   - when we [create them before handling 
NaN](https://github.com/apache/iceberg/blob/469f6c3f640293df6c629a693450736350aa6fab/core/src/main/java/org/apache/iceberg/PartitionSummary.java#L88-L92),
 if input is NaN, min will never be NaN but max will always be NaN
   - In 
[`ManifestFileUtil`](https://github.com/apache/iceberg/blob/469f6c3f640293df6c629a693450736350aa6fab/core/src/main/java/org/apache/iceberg/util/ManifestFileUtil.java#L72-L76)
 now, input as NaN should be intercepted before reaching these comparison 
evaluation, and we can ignore comparison with lower bound since it won't be 
NaN. When comparison with upper bound, `comparator.compare(typedValue, NaN) > 
0`  will always be false and thus the method `canContain` will reach true. 
   
   However I guess this is when it's guaranteed that the summary is generated 
by iceberg library. Since direct comparison with NaN will also be false (e.g. 
`3D > Double.NaN` -> false, `Double.NaN > 3D` -> false), I think if there are 
custom implementations for this summary info that uses this approach, the 
assumption that `lowerBound` will not be `NaN` may not hold. and in this case 
`compare(xxx, lowerBound) < 0` will lead to true which lets the `canContain` 
return false which may not be correct. In this case I think it might be safer 
if we add a logic to explicitly check for `lowerbound` for NaN value. What do 
you think? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to