yyanyy commented on issue #2492: URL: https://github.com/apache/iceberg/issues/2492#issuecomment-822812288
@RussellSpitzer Thank you for the quick fix! Regarding handling NaN in min/max, I think you are right that they seem fine today if they are created by iceberg library: - `compare(NaN, xxx) = 1`; `compare(xxx, NaN) = -1` - when we [create them before handling NaN](https://github.com/apache/iceberg/blob/469f6c3f640293df6c629a693450736350aa6fab/core/src/main/java/org/apache/iceberg/PartitionSummary.java#L88-L92), if input is NaN, min will never be NaN but max will always be NaN - In [`ManifestFileUtil`](https://github.com/apache/iceberg/blob/469f6c3f640293df6c629a693450736350aa6fab/core/src/main/java/org/apache/iceberg/util/ManifestFileUtil.java#L72-L76) now, input as NaN should be intercepted before reaching these comparison evaluation, and we can ignore comparison with lower bound since it won't be NaN. When comparison with upper bound, `comparator.compare(typedValue, NaN) > 0` will always be false and thus the method `canContain` will reach true. However I guess this is when it's guaranteed that the summary is generated by iceberg library. Since direct comparison with NaN will also be false (e.g. `3D > Double.NaN` -> false, `Double.NaN > 3D` -> false), I think if there are custom implementations for this summary info that uses this approach, the assumption that `lowerBound` will not be `NaN` may not hold. and in this case `compare(xxx, lowerBound) < 0` will lead to true which lets the `canContain` return false which may not be correct. In this case I think it might be safer if we add a logic to explicitly check for `lowerbound` for NaN value. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
