codope commented on code in PR #12081:
URL: https://github.com/apache/hudi/pull/12081#discussion_r1802472034
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2174,21 +2174,14 @@ public static HoodieMetadataColumnStats
mergeColumnStatsRecords(HoodieMetadataCo
return newColumnStats;
}
- Comparable minValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMinValue()))
- .filter(Objects::nonNull)
- .min(Comparator.naturalOrder())
- .orElse(null);
-
- Comparable maxValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMaxValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMaxValue()))
- .filter(Objects::nonNull)
- .max(Comparator.naturalOrder())
- .orElse(null);
+ Comparable minValue = castAndCompare(
+ unwrapAvroValueWrapper(newColumnStats.getMinValue()),
+ unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
+ true);
Review Comment:
+1 we have to be careful about this. Numerical promotions that retain the
numerical nature of the data (like int to long, float, or double) and some
well-structured type conversions (e.g., timestamp to string in ISO format) are
typically order-preserving. However, transitions involving strings, especially
when converting from numeric types, are often not order-preserving.
This should be an issue even with column stats. To move forward, I would
suggest to catch the cast exception and throw a meaningful error and ask user
to disable column stats/partition stats to unblock ingestion. Wdyt @yihua
@linliu-code @nsivabalan ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]