jonvex commented on code in PR #13711:
URL: https://github.com/apache/hudi/pull/13711#discussion_r2353725863
##########
hudi-common/src/main/java/org/apache/hudi/common/util/FileFormatUtils.java:
##########
@@ -64,10 +66,24 @@ public abstract class FileFormatUtils {
* @param fileColumnRanges List of column range statistics for each file in
a partition
*/
public static <T extends Comparable<T>> HoodieColumnRangeMetadata<T>
getColumnRangeInPartition(String relativePartitionPath,
+
String columnName,
@Nonnull List<HoodieColumnRangeMetadata<T>> fileColumnRanges,
-
Map<String, Schema> colsToIndexSchemaMap) {
+
Map<String, Schema> colsToIndexSchemaMap,
+
HoodieIndexVersion indexVersion) {
ValidationUtils.checkArgument(!fileColumnRanges.isEmpty(),
"fileColumnRanges should not be empty.");
+ if (indexVersion.greaterThanOrEquals(HoodieIndexVersion.V2)) {
+ ValueMetadata valueMetadata =
ValueMetadata.getValueMetadata(colsToIndexSchemaMap.get(columnName),
indexVersion);
+ return fileColumnRanges.stream()
+ .map(e -> {
+ T minValue = (T)
valueMetadata.standardizeJavaTypeAndPromote(e.getMinValue());
+ T maxValue = (T)
valueMetadata.standardizeJavaTypeAndPromote(e.getMaxValue());
+ return HoodieColumnRangeMetadata.create(
+ relativePartitionPath, e.getColumnName(), minValue, maxValue,
e.getNullCount(), e.getValueCount(), e.getTotalSize(),
+ e.getTotalUncompressedSize(), valueMetadata);
+ }).reduce(HoodieColumnRangeMetadata::merge).orElseThrow(() -> new
HoodieException("MergingColumnRanges failed."));
+ }
+ // we are reducing using merge so IDK why we think there are multiple cols
that need to go through schema evolution
Review Comment:
My point was that "colToIndex" should be the same for every range in
fileColumnRanges. And that the code acknowledges that fact by merging all of
them together right after this. So I was saying that it's weird that this code
acts like they could be different and uses a map schemaSeenForColsToIndex when
there will at most be 1 key. In fact in HoodieColumnRangeMetadata.merge we even
validate that the col names are the same
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]