yihua commented on code in PR #12081:
URL: https://github.com/apache/hudi/pull/12081#discussion_r1801788952
##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##########
@@ -243,6 +243,7 @@ private HoodieWriteConfig
initTestTableAndGetWriteConfig(boolean enableMetadata,
.withFileSystemViewConfig(FileSystemViewStorageConfig.newBuilder()
.withRemoteServerPort(timelineServicePort).build())
.withMetadataConfig(HoodieMetadataConfig.newBuilder().enable(enableMetadata)
+ .withMetadataIndexPartitionStats(false)
Review Comment:
Could we have a config to turn off MDT as a whole instead of turning off
each MDT index individually?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2174,21 +2174,14 @@ public static HoodieMetadataColumnStats
mergeColumnStatsRecords(HoodieMetadataCo
return newColumnStats;
}
- Comparable minValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMinValue()))
- .filter(Objects::nonNull)
- .min(Comparator.naturalOrder())
- .orElse(null);
-
- Comparable maxValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMaxValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMaxValue()))
- .filter(Objects::nonNull)
- .max(Comparator.naturalOrder())
- .orElse(null);
+ Comparable minValue = castAndCompare(
+ unwrapAvroValueWrapper(newColumnStats.getMinValue()),
+ unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
+ true);
Review Comment:
We need to think more about type promotion since the column and partition
stats may need to be regenerated in this case for correctness. For example,
take a list of values of `[1,2,3,4,5,6,7,8,9,10]`, when promoting the type from
integer to String, the min-max range is changed from `[1,10]` to `["1", "9"]`,
which cannot be done through simple type cast. I think it's better to avoid
the type cast here and handle the stats merge somewhere else to avoid possible
corruption of stats.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2174,21 +2174,14 @@ public static HoodieMetadataColumnStats
mergeColumnStatsRecords(HoodieMetadataCo
return newColumnStats;
}
- Comparable minValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMinValue()))
- .filter(Objects::nonNull)
- .min(Comparator.naturalOrder())
- .orElse(null);
-
- Comparable maxValue =
- (Comparable) Stream.of(
- (Comparable)
unwrapAvroValueWrapper(prevColumnStats.getMaxValue()),
- (Comparable)
unwrapAvroValueWrapper(newColumnStats.getMaxValue()))
- .filter(Objects::nonNull)
- .max(Comparator.naturalOrder())
- .orElse(null);
+ Comparable minValue = castAndCompare(
+ unwrapAvroValueWrapper(newColumnStats.getMinValue()),
+ unwrapAvroValueWrapper(prevColumnStats.getMinValue()),
+ true);
Review Comment:
Is this for supporting schema evolution?
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -196,8 +196,8 @@ private List<String>
getPartitionPathWithPathPrefixUsingFilterExpression(String
} else if
(!path.getName().equals(HoodieTableMetaClient.METAFOLDER_NAME)) {
return Pair.of(Option.empty(), Option.of(path));
}
- } else if (path.getName()
-
.startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX)) {
+ } else if
(path.getName().startsWith(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX)
+ ||
MetadataPartitionType.isValidPartitionFilePath(path.getName())) {
Review Comment:
Let's not add MDT specific logic in this method, which should not be aware
of whether the table is a data table or metadata table. Could we fix the the
issue of writing partition metadata file in MDT directly?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]