codope commented on code in PR #10352:
URL: https://github.com/apache/hudi/pull/10352#discussion_r1559793410
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1915,4 +1924,162 @@ private static Path filePath(String basePath, String
partition, String filename)
return new Path(basePath, partition + StoragePath.SEPARATOR + filename);
}
}
+
+ public static HoodieData<HoodieRecord>
convertFilesToPartitionStatsRecords(HoodieEngineContext engineContext,
+
List<DirectoryInfo> partitionInfoList,
+
MetadataRecordsGenerationParams recordsGenerationParams) {
+ // Find the columns to index
+ HoodieTableMetaClient dataTableMetaClient =
recordsGenerationParams.getDataMetaClient();
+ final List<String> columnsToIndex = getColumnsToIndex(
+ recordsGenerationParams,
+ Lazy.lazily(() -> tryResolveSchemaForTable(dataTableMetaClient)));
+ if (columnsToIndex.isEmpty()) {
+ return engineContext.emptyHoodieData();
+ }
+ LOG.debug(String.format("Indexing %d columns for partition stats index",
columnsToIndex.size()));
+ // Create records for MDT
+ int parallelism = Math.max(Math.min(partitionInfoList.size(),
recordsGenerationParams.getPartitionStatsIndexParallelism()), 1);
+ return engineContext.parallelize(partitionInfoList,
parallelism).flatMap(partitionFiles -> {
+ final String partitionName = partitionFiles.getRelativePath();
+ Stream<HoodieColumnRangeMetadata<Comparable>>
partitionStatsRangeMetadata =
partitionFiles.getFileNameToSizeMap().keySet().stream()
+ .map(fileName -> getFileStatsRangeMetadata(partitionName,
partitionName + "/" + fileName, dataTableMetaClient, columnsToIndex, false))
+ .map(BaseFileUtils::getColumnRangeInPartition);
Review Comment:
This is fixed now and added a test `testConvertFilesToPartitionStatsRecords`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]