codope commented on code in PR #10352:
URL: https://github.com/apache/hudi/pull/10352#discussion_r1499498618
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1901,4 +1910,162 @@ private static Path filePath(String basePath, String
partition, String filename)
return new Path(basePath, partition + Path.SEPARATOR + filename);
}
}
+
+ public static HoodieData<HoodieRecord>
convertFilesToPartitionStatsRecords(HoodieEngineContext engineContext,
+
List<DirectoryInfo> partitionInfoList,
+
MetadataRecordsGenerationParams recordsGenerationParams) {
+ // Find the columns to index
+ HoodieTableMetaClient dataTableMetaClient =
recordsGenerationParams.getDataMetaClient();
+ final List<String> columnsToIndex = getColumnsToIndex(
+ recordsGenerationParams,
+ Lazy.lazily(() -> tryResolveSchemaForTable(dataTableMetaClient)));
+ if (columnsToIndex.isEmpty()) {
+ return engineContext.emptyHoodieData();
+ }
+ LOG.debug(String.format("Indexing %d columns for partition stats index",
columnsToIndex.size()));
+ // Create records for MDT
+ int parallelism = Math.max(Math.min(partitionInfoList.size(),
recordsGenerationParams.getPartitionStatsIndexParallelism()), 1);
+ return engineContext.parallelize(partitionInfoList,
parallelism).flatMap(partitionFiles -> {
+ final String partitionName = partitionFiles.getRelativePath();
+ Stream<HoodieColumnRangeMetadata<Comparable>>
partitionStatsRangeMetadata =
partitionFiles.getFileNameToSizeMap().keySet().stream()
+ .map(fileName -> getFileStatsRangeMetadata(partitionName,
partitionName + "/" + fileName, dataTableMetaClient, columnsToIndex, false))
+ .map(BaseFileUtils::getColumnRangeInPartition);
+ return HoodieMetadataPayload.createPartitionStatsRecords(partitionName,
partitionStatsRangeMetadata.collect(toList()), false).iterator();
+ });
+ }
+
+ private static List<HoodieColumnRangeMetadata<Comparable>>
getFileStatsRangeMetadata(String partitionPath,
+
String filePath,
+
HoodieTableMetaClient datasetMetaClient,
+
List<String> columnsToIndex,
+
boolean isDeleted) {
+ String filePartitionPath = filePath.startsWith("/") ?
filePath.substring(1) : filePath;
Review Comment:
This `filePartitionPath` is a combination of partition name + "/" + file
name. For non-partitioned tables, we can have "/" in the beginning.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]