codope commented on code in PR #12050:
URL: https://github.com/apache/hudi/pull/12050#discussion_r1790575404
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -2051,48 +2069,53 @@ private static Stream<HoodieRecord> collectAndProcessColumnMetadata(
.map(entry -> FileFormatUtils.getColumnRangeInPartition(partitionPath, entry.getValue()));
// Create Partition Stats Records
- return HoodieMetadataPayload.createPartitionStatsRecords(partitionPath, partitionStatsRangeMetadata.collect(Collectors.toList()), false);
+ return HoodieMetadataPayload.createPartitionStatsRecords(partitionPath, partitionStatsRangeMetadata.collect(Collectors.toList()), false, isTightBound);
}
public static HoodieData<HoodieRecord> convertFilesToPartitionStatsRecords(HoodieEngineContext engineContext,
List<DirectoryInfo> partitionInfoList,
HoodieMetadataConfig metadataConfig,
- HoodieTableMetaClient dataTableMetaClient) {
+ HoodieTableMetaClient dataTableMetaClient,
+ Option<Schema> writerSchemaOpt) {
+ Lazy<Option<Schema>> lazyWriterSchemaOpt = writerSchemaOpt.isPresent() ? Lazy.eagerly(writerSchemaOpt) : Lazy.lazily(() -> tryResolveSchemaForTable(dataTableMetaClient));
final List<String> columnsToIndex = getColumnsToIndex(
metadataConfig.isPartitionStatsIndexEnabled(),
metadataConfig.getColumnsEnabledForColumnStatsIndex(),
- Lazy.lazily(() -> tryResolveSchemaForTable(dataTableMetaClient)));
+ lazyWriterSchemaOpt);
if (columnsToIndex.isEmpty()) {
LOG.warn("No columns to index for partition stats index");
return engineContext.emptyHoodieData();
}
LOG.debug("Indexing following columns for partition stats index: {}", columnsToIndex);
// Create records for MDT
int parallelism = Math.max(Math.min(partitionInfoList.size(), metadataConfig.getPartitionStatsIndexParallelism()), 1);
+ Option<Schema> writerSchema = lazyWriterSchemaOpt.get();
Review Comment:
This is intentional. `getColumnsToIndex` is also called from other places, and
there the schema is fetched lazily.
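To illustrate the eager/lazy choice in the diff, here is a minimal Java sketch under stated assumptions. The `Lazy` holder below is a simplified, hypothetical stand-in, not Hudi's actual class; `java.util.Optional<String>` replaces Hudi's `Option<Schema>`; and `resolveSchemaFromTable()` is a hypothetical placeholder for `tryResolveSchemaForTable(dataTableMetaClient)`. It shows that when a writer schema is passed in, the table schema is never resolved, and when it is absent, resolution is deferred until the first `get()`.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LazySchemaDemo {

  // Simplified, hypothetical memoizing holder; NOT Hudi's actual Lazy implementation.
  static final class Lazy<T> {
    private final Supplier<T> supplier;
    private T value;
    private boolean initialized;

    private Lazy(Supplier<T> supplier, T value, boolean initialized) {
      this.supplier = supplier;
      this.value = value;
      this.initialized = initialized;
    }

    // Defers computation until get() is first called.
    static <T> Lazy<T> lazily(Supplier<T> supplier) {
      return new Lazy<>(supplier, null, false);
    }

    // Wraps an already-available value; no supplier is ever invoked.
    static <T> Lazy<T> eagerly(T value) {
      return new Lazy<>(null, value, true);
    }

    synchronized T get() {
      if (!initialized) {
        value = supplier.get();
        initialized = true;
      }
      return value;
    }
  }

  // Counts how often the (hypothetical) expensive schema resolution runs.
  static int resolveCalls = 0;

  // Hypothetical placeholder for tryResolveSchemaForTable(dataTableMetaClient).
  static Optional<String> resolveSchemaFromTable() {
    resolveCalls++;
    return Optional.of("tableSchema");
  }

  // Mirrors the diff: reuse the writer schema if the caller has it, else resolve on demand.
  static Lazy<Optional<String>> schemaHolder(Optional<String> writerSchemaOpt) {
    return writerSchemaOpt.isPresent()
        ? Lazy.eagerly(writerSchemaOpt)
        : Lazy.lazily(LazySchemaDemo::resolveSchemaFromTable);
  }

  public static void main(String[] args) {
    Lazy<Optional<String>> withWriter = schemaHolder(Optional.of("writerSchema"));
    System.out.println(withWriter.get().get() + " " + resolveCalls); // prints "writerSchema 0"

    Lazy<Optional<String>> withoutWriter = schemaHolder(Optional.empty());
    System.out.println(resolveCalls); // prints "0": nothing resolved yet
    withoutWriter.get();
    withoutWriter.get();
    System.out.println(withoutWriter.get().get() + " " + resolveCalls); // prints "tableSchema 1"
  }
}
```

In this sketch the holder memoizes, so repeated `get()` calls share a single resolution, and callers that never touch the schema never pay for it.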