nsivabalan commented on code in PR #13664:
URL: https://github.com/apache/hudi/pull/13664#discussion_r2248072735
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1485,14 +1485,53 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) {
// We need to choose a timestamp which would be a validInstantTime for MDT. This is either a commit timestamp completed on the dataset
// or a new timestamp which we use for MDT clean, compaction etc.
String syncCommitTime = createRestoreInstantTime();
+ // For Files partition.
processAndCommit(syncCommitTime, () -> HoodieTableMetadataUtil.convertMissingPartitionRecords(engineContext, partitionsToDelete, partitionFilesToAdd, partitionFilesToDelete, syncCommitTime));
+ // For Column Stats partition.
+ processAndCommit(syncCommitTime, () -> convertToColumnStatsRecord(
Review Comment:
This results in two delta commits to the MDT.
We should do it in a single commit that covers both the FILES partition and the COL_STATS partition.
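To illustrate the suggestion, here is a minimal sketch of merging the two per-partition record maps before a single commit call. It is not the Hudi implementation: `SingleCommitSketch`, `mergeRecords`, the `commitCount` counter, and the simplified `processAndCommit` signature are all hypothetical stand-ins, and plain `List<String>` takes the place of `HoodieData<HoodieRecord>`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class SingleCommitSketch {
    // Hypothetical stand-in for the MDT timeline: counts delta commits made.
    public static int commitCount = 0;

    // Hypothetical stand-in for processAndCommit: each call models one delta commit.
    public static Map<String, List<String>> processAndCommit(
            String instantTime, Supplier<Map<String, List<String>>> recordsByPartition) {
        Map<String, List<String>> records = recordsByPartition.get();
        commitCount++;
        return records;
    }

    // Merge two partition -> records maps so both can be written in one commit.
    public static Map<String, List<String>> mergeRecords(
            Map<String, List<String>> first, Map<String, List<String>> second) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> source : List.of(first, second)) {
            source.forEach((partition, records) ->
                merged.computeIfAbsent(partition, p -> new ArrayList<>()).addAll(records));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder records for the FILES and COL_STATS partitions.
        Map<String, List<String>> filesRecords = Map.of("files", List.of("f1", "f2"));
        Map<String, List<String>> colStatsRecords = Map.of("column_stats", List.of("c1"));

        // One call, hence one modeled delta commit, covering both partitions.
        Map<String, List<String>> written =
            processAndCommit("001", () -> mergeRecords(filesRecords, colStatsRecords));
        System.out.println(written.keySet() + " commits=" + commitCount);
    }
}
```

Since both suppliers in the diff appear to return a map keyed by partition, combining them inside one supplier would keep the restore sync to a single delta commit at `syncCommitTime`.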
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1485,14 +1485,53 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) {
// We need to choose a timestamp which would be a validInstantTime for MDT. This is either a commit timestamp completed on the dataset
// or a new timestamp which we use for MDT clean, compaction etc.
String syncCommitTime = createRestoreInstantTime();
+ // For Files partition.
processAndCommit(syncCommitTime, () -> HoodieTableMetadataUtil.convertMissingPartitionRecords(engineContext, partitionsToDelete, partitionFilesToAdd, partitionFilesToDelete, syncCommitTime));
+ // For Column Stats partition.
+ processAndCommit(syncCommitTime, () -> convertToColumnStatsRecord(
+ partitionFilesToAdd, partitionFilesToDelete, engineContext, dataMetaClient,
+ dataWriteConfig.getMetadataConfig(), Option.of(dataWriteConfig.getRecordMerger().getRecordType()),
+ dataWriteConfig.getMetadataConfig().getColumnStatsIndexParallelism()));
+ // Close.
closeInternal();
} catch (IOException e) {
throw new HoodieMetadataException("IOException during MDT restore sync", e);
}
}
+ static Map<String, HoodieData<HoodieRecord>> convertToColumnStatsRecord(Map<String, Map<String, Long>> partitionFilesToAdd,
+ Map<String, List<String>> partitionFilesToDelete,
+ HoodieEngineContext engineContext,
+ HoodieTableMetaClient dataMetaClient,
+ HoodieMetadataConfig metadataConfig,
+ Option<HoodieRecord.HoodieRecordType> recordTypeOpt,
+ int columnStatsIndexParallelism) {
+ if (partitionFilesToDelete.isEmpty() && partitionFilesToAdd.isEmpty()) {
+ return Collections.emptyMap();
+ }
+ Lazy<Option<Schema>> tableSchema =
+ Lazy.lazily(() -> HoodieTableMetadataUtil.tryResolveSchemaForTable(dataMetaClient));
Review Comment:
We are already doing an FS-based listing for restore, so a timeline listing
is not going to be that bad by comparison.
Unfortunately, we don't have a good solution to this.
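For context on the cost being discussed: the diff wraps schema resolution in `Lazy.lazily(...)`, so the lookup should only be paid when the schema is first requested. Below is a minimal sketch of that deferral pattern, assuming memoize-on-first-`get()` semantics. `LazySchemaSketch`, `LazyValue`, `resolveSchema`, and the `resolveCalls` counter are hypothetical stand-ins, not Hudi classes.

```java
import java.util.function.Supplier;

public class LazySchemaSketch {
    // Hypothetical stand-in for a lazy holder: runs the initializer at most once.
    public static final class LazyValue<T> {
        private Supplier<T> initializer;
        private T value;
        private boolean initialized;

        public LazyValue(Supplier<T> initializer) {
            this.initializer = initializer;
        }

        public synchronized T get() {
            if (!initialized) {
                value = initializer.get();
                initialized = true;
                initializer = null; // release the supplier once the value is cached
            }
            return value;
        }
    }

    // Counts how often the (expensive) resolution actually runs.
    public static int resolveCalls = 0;

    // Placeholder for an expensive lookup such as reading the timeline.
    public static String resolveSchema() {
        resolveCalls++;
        return "schema-v1";
    }

    public static void main(String[] args) {
        LazyValue<String> tableSchema = new LazyValue<>(LazySchemaSketch::resolveSchema);
        System.out.println("before get: resolveCalls=" + resolveCalls);
        tableSchema.get();
        tableSchema.get();
        System.out.println("after two gets: resolveCalls=" + resolveCalls);
    }
}
```

With this shape, the early return for empty add/delete maps in the diff never triggers the lookup at all, and repeated reads share one resolution.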