[GitHub] [hudi] nsivabalan commented on a diff in pull request #8684: [HUDI-6200] Enhancements to the MDT for improving performance of larger indexes.

via GitHub Fri, 12 May 2023 11:33:30 -0700


nsivabalan commented on code in PR #8684:
URL: https://github.com/apache/hudi/pull/8684#discussion_r1192675584



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/ScheduleIndexActionExecutor.java:
##########
@@ -100,15 +99,6 @@ public Option<HoodieIndexPlan> execute() {
       // get last completed instant
       Option<HoodieInstant> indexUptoInstant = table.getActiveTimeline().getContiguousCompletedWriteTimeline().lastInstant();
       if (indexUptoInstant.isPresent()) {
-        // start initializing file groups
-        // in case FILES partition itself was not initialized before (i.e. metadata was never enabled), this will initialize synchronously
-        HoodieTableMetadataWriter metadataWriter = table.getMetadataWriter(instantTime)
-            .orElseThrow(() -> new HoodieIndexException(String.format("Could not get metadata writer to initialize filegroups for indexing for instant: %s", instantTime)));
-        if (!finalPartitionsToIndex.get(0).getPartitionPath().equals(MetadataPartitionType.FILES.getPartitionPath())) {
-          // initialize metadata partition only if not for FILES partition.
-          metadataWriter.initializeMetadataPartitions(table.getMetaClient(), finalPartitionsToIndex, indexUptoInstant.get().getTimestamp());

Review Comment:
   So, prior to this patch, scheduling the index initialized the partition of
interest with an empty delete log block, and executing the indexing action then
fully populated the valid records in the MDT for the respective partition.
   After this patch, scheduling no longer does that initialization. Only when
the indexing action executes do we initialize the partition of interest and
populate its records.
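
   To make the before/after ordering concrete, here is a minimal toy sketch. It
is not Hudi code: the class `IndexingFlowSketch`, its methods, and the in-memory
map standing in for the MDT are all hypothetical, and the "empty delete log
block" is simplified to an empty list. It only illustrates when the partition
becomes initialized in each flow as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Hudi APIs (all are assumptions).
class IndexingFlowSketch {
  // MDT modeled as partition name -> records; a missing key means "not initialized".
  private final Map<String, List<String>> mdt = new HashMap<>();

  // Before the patch: scheduling already creates the partition, but empty.
  void scheduleBeforePatch(String partition) {
    mdt.putIfAbsent(partition, new ArrayList<>());
  }

  // Before the patch: execution only fills a partition that scheduling initialized.
  void executeBeforePatch(String partition, List<String> records) {
    List<String> existing = mdt.get(partition);
    if (existing == null) {
      throw new IllegalStateException("partition not initialized: " + partition);
    }
    existing.addAll(records);
  }

  // After the patch: scheduling leaves the MDT untouched.
  void scheduleAfterPatch(String partition) {
    // intentionally a no-op with respect to the MDT
  }

  // After the patch: execution both initializes the partition and populates it.
  void executeAfterPatch(String partition, List<String> records) {
    mdt.put(partition, new ArrayList<>(records));
  }

  boolean isInitialized(String partition) {
    return mdt.containsKey(partition);
  }

  int recordCount(String partition) {
    List<String> records = mdt.get(partition);
    return records == null ? 0 : records.size();
  }

  public static void main(String[] args) {
    String partition = "example_partition"; // illustrative name only
    List<String> records = Arrays.asList("r1", "r2");

    IndexingFlowSketch before = new IndexingFlowSketch();
    before.scheduleBeforePatch(partition);
    System.out.println("before patch, after schedule: initialized=" + before.isInitialized(partition));
    before.executeBeforePatch(partition, records);
    System.out.println("before patch, after execute: records=" + before.recordCount(partition));

    IndexingFlowSketch after = new IndexingFlowSketch();
    after.scheduleAfterPatch(partition);
    System.out.println("after patch, after schedule: initialized=" + after.isInitialized(partition));
    after.executeAfterPatch(partition, records);
    System.out.println("after patch, after execute: records=" + after.recordCount(partition));
  }
}
```

   In this model the end state is the same in both flows; the difference is
that between schedule and execute the partition exists (empty) before the patch
and does not exist after it.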
   



