BalaMahesh commented on issue #9758:
URL: https://github.com/apache/hudi/issues/9758#issuecomment-1792328313

   hoodie.clean.async=false
   
   After setting this to `false`, compaction is triggered for the metadata table. Earlier there were always `pendingInstants` blocking it, because the async clean kicks in and creates a new delta commit on the metadata table timeline while an earlier instant is still pending on the data table timeline.
   
   ```java
   List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline()
       .filterInflightsAndRequested()
       .findInstantsBeforeOrEquals(latestDeltaCommitTimeInMetadataTable)
       .getInstants();

   // Skip metadata table compaction while any data table instant at or before
   // the latest metadata table deltacommit is still requested or inflight.
   if (!pendingInstants.isEmpty()) {
     LOG.info(String.format(
         "Cannot compact metadata table as there are %d inflight instants in data table "
             + "before latest deltacommit in metadata table: %s. Inflight instants in data table: %s",
         pendingInstants.size(), latestDeltaCommitTimeInMetadataTable,
         Arrays.toString(pendingInstants.toArray())));
     return;
   }
   ```
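The blocking condition can be modeled outside Hudi to see why an inflight async clean keeps metadata table compaction from running. This is a minimal sketch, not Hudi code: it assumes instant times are fixed-width `yyyyMMddHHmmssSSS` strings that sort lexicographically, and `MetadataCompactionGate`, `Instant`, `State`, `blockingInstants` and `canCompactMetadataTable` are hypothetical names. Records need Java 16+.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MetadataCompactionGate {

    // Hypothetical model of a timeline instant (not Hudi's HoodieInstant).
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    record Instant(String timestamp, String action, State state) {}

    // Mirrors the excerpt: data table instants that are still requested/inflight
    // and whose timestamp is <= the latest metadata table deltacommit.
    // Fixed-width timestamps let String.compareTo order them chronologically.
    static List<Instant> blockingInstants(List<Instant> dataTimeline,
                                          String latestDeltaCommitTimeInMetadataTable) {
        return dataTimeline.stream()
                .filter(i -> i.state() != State.COMPLETED)
                .filter(i -> i.timestamp().compareTo(latestDeltaCommitTimeInMetadataTable) <= 0)
                .collect(Collectors.toList());
    }

    // Compaction may proceed only when nothing is blocking it.
    static boolean canCompactMetadataTable(List<Instant> dataTimeline,
                                           String latestDeltaCommitTimeInMetadataTable) {
        return blockingInstants(dataTimeline, latestDeltaCommitTimeInMetadataTable).isEmpty();
    }

    public static void main(String[] args) {
        String latestMdtDeltaCommit = "20231103101500000";

        // An async clean still inflight on the data timeline, older than the
        // latest metadata table deltacommit: compaction is skipped.
        List<Instant> withAsyncClean = List.of(
                new Instant("20231103100000000", "deltacommit", State.COMPLETED),
                new Instant("20231103101000000", "clean", State.INFLIGHT));

        // Same timeline once the clean has completed: compaction may proceed.
        List<Instant> withoutPending = List.of(
                new Instant("20231103100000000", "deltacommit", State.COMPLETED),
                new Instant("20231103101000000", "clean", State.COMPLETED));

        System.out.println(canCompactMetadataTable(withAsyncClean, latestMdtDeltaCommit));  // false
        System.out.println(canCompactMetadataTable(withoutPending, latestMdtDeltaCommit)); // true
    }
}
```

The model only captures the ordering rule; in Hudi the timelines are loaded from storage and the check runs inside the metadata writer.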
   
   Both snippets are from `HoodieBackedTableMetadataWriter`. This next one gates cleaning of the metadata table:
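   ```java
   // Metadata table cleaning is skipped until at least 3 completed instants
   // exist after the last completed compaction.
   if (lastCompletedCompactionInstant.isPresent()
       && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
           .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp())
           .countInstants() < 3) {
     // ... (body not included in this excerpt)
   }
   ```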
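The cleaning gate can be sketched the same way. This is a minimal model, not Hudi code: it assumes the metadata table timeline is a list of completed instant timestamps (fixed-width strings that sort lexicographically) plus an optional timestamp of the last completed compaction. `MetadataCleanGate`, `shouldSkipClean` and `MIN_INSTANTS_AFTER_COMPACTION` are hypothetical names; only the threshold of 3 comes from the excerpt.

```java
import java.util.List;
import java.util.Optional;

public class MetadataCleanGate {

    // Threshold hard-coded in the HoodieBackedTableMetadataWriter excerpt.
    static final int MIN_INSTANTS_AFTER_COMPACTION = 3;

    // Returns true when cleaning should be skipped: a compaction has completed
    // and fewer than 3 completed instants come strictly after it.
    static boolean shouldSkipClean(List<String> completedInstants,
                                   Optional<String> lastCompletedCompaction) {
        if (lastCompletedCompaction.isEmpty()) {
            return false; // no compaction yet, so this gate does not apply
        }
        String compactionTime = lastCompletedCompaction.get();
        long after = completedInstants.stream()
                .filter(ts -> ts.compareTo(compactionTime) > 0)
                .count();
        return after < MIN_INSTANTS_AFTER_COMPACTION;
    }

    public static void main(String[] args) {
        List<String> completed = List.of(
                "20231103100000000", // compaction commit
                "20231103100100000",
                "20231103100200000");
        Optional<String> lastCompaction = Optional.of("20231103100000000");

        // Only 2 completed instants after the compaction: cleaning is skipped.
        System.out.println(shouldSkipClean(completed, lastCompaction)); // true
    }
}
```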
   To get the metadata table files cleaned, we should always set:
   
   hoodie.metadata.compact.max.delta.commits=3
   
   otherwise the files will never be cleaned and will keep piling up.
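For reference, the two settings discussed in this comment can be collected into a `java.util.Properties` object. This is only a sketch: how the properties reach the Hudi writer (Spark datasource options, a properties file, etc.) depends on the deployment and is not shown, and `MetadataTableTuning` / `metadataTableOverrides` are hypothetical names.

```java
import java.util.Properties;

public class MetadataTableTuning {

    // Builds the two overrides discussed in this comment. Values are strings,
    // as they would be in a .properties file or as datasource options.
    static Properties metadataTableOverrides() {
        Properties props = new Properties();
        // Run the cleaner synchronously instead of asynchronously; per the
        // observation above, this stopped pending instants from blocking
        // metadata table compaction.
        props.setProperty("hoodie.clean.async", "false");
        // Maximum number of delta commits on the metadata table before it is compacted.
        props.setProperty("hoodie.metadata.compact.max.delta.commits", "3");
        return props;
    }

    public static void main(String[] args) {
        // Iteration order of Properties is not guaranteed.
        metadataTableOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```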
   
   After making these config changes, I can see a cleaner metadata `files` partition; earlier it was piling up with old files.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
