nsivabalan commented on issue #3890:
URL: https://github.com/apache/hudi/issues/3890#issuecomment-997541013


   Yeah, I inspected the code and it looks like there could be a gap. 
   I.e., suppose you create new partitions every day and older partitions 
are never updated once their date has passed. If you then fail to sync daily, 
and archival is aggressive enough to trim commits pertaining to partitions 
that were never synced, our hive sync tool might fail to sync those 
partitions. 
   
   From AbstractSyncHoodieClient: 
   ```
         LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then");
         return TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline().getCommitsTimeline()
             .findInstantsAfter(lastCommitTimeSynced.get(), Integer.MAX_VALUE));
       }
   ```
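   So here we look at the commits after the last synced instant in the active 
timeline, fetch their commit metadata, and find the partitions to sync. Commits 
that have already been archived are no longer in the active timeline, so 
partitions written only by those commits are not picked up. 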
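
   To make the gap concrete, here is a minimal toy model of it. This is only a 
sketch and does not use any Hudi classes: `SyncGapSketch`, 
`partitionsWrittenSince`, `archive` and `sampleTimeline` are hypothetical names, 
the active timeline is modelled as a sorted map from instant time to the 
partitions that commit wrote, and archival is modelled as dropping the oldest 
instants. 

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: none of these names are Hudi APIs.
class SyncGapSketch {

    // Partitions written by instants strictly after lastSynced that are
    // still present in the (modelled) active timeline.
    static Set<String> partitionsWrittenSince(TreeMap<String, List<String>> activeTimeline,
                                              String lastSynced) {
        Set<String> partitions = new TreeSet<>();
        for (List<String> written : activeTimeline.tailMap(lastSynced, false).values()) {
            partitions.addAll(written);
        }
        return partitions;
    }

    // Modelled archival: drop the oldest instants until at most
    // maxInstantsToKeep remain in the active timeline.
    static void archive(TreeMap<String, List<String>> activeTimeline, int maxInstantsToKeep) {
        while (activeTimeline.size() > maxInstantsToKeep) {
            activeTimeline.pollFirstEntry();
        }
    }

    // Five daily commits, each writing exactly one new date partition.
    static TreeMap<String, List<String>> sampleTimeline() {
        TreeMap<String, List<String>> timeline = new TreeMap<>();
        timeline.put("001", Arrays.asList("2021-12-01"));
        timeline.put("002", Arrays.asList("2021-12-02"));
        timeline.put("003", Arrays.asList("2021-12-03"));
        timeline.put("004", Arrays.asList("2021-12-04"));
        timeline.put("005", Arrays.asList("2021-12-05"));
        return timeline;
    }

    public static void main(String[] args) {
        String lastSynced = "001"; // hive sync last ran after the first commit

        TreeMap<String, List<String>> timeline = sampleTimeline();
        System.out.println("Before archival: " + partitionsWrittenSince(timeline, lastSynced));

        // Archival trims instants 001..003 before the next sync happens.
        archive(timeline, 2);
        System.out.println("After archival:  " + partitionsWrittenSince(timeline, lastSynced));
    }
}
```

   In this model, the partitions for 2021-12-02 and 2021-12-03 are written only 
by instants that were archived before the next sync, so the lookup after 
archival no longer returns them. 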
   
   

