yihua opened a new pull request, #8388:
URL: https://github.com/apache/hudi/pull/8388

   ### Change Logs
   
   This PR adds a fallback mechanism to Hive and Glue catalog sync. If the 
last commit time synced is earlier than the start of the Hudi table's active 
timeline, the sync lists all partition paths on storage and resolves the 
difference against what is in the metastore, instead of reading the archived 
timeline, which can be expensive in I/O. The PR also extends the tests to 
cover the new logic.
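   
   The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `PartitionSyncFallbackSketch`, its methods, and `PartitionDiff` are hypothetical names. It assumes instant times are fixed-width timestamp strings that compare chronologically as plain strings, and that "resolving the difference" means adding partitions missing from the metastore and dropping those no longer on storage.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; none of these names are Hudi APIs.
class PartitionSyncFallbackSketch {

  /** Result of comparing partitions on storage with partitions in the metastore. */
  static final class PartitionDiff {
    final SortedSet<String> toAdd;   // on storage, missing from the metastore
    final SortedSet<String> toDrop;  // in the metastore, no longer on storage

    PartitionDiff(SortedSet<String> toAdd, SortedSet<String> toDrop) {
      this.toAdd = toAdd;
      this.toDrop = toDrop;
    }
  }

  /**
   * Returns true when the last synced commit is older than the first instant
   * of the active timeline, i.e. when incremental sync would need the
   * archived timeline. Assumes fixed-width timestamp strings.
   */
  static boolean isBehindActiveTimeline(String lastCommitTimeSynced,
                                        String activeTimelineStart) {
    return lastCommitTimeSynced.compareTo(activeTimelineStart) < 0;
  }

  /** Fallback path: diff a full partition listing against the metastore. */
  static PartitionDiff diff(Set<String> onStorage, Set<String> inMetastore) {
    SortedSet<String> toAdd = new TreeSet<>(onStorage);
    toAdd.removeAll(inMetastore);
    SortedSet<String> toDrop = new TreeSet<>(inMetastore);
    toDrop.removeAll(onStorage);
    return new PartitionDiff(toAdd, toDrop);
  }
}
```

   In this sketch, a caller would first check `isBehindActiveTimeline`, and only when it returns `true` list every partition path and call `diff`; otherwise it would keep using the commits in the active timeline to find changed partitions.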
   
   Note that the last commit time synced can fall behind, especially with the 
Glue catalog, where setting `hoodie.datasource.meta_sync.condition.sync` to 
`true` is recommended so that the last commit time synced is updated only on 
partition changes, which limits the number of versions of data in the Glue 
catalog.
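   
   For illustration, the setting mentioned above could be placed in a properties object like this. Only the config key comes from this description; the class and method names are hypothetical, and how the properties reach the writer or sync tool depends on the deployment.

```java
import java.util.Properties;

// Hypothetical helper; only the config key is taken from the PR description.
class ConditionalSyncConfigSketch {

  static Properties build() {
    Properties props = new Properties();
    // Update the last commit time synced only when partitions change.
    props.setProperty("hoodie.datasource.meta_sync.condition.sync", "true");
    return props;
  }
}
```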
   
   ### Impact
   
   Avoids loading the archived timeline during Hive and Glue sync.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   No documentation update needed.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
