yihua opened a new pull request, #8388: URL: https://github.com/apache/hudi/pull/8388
### Change Logs

This PR adds a fallback mechanism to Hive and Glue catalog sync. If the last synced commit time falls behind the start of the Hudi table's active timeline, the sync lists all partition paths on storage and reconciles them against the partitions in the metastore, instead of reading the archived timeline, which can be expensive in I/O. The PR also enhances the tests to cover the new logic.

Note that the last synced commit time can fall behind, especially for the Glue catalog, where `hoodie.datasource.meta_sync.condition.sync` is recommended to be set to `true` so that the last synced commit time is updated only upon partition changes, to limit the number of versions of data in the Glue catalog.

### Impact

Avoids loading the archived timeline during Hive and Glue sync.

### Risk level

low

### Documentation Update

No documentation update needed.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
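For reviewers, the decision described under Change Logs can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: `PartitionSyncPlan`, `plan_partition_sync`, and all parameter names are hypothetical, commit times are assumed to be same-length timestamp strings that compare lexicographically, and "resolving the difference" is assumed to mean adding partitions missing from the metastore and dropping ones no longer on storage. The incremental branch is simplified and ignores partition drops.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class PartitionSyncPlan:
    """Hypothetical result type: what the sync would change in the metastore."""
    to_add: Set[str]
    to_drop: Set[str]
    used_fallback: bool


def plan_partition_sync(
    last_synced: Optional[str],
    active_timeline_start: str,
    list_storage_partitions: Callable[[], Iterable[str]],
    metastore_partitions: Iterable[str],
    partitions_written_since: Callable[[str], Iterable[str]],
) -> PartitionSyncPlan:
    """Pick between incremental sync and the full-listing fallback."""
    in_metastore = set(metastore_partitions)

    # Fallback: the last synced commit is older than the active timeline
    # (or unknown), so list storage instead of reading the archived timeline.
    if last_synced is None or last_synced < active_timeline_start:
        on_storage = set(list_storage_partitions())
        return PartitionSyncPlan(
            to_add=on_storage - in_metastore,
            to_drop=in_metastore - on_storage,
            used_fallback=True,
        )

    # Incremental path: only look at partitions written after the last sync,
    # which the active timeline can answer on its own.
    written = set(partitions_written_since(last_synced))
    return PartitionSyncPlan(
        to_add=written - in_metastore,
        to_drop=set(),
        used_fallback=False,
    )
```

Passing the storage listing and the timeline lookup as callables keeps the sketch honest about cost: on the fallback path the timeline lookup is never invoked, and on the incremental path storage is never listed.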
