[ https://issues.apache.org/jira/browse/HUDI-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706620#comment-17706620 ]
Ethan Guo commented on HUDI-5816:
---------------------------------
1. For syncing to Glue, advise users not to use Hive Sync.
2. Rewrite Hudi GlueSync so that it does not hit this versioning problem when
updating the timestamp checkpoint (TBD whether this is actually doable). See
[https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog.html]
3. [For both Glue and HMS] When the sync falls behind, force a call to
HoodieMetadata#getAllPartitionPaths() or something similar, diff the result
against the metastore, sync once, and update the timestamp. After that, let
the usual flow run off the active timeline.
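
A rough sketch of the fallback in item 3, to make the idea concrete. This is
only an illustration under assumptions, not the actual sync code: the class
and method names (PartitionSyncFallback, needsFullListing, diff) are
hypothetical, partitions are modeled as plain path strings, and instant times
are assumed to be fixed-width timestamp strings so that string order can stand
in for time order. In real code the listed partitions would come from
something like the getAllPartitionPaths() call mentioned above, and the other
set from the metastore client.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names are Hudi APIs. */
final class PartitionSyncFallback {

  /** Partitions to add to, and to drop from, the metastore. */
  static final class PartitionDiff {
    final Set<String> toAdd;
    final Set<String> toDrop;

    PartitionDiff(Set<String> toAdd, Set<String> toDrop) {
      this.toAdd = toAdd;
      this.toDrop = toDrop;
    }
  }

  /**
   * Returns true when the last synced instant is older than the earliest
   * instant still in the active timeline, i.e. when an incremental sync
   * would otherwise have to read the archived timeline.
   * Assumption: a null lastSyncedInstant means "never synced", and instants
   * are fixed-width timestamp strings, so compareTo follows time order.
   */
  static boolean needsFullListing(String lastSyncedInstant, String earliestActiveInstant) {
    return lastSyncedInstant == null
        || lastSyncedInstant.compareTo(earliestActiveInstant) < 0;
  }

  /** Diffs the partitions listed from the table against the metastore's view. */
  static PartitionDiff diff(Set<String> listedPartitions, Set<String> metastorePartitions) {
    // Present in the table but unknown to the metastore.
    Set<String> toAdd = new TreeSet<>(listedPartitions);
    toAdd.removeAll(metastorePartitions);
    // Known to the metastore but no longer present in the table.
    Set<String> toDrop = new TreeSet<>(metastorePartitions);
    toDrop.removeAll(listedPartitions);
    return new PartitionDiff(toAdd, toDrop);
  }

  public static void main(String[] args) {
    // Made-up sample data for illustration only.
    Set<String> listed = new TreeSet<>(Arrays.asList("2023/03/01", "2023/03/02", "2023/03/03"));
    Set<String> inMetastore = new TreeSet<>(Arrays.asList("2023/02/28", "2023/03/01", "2023/03/02"));

    if (needsFullListing("20230301000000", "20230315000000")) {
      PartitionDiff d = diff(listed, inMetastore);
      System.out.println("add:  " + d.toAdd);
      System.out.println("drop: " + d.toDrop);
      // After applying the diff once, the sync would record the latest
      // instant as the new checkpoint and go back to the incremental flow.
    }
  }
}
```

The point of the diff is that it needs only the current partition listing and
the metastore state, so the archived timeline is never touched on this path.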
> Avoid loading archived timeline during meta sync
> ------------------------------------------------
>
> Key: HUDI-5816
> URL: https://issues.apache.org/jira/browse/HUDI-5816
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
>
> During meta sync, we still load the archived timeline when the last sync
> timestamp is earlier than the start of the active timeline. Instead, we can
> list all partitions as the fallback, which is faster if the metadata table
> is enabled.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)