[ https://issues.apache.org/jira/browse/HUDI-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harshal Patil updated HUDI-3068:
--------------------------------
Status: In Progress (was: Open)
> Add support to sync all partitions in hive sync tool
> ----------------------------------------------------
>
> Key: HUDI-3068
> URL: https://issues.apache.org/jira/browse/HUDI-3068
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Hive Integration
> Reporter: sivabalan narayanan
> Assignee: Harshal Patil
> Priority: Major
> Labels: pull-request-available, sev:critical
>
> If a user runs hive sync only occasionally, archival may kick in and trim
> some commits between two syncs. If partitions were added by those commits and
> never updated later, hive sync will miss those partitions.
> {code:java}
> LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then");
> return TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline().getCommitsTimeline()
>     .findInstantsAfter(lastCommitTimeSynced.get(), Integer.MAX_VALUE));
> } {code}
> This happens because, for recurrent syncs, we always fetch the commits on the
> active timeline after the last synced instant, read their commit metadata, and
> collect only the partitions written by those commits.
>
> We can add a new config to the hive sync tool to override this behavior:
> --sync-all-partitions
> When this config is set to true, we should ignore the last synced instant and
> take the route below, which is what happens when syncing for the first time.
>
> {code:java}
> if (!lastCommitTimeSynced.isPresent()) {
>   LOG.info("Last commit time synced is not known, listing all partitions in " + basePath + ",FS :" + fs);
>   HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
>   return FSUtils.getAllPartitionPaths(engineContext, basePath, useFileListingFromMetadata, assumeDatePartitioning);
> } {code}
>
>
> Ref issue:
> https://github.com/apache/hudi/issues/3890
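
The failure mode and the proposed override can be sketched with a small self-contained model. This is only an illustration under stated assumptions, not Hudi code: the class `SyncAllPartitionsSketch`, the method `partitionsToSync`, the demo helpers, and the instant times and partition paths are all hypothetical, and the active timeline is modeled as a plain map from instant time to the partitions that commit wrote. The `syncAllPartitions` parameter stands in for the `--sync-all-partitions` config proposed in the issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in only: none of these names are Hudi classes or methods.
final class SyncAllPartitionsSketch {

    /**
     * Picks the partitions to sync.
     * activeTimeline maps instant time to the partitions written by that commit
     * and holds only the commits that archival has not trimmed yet.
     */
    static Set<String> partitionsToSync(Map<String, List<String>> activeTimeline,
                                        Set<String> partitionsOnStorage,
                                        Optional<String> lastCommitTimeSynced,
                                        boolean syncAllPartitions) {
        // First sync, or the proposed override: list every partition on storage.
        if (syncAllPartitions || !lastCommitTimeSynced.isPresent()) {
            return new TreeSet<>(partitionsOnStorage);
        }
        // Recurrent sync: only partitions written after the last synced instant.
        Set<String> written = new TreeSet<>();
        for (Map.Entry<String, List<String>> commit : activeTimeline.entrySet()) {
            if (commit.getKey().compareTo(lastCommitTimeSynced.get()) > 0) {
                written.addAll(commit.getValue());
            }
        }
        return written;
    }

    /** Hypothetical timeline: commits 001 and 002 were archived, 003 and 004 remain. */
    static Map<String, List<String>> demoActiveTimeline() {
        Map<String, List<String>> timeline = new TreeMap<>();
        timeline.put("003", Arrays.asList("2021/12/03"));
        timeline.put("004", Arrays.asList("2021/12/03"));
        return timeline;
    }

    /** Hypothetical storage listing: 2021/12/02 was written by archived commit 002. */
    static Set<String> demoPartitionsOnStorage() {
        return new TreeSet<>(Arrays.asList("2021/12/01", "2021/12/02", "2021/12/03"));
    }

    public static void main(String[] args) {
        Optional<String> lastSynced = Optional.of("001");
        System.out.println("incremental: " + partitionsToSync(
                demoActiveTimeline(), demoPartitionsOnStorage(), lastSynced, false));
        System.out.println("sync all:    " + partitionsToSync(
                demoActiveTimeline(), demoPartitionsOnStorage(), lastSynced, true));
    }
}
```

In this model the last synced instant is 001 and commit 002 has already been archived, so the incremental path never sees partition 2021/12/02, while the override falls back to the full listing and picks it up. The trade-off is that a full listing is more expensive than reading commit metadata, which is presumably why the issue proposes it as an opt-in config rather than the default.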