Manoj Govindassamy created HUDI-2603:
----------------------------------------

             Summary: Metadata table bootstrapping is missed out when the 
feature is disabled intermittently
                 Key: HUDI-2603
                 URL: https://issues.apache.org/jira/browse/HUDI-2603
             Project: Apache Hudi
          Issue Type: Bug
          Components: bootstrap
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


The metadata table is bootstrapped whenever its commits are found to be out of 
sync with the data table. Each instantiation of the metadata table performs 
this check. When the metadata table is turned on at the start, turned off 
after a few commits, and then turned on again after more commits, the current 
bootstrap check does not seem to catch the intermittent break in the commit 
sync, so the bootstrap is missed.

 

```
protected void bootstrapIfNeeded(HoodieEngineContext engineContext,
    HoodieTableMetaClient dataMetaClient) throws IOException {
  HoodieTimer timer = new HoodieTimer().startTimer();
  boolean exists = dataMetaClient.getFs().exists(
      new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
  boolean rebootstrap = false;
  if (exists) {
    // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
    HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
        .setBasePath(metadataWriteConfig.getBasePath()).build();
    Option<HoodieInstant> latestMetadataInstant =
        metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
    if (!latestMetadataInstant.isPresent()) {
      LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
      rebootstrap = true;
    } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
        && dataMetaClient.getActiveTimeline().getAllCommitsTimeline()
            .isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
      // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
      LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
          + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
          + ", latestDataInstant="
          + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
      rebootstrap = true;
    }
  }
```
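
To make the gap concrete, here is a minimal, self-contained sketch of the 
scenario described above. It does not use any Hudi classes: timelines are 
modeled as sorted lists of instant timestamps, `archivedCheck` is only a 
simplified stand-in for the `isBeforeTimelineStarts` condition in the excerpt, 
and `missingInstants` is a hypothetical stricter check, not the fix proposed 
for this issue. The instant values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BootstrapGapSketch {

  // Simplified stand-in for the existing condition: re-bootstrap only when there
  // are no metadata instants, or when the latest metadata instant is older than
  // the first instant still present in the data table's active timeline.
  static boolean archivedCheck(List<String> dataInstants, List<String> metadataInstants) {
    if (metadataInstants.isEmpty()) {
      return true;
    }
    String latestMetadata = metadataInstants.get(metadataInstants.size() - 1);
    return !dataInstants.isEmpty() && latestMetadata.compareTo(dataInstants.get(0)) < 0;
  }

  // Hypothetical stricter check (an assumption, not Hudi code): list every
  // data-table instant that has no counterpart in the metadata timeline.
  static List<String> missingInstants(List<String> dataInstants, List<String> metadataInstants) {
    Set<String> synced = new HashSet<>(metadataInstants);
    List<String> missing = new ArrayList<>();
    for (String instant : dataInstants) {
      if (!synced.contains(instant)) {
        missing.add(instant);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Metadata enabled for 001-003, disabled for 004-005, enabled again for 006.
    List<String> dataInstants = Arrays.asList("001", "002", "003", "004", "005", "006");
    List<String> metadataInstants = Arrays.asList("001", "002", "003", "006");

    // The latest metadata instant (006) is not before the start of the data
    // timeline (001), so the archival-style check does not ask for a re-bootstrap.
    System.out.println("archivedCheck = " + archivedCheck(dataInstants, metadataInstants));

    // The gap left while the feature was disabled is still there.
    System.out.println("missingInstants = " + missingInstants(dataInstants, metadataInstants));
  }
}
```

Under these assumptions the archival-style check returns `false` while 
instants `004` and `005` are absent from the metadata timeline, which is the 
kind of intermittent break this issue describes.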

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
