Manoj Govindassamy created HUDI-2603:
----------------------------------------
Summary: Metadata table bootstrapping is missed out when the
feature is disabled intermittently
Key: HUDI-2603
URL: https://issues.apache.org/jira/browse/HUDI-2603
Project: Apache Hudi
Issue Type: Bug
Components: bootstrap
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
The metadata table is bootstrapped whenever it finds that its commits are not
in sync with the data table. Each instantiation of the metadata table performs
this check. However, when the metadata table is enabled at the start, disabled
after a few commits, and then re-enabled after further commits, the current
bootstrap check does not detect the gap in the commit sync-up, so the required
bootstrap is skipped.
```
protected void bootstrapIfNeeded(HoodieEngineContext engineContext,
    HoodieTableMetaClient dataMetaClient) throws IOException {
  HoodieTimer timer = new HoodieTimer().startTimer();
  boolean exists = dataMetaClient.getFs().exists(
      new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
  boolean rebootstrap = false;
  if (exists) {
    // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
    HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
        .setBasePath(metadataWriteConfig.getBasePath()).build();
    Option<HoodieInstant> latestMetadataInstant =
        metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
    if (!latestMetadataInstant.isPresent()) {
      LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
      rebootstrap = true;
    } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
        && dataMetaClient.getActiveTimeline().getAllCommitsTimeline()
            .isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
      // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
      LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
          + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
          + ", latestDataInstant="
          + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
      rebootstrap = true;
    }
  }
```
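
To make the failure mode concrete, here is a minimal, self-contained sketch
that models timelines as sorted lists of instant timestamps. It is not Hudi
code: the class `BootstrapGapSketch` and the methods `currentCheck` and
`hasUnsyncedInstants` are hypothetical names, the `SOLO_COMMIT_TIMESTAMP`
special case is left out, and the stricter check is only one possible way to
detect the gap, not necessarily the fix that should go in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class BootstrapGapSketch {

  // Simplified model of the quoted check: re-bootstrap only when the metadata
  // table has no completed instants, or when its latest instant is older than
  // the first instant still present on the data table's active timeline.
  static boolean currentCheck(List<String> dataActiveTimeline, List<String> metadataInstants) {
    if (metadataInstants.isEmpty()) {
      return true;
    }
    if (dataActiveTimeline.isEmpty()) {
      return false;
    }
    String latestMetadataInstant = metadataInstants.get(metadataInstants.size() - 1);
    String firstDataInstant = dataActiveTimeline.get(0);
    return latestMetadataInstant.compareTo(firstDataInstant) < 0;
  }

  // Hypothetical stricter check: flag a re-bootstrap when any instant on the
  // data table's active timeline is missing from the metadata table.
  static boolean hasUnsyncedInstants(List<String> dataActiveTimeline, List<String> metadataInstants) {
    return !new HashSet<>(metadataInstants).containsAll(dataActiveTimeline);
  }

  public static void main(String[] args) {
    // Metadata table enabled for 001-002, disabled for 003-004, re-enabled for 005-006.
    List<String> data = Arrays.asList("001", "002", "003", "004", "005", "006");
    List<String> metadata = Arrays.asList("001", "002", "005", "006");

    System.out.println("currentCheck -> " + currentCheck(data, metadata));
    System.out.println("hasUnsyncedInstants -> " + hasUnsyncedInstants(data, metadata));
  }
}
```

In this model the latest metadata instant (006) is not older than the start of
the data timeline (001), so the existing check returns false even though
instants 003 and 004 were never applied to the metadata table.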