[
https://issues.apache.org/jira/browse/HUDI-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HUDI-2603:
-------------------------------------
Description:
The metadata table is bootstrapped whenever its commits are found to be out of
sync with the data table. Each instantiation of the metadata table performs
this check. When the metadata table is turned on at the start, turned off after
a few commits, and then turned on again after more commits, the current
bootstrap check doesn't seem to catch the intermittent break in the commit
sync-up, so the bootstrap is missed.
{code:java}
protected void bootstrapIfNeeded(HoodieEngineContext engineContext,
    HoodieTableMetaClient dataMetaClient) throws IOException {
  HoodieTimer timer = new HoodieTimer().startTimer();
  boolean exists = dataMetaClient.getFs().exists(
      new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
  boolean rebootstrap = false;
  if (exists) {
    // If the un-synched instants have been archived then the metadata table
    // will need to be bootstrapped again
    HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder()
        .setConf(hadoopConf.get())
        .setBasePath(metadataWriteConfig.getBasePath()).build();
    Option<HoodieInstant> latestMetadataInstant =
        metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
    if (!latestMetadataInstant.isPresent()) {
      LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
      rebootstrap = true;
    } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
        && dataMetaClient.getActiveTimeline().getAllCommitsTimeline()
            .isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
      // TODO: Revisit this logic and validate that filtering for all commits
      // timeline is the right thing to do
      LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
          + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
          + ", latestDataInstant="
          + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
      rebootstrap = true;
    }
  }
  // ... (the remainder of the method is not shown in this report)
}
{code}
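To make the gap concrete, here is a minimal, self-contained sketch of the condition quoted above. It uses plain strings in place of Hudi instants, so it does not call any Hudi API. The class name BootstrapCheckSketch, the helper hasUnsyncedInstants, the sample timestamps, and the placeholder value of SOLO_COMMIT_TIMESTAMP are all assumptions made for illustration, and isBeforeTimelineStarts is modeled simply as "sorts before the first active data instant". The stricter check is hypothetical and is not proposed as the fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class BootstrapCheckSketch {
    // Placeholder for the initial bootstrap commit time (assumed value).
    static final String SOLO_COMMIT_TIMESTAMP = "00000000000000";

    // Simplified stand-in for isBeforeTimelineStarts: true when the instant
    // sorts before the first instant of the active data timeline.
    static boolean isBeforeTimelineStarts(List<String> activeDataInstants, String instant) {
        return !activeDataInstants.isEmpty()
            && instant.compareTo(activeDataInstants.get(0)) < 0;
    }

    // Mirrors the condition in the quoted bootstrapIfNeeded snippet.
    static boolean currentCheck(List<String> activeDataInstants,
                                Optional<String> latestMetadataInstant) {
        if (!latestMetadataInstant.isPresent()) {
            return true; // no completed metadata instants at all
        }
        String ts = latestMetadataInstant.get();
        return !ts.equals(SOLO_COMMIT_TIMESTAMP)
            && isBeforeTimelineStarts(activeDataInstants, ts);
    }

    // Hypothetical stricter check (illustration only): any active data
    // instant newer than the latest metadata instant counts as un-synced.
    static boolean hasUnsyncedInstants(List<String> activeDataInstants,
                                       Optional<String> latestMetadataInstant) {
        if (!latestMetadataInstant.isPresent()) {
            return !activeDataInstants.isEmpty();
        }
        String ts = latestMetadataInstant.get();
        return activeDataInstants.stream().anyMatch(d -> d.compareTo(ts) > 0);
    }

    public static void main(String[] args) {
        // Metadata table enabled for the first two commits, disabled for the
        // next two, then enabled again. Nothing has been archived yet.
        List<String> dataInstants = Arrays.asList(
            "20211022100001", "20211022100002", "20211022100003", "20211022100004");
        Optional<String> latestMetadataInstant = Optional.of("20211022100002");

        System.out.println("current check requests re-bootstrap: "
            + currentCheck(dataInstants, latestMetadataInstant));
        System.out.println("un-synced data instants exist: "
            + hasUnsyncedInstants(dataInstants, latestMetadataInstant));
    }
}
```

In this scenario the latest metadata instant is still inside the active data timeline, so the modeled check does not ask for a re-bootstrap even though two data commits were never applied to the metadata table. Under this model, the check only fires once the data timeline has been archived past the latest metadata instant.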
> Metadata table bootstrapping is missed out when the feature is disabled
> intermittently
> --------------------------------------------------------------------------------------
>
> Key: HUDI-2603
> URL: https://issues.apache.org/jira/browse/HUDI-2603
> Project: Apache Hudi
> Issue Type: Bug
> Components: bootstrap
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)