[ https://issues.apache.org/jira/browse/HUDI-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HUDI-2603:
-------------------------------------
    Description: 
The metadata table is bootstrapped whenever its commits are found to be out of 
sync with the data table. Each instantiation of the metadata table performs 
this check. However, when the metadata table is enabled at the start, disabled 
after a few commits, and then re-enabled after further commits, the current 
bootstrap check does not catch the resulting gap in the commit sync-up, so the 
bootstrap is skipped.

 
{code:java}
protected void bootstrapIfNeeded(HoodieEngineContext engineContext, HoodieTableMetaClient dataMetaClient) throws IOException {
  HoodieTimer timer = new HoodieTimer().startTimer();
  boolean exists = dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
  boolean rebootstrap = false;
  if (exists) {
    // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
    HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
        .setBasePath(metadataWriteConfig.getBasePath()).build();
    Option<HoodieInstant> latestMetadataInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
    if (!latestMetadataInstant.isPresent()) {
      LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
      rebootstrap = true;
    } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
        && dataMetaClient.getActiveTimeline().getAllCommitsTimeline().isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
      // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
      LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
          + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
          + ", latestDataInstant=" + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
      rebootstrap = true;
    }
  }
{code}
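
The gap described above can be illustrated with a small standalone sketch. It 
does not use Hudi classes: timelines are modelled as sorted lists of timestamp 
strings, isBeforeTimelineStarts is a simplified stand-in for the timeline 
method of the same name, and instantsAfter is a hypothetical helper added only 
for illustration. It is not a proposed fix, just a way to see why the existing 
condition stays false in the on/off/on scenario.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BootstrapGapSketch {

    // Simplified stand-in (assumption) for the existing check: the instant is
    // "before the timeline starts" only if it is older than the first active
    // data instant, i.e. the un-synced instants have already been archived.
    static boolean isBeforeTimelineStarts(List<String> activeDataInstants, String instant) {
        return !activeDataInstants.isEmpty()
                && instant.compareTo(activeDataInstants.get(0)) < 0;
    }

    // Hypothetical helper (not a Hudi API): active data instants that are
    // newer than the given instant.
    static List<String> instantsAfter(List<String> activeDataInstants, String instant) {
        List<String> result = new ArrayList<>();
        for (String ts : activeDataInstants) {
            if (ts.compareTo(instant) > 0) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Metadata table enabled for 001 and 002, disabled for 003 and 004,
        // then enabled again before the next commit. Nothing is archived yet.
        List<String> dataTimeline = Arrays.asList("001", "002", "003", "004");
        String latestMetadataInstant = "002";

        System.out.println("re-bootstrap by existing check: "
                + isBeforeTimelineStarts(dataTimeline, latestMetadataInstant));
        System.out.println("data instants written while metadata was disabled: "
                + instantsAfter(dataTimeline, latestMetadataInstant));
    }
}
```

In this model the existing condition only becomes true once the data timeline 
no longer contains the latest metadata instant, so instants written while the 
metadata table was disabled go unnoticed as long as they are still active.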
 

  was:
The metadata table is bootstrapped whenever its commits are found to be out of 
sync with the data table. Each instantiation of the metadata table performs 
this check. However, when the metadata table is enabled at the start, disabled 
after a few commits, and then re-enabled after further commits, the current 
bootstrap check does not catch the resulting gap in the commit syncup, so the 
bootstrap is skipped.

 

```

protected void bootstrapIfNeeded(HoodieEngineContext engineContext, HoodieTableMetaClient dataMetaClient) throws IOException {
 HoodieTimer timer = new HoodieTimer().startTimer();
 boolean exists = dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
 boolean rebootstrap = false;
 if (exists) {
 // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
 HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
 .setBasePath(metadataWriteConfig.getBasePath()).build();
 Option<HoodieInstant> latestMetadataInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
 if (!latestMetadataInstant.isPresent()) {
 LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
 rebootstrap = true;
 } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
 && dataMetaClient.getActiveTimeline().getAllCommitsTimeline().isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
 // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
 LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
 + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
 + ", latestDataInstant=" + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
 rebootstrap = true;
 }
 }

```

 


> Metadata table bootstrapping is missed out when the feature is disabled 
> intermittently
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-2603
>                 URL: https://issues.apache.org/jira/browse/HUDI-2603
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: bootstrap
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>            Priority: Major
>
> The metadata table is bootstrapped whenever its commits are found to be out 
> of sync with the data table. Each instantiation of the metadata table 
> performs this check. However, when the metadata table is enabled at the 
> start, disabled after a few commits, and then re-enabled after further 
> commits, the current bootstrap check does not catch the resulting gap in the 
> commit sync-up, so the bootstrap is skipped.
>  
> {code:java}
> protected void bootstrapIfNeeded(HoodieEngineContext engineContext, HoodieTableMetaClient dataMetaClient) throws IOException {
>   HoodieTimer timer = new HoodieTimer().startTimer();
>   boolean exists = dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
>   boolean rebootstrap = false;
>   if (exists) {
>     // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
>     HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
>         .setBasePath(metadataWriteConfig.getBasePath()).build();
>     Option<HoodieInstant> latestMetadataInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
>     if (!latestMetadataInstant.isPresent()) {
>       LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
>       rebootstrap = true;
>     } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
>         && dataMetaClient.getActiveTimeline().getAllCommitsTimeline().isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
>       // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
>       LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
>           + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
>           + ", latestDataInstant=" + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
>       rebootstrap = true;
>     }
>   }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
