sivabalan narayanan created HUDI-5408:
-----------------------------------------

             Summary: Partially failed commits in MDT are not rolled back in
all cases
                 Key: HUDI-5408
                 URL: https://issues.apache.org/jira/browse/HUDI-5408
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan
            Assignee: sivabalan narayanan
             Fix For: 0.12.2


When a compaction fails after completing in the metadata table (MDT) but before
completing in the data table (DT), and we later re-attempt to apply the same
compaction instant to the MDT, we might fail to roll back any partially failed
commit in the MDT.
Code of interest in SparkHoodieBackedTableMetadataWriter:
{code:java}
if (!metadataMetaClient.getActiveTimeline().containsInstant(instantTime)) {
  // if this is a new commit being applied to metadata for the first time
  writeClient.startCommitWithTime(instantTime);
} else {
  Option<HoodieInstant> alreadyCompletedInstant = metadataMetaClient.getActiveTimeline()
      .filterCompletedInstants()
      .filter(entry -> entry.getTimestamp().equals(instantTime))
      .lastInstant();
  if (alreadyCompletedInstant.isPresent()) {
    // this code path refers to a re-attempted commit that got committed to
    // metadata table, but failed in datatable.
    // for eg, lets say compaction c1 on 1st attempt succeeded in metadata
    // table and failed before committing to datatable.
    // when retried again, data table will first rollback pending compaction.
    // these will be applied to metadata table, but all changes
    // are upserts to metadata table and so only a new delta commit will be
    // created.
    // once rollback is complete, compaction will be retried again, which will
    // eventually hit this code block where the respective commit is
    // already part of completed commit. So, we have to manually remove the
    // completed instant and proceed.
    // and it is for the same reason we enabled
    // withAllowMultiWriteOnSameInstant for metadata table.
    HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(),
        metadataMetaClient.getMetaPath(), alreadyCompletedInstant.get());
    metadataMetaClient.reloadActiveTimeline();
  }
  // If the alreadyCompletedInstant is empty, that means there is a requested
  // or inflight instant with the same instant time.  This happens for data
  // table clean action which reuses the same instant time without rollback
  // first.  It is a no-op here as the clean plan is the same, so we don't need
  // to delete the requested and inflight instant files in the active timeline.
} {code}
In the else block, if there happens to be a partially failed commit in the MDT,
we may fail to roll it back.

We might need to fix this flow.
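
To make the gap concrete, here is a minimal, self-contained toy model of the
branching above. It is only a sketch: MdtReattemptSketch, Instant, State,
prepareCommit, rollbackPendingExcept and sampleTimeline are hypothetical names,
not Hudi classes or methods, and the model assumes that leftover pending
instants are only cleaned up on the "new commit" path. The rollbackOnReattempt
flag illustrates one possible direction for a fix, not necessarily the one
that will be adopted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the MDT timeline handling; none of these types are Hudi classes. */
final class MdtReattemptSketch {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  /** Hypothetical stand-in for a timeline entry. */
  static final class Instant {
    final String time;
    final State state;

    Instant(String time, State state) {
      this.time = time;
      this.state = state;
    }
  }

  /** "Rolls back" (here: simply removes) every pending instant except the one at {@code keep}. */
  static void rollbackPendingExcept(List<Instant> timeline, String keep) {
    timeline.removeIf(i -> i.state != State.COMPLETED && !i.time.equals(keep));
  }

  /**
   * Mirrors the if/else structure of the snippet above and returns the timestamps of
   * pending instants (other than instantTime) that are still left on the timeline.
   */
  static List<String> prepareCommit(List<Instant> timeline, String instantTime,
                                    boolean rollbackOnReattempt) {
    boolean seen = timeline.stream().anyMatch(i -> i.time.equals(instantTime));
    if (!seen) {
      // First attempt. ASSUMPTION of this model: starting a new commit is the
      // point at which leftover pending instants get cleaned up.
      rollbackPendingExcept(timeline, instantTime);
      timeline.add(new Instant(instantTime, State.INFLIGHT));
    } else {
      // Re-attempt: drop the already completed instant so it can be re-applied.
      timeline.removeIf(i -> i.time.equals(instantTime) && i.state == State.COMPLETED);
      if (rollbackOnReattempt) {
        // Sketched fix: also clean up partially failed commits on this path.
        rollbackPendingExcept(timeline, instantTime);
      }
    }
    List<String> leftovers = new ArrayList<>();
    for (Instant i : timeline) {
      if (i.state != State.COMPLETED && !i.time.equals(instantTime)) {
        leftovers.add(i.time);
      }
    }
    return leftovers;
  }

  /** 002 is a partially failed commit; 003 is a compaction completed in MDT but not in DT. */
  static List<Instant> sampleTimeline() {
    List<Instant> timeline = new ArrayList<>();
    timeline.add(new Instant("001", State.COMPLETED));
    timeline.add(new Instant("002", State.INFLIGHT));
    timeline.add(new Instant("003", State.COMPLETED));
    return timeline;
  }

  public static void main(String[] args) {
    System.out.println(prepareCommit(sampleTimeline(), "003", false)); // prints [002]
    System.out.println(prepareCommit(sampleTimeline(), "003", true));  // prints []
  }
}
```

In this model, re-applying compaction 003 with the current flow leaves the
partially failed instant 002 on the timeline, while the variant that also
rolls back pending instants on the re-attempt path leaves nothing behind.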

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
