sivabalan narayanan created HUDI-3624:
-----------------------------------------

             Summary: Re-attempt of failed cleaning from DT to MDT fails 
                 Key: HUDI-3624
                 URL: https://issues.apache.org/jira/browse/HUDI-3624
             Project: Apache Hudi
          Issue Type: Bug
          Components: metadata
            Reporter: sivabalan narayanan


A clean is scheduled and started on the data table (DT):

C5.clean.requested in DT

C5.clean.inflight in DT

We then try to apply this clean to the metadata table (MDT), which creates

C5.deltacommit.requested and C5.deltacommit.inflight

in the MDT, and the writer crashes before completing the deltacommit.

 

If the pipeline is restarted, the cleaner will simply retry the pending clean; unlike compaction or clustering, it does not trigger a rollback first. This retry then fails in the MDT: since we only check for completed instants, we call 
writeClient.startCommitWithTime(instantTime) again even though a pending 
deltacommit for that instant already exists in the MDT.

 

SparkHoodieMetadataTableWriter 
{code:java}
if (!metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(instantTime)) {
  // if this is a new commit being applied to metadata for the first time
  writeClient.startCommitWithTime(instantTime);
} else {
  // this code path refers to a re-attempted commit that got committed to metadata table, but failed in datatable.
  // for eg, lets say compaction c1 on 1st attempt succeeded in metadata table and failed before committing to datatable.
  // when retried again, data table will first rollback pending compaction. these will be applied to metadata table, but all changes
  // are upserts to metadata table and so only a new delta commit will be created.
  // once rollback is complete, compaction will be retried again, which will eventually hit this code block where the respective commit is
  // already part of completed commit. So, we have to manually remove the completed instant and proceed.
  // and it is for the same reason we enabled withAllowMultiWriteOnSameInstant for metadata table.
  HoodieInstant alreadyCompletedInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants()
      .filter(entry -> entry.getTimestamp().equals(instantTime)).lastInstant().get();
  HoodieActiveTimeline.deleteInstantFile(metadataMetaClient.getFs(), metadataMetaClient.getMetaPath(), alreadyCompletedInstant);
  metadataMetaClient.reloadActiveTimeline();
}
List<WriteStatus> statuses = writeClient.upsertPreppedRecords(preppedRecordRDD, instantTime).collect();
{code}
In the above snippet, the if condition should be fixed to check for any 
instant in the timeline, not just completed instants.

And within the else block, we should delete the completed instant only if one 
exists; if there is none (e.g. only a pending deltacommit is left over), it 
should be a no-op.
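The proposed fix can be sketched as follows. This is a simplified, self-contained model of the control flow only: the Timeline and Instant classes below are hypothetical stand-ins, not Hudi's real HoodieActiveTimeline/HoodieInstant APIs. It shows the two changes: check for *any* instant (pending or completed) before starting a new commit, and in the else branch delete the completed instant only if one is actually present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class MetadataCommitFix {
  // Simplified stand-in for a timeline instant: a timestamp plus a state flag.
  record Instant(String timestamp, boolean completed) {}

  // Simplified stand-in for the MDT active timeline.
  static class Timeline {
    final List<Instant> instants = new ArrayList<>();

    // True if ANY instant (pending or completed) has this timestamp.
    boolean containsInstant(String ts) {
      return instants.stream().anyMatch(i -> i.timestamp().equals(ts));
    }

    // Last COMPLETED instant with this timestamp, if one exists.
    Optional<Instant> lastCompletedInstant(String ts) {
      return instants.stream()
          .filter(i -> i.completed() && i.timestamp().equals(ts))
          .reduce((a, b) -> b);
    }
  }

  // Corrected flow: "any instant" check instead of "completed only", and a
  // delete-if-present in the else branch instead of an unconditional .get().
  static String applyToMetadataTable(Timeline mdtTimeline, String instantTime) {
    if (!mdtTimeline.containsInstant(instantTime)) {
      // First time this commit reaches the metadata table: start it.
      return "startCommitWithTime";
    }
    // Re-attempt: remove the completed instant if one exists. If the earlier
    // attempt only left a pending deltacommit (the crashed-clean case), there
    // is nothing to delete and this is a no-op.
    Optional<Instant> completed = mdtTimeline.lastCompletedInstant(instantTime);
    if (completed.isPresent()) {
      mdtTimeline.instants.remove(completed.get());
      return "deletedCompletedInstant";
    }
    return "noOp";
  }
}
```

With this check, a retried clean whose first attempt left only C5.deltacommit.requested/inflight in the MDT falls into the no-op branch instead of failing.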

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
