nsivabalan commented on code in PR #13340:
URL: https://github.com/apache/hudi/pull/13340#discussion_r2112875057


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1583,6 +1583,7 @@ static HoodieActiveTimeline runPendingTableServicesOperationsAndRefreshTimeline(
    * deltacommit.
    */
   void compactIfNecessary(BaseHoodieWriteClient<?,I,?,O> writeClient, Option<String> latestDeltaCommitTimeOpt) {
+    // TODO how to handle this case where compaction needs to be written in the past

Review Comment:
   This entire method, `compactIfNecessary`, is overridden in `HoodieBackedTableMetadataWriterTableVersionSix`.
   
   So we can set table version 6 aside for now.
   
   I also noticed that even in table version 8, MDT compaction gets blocked by pending instants in the data table (DT).
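   To make the current restriction concrete, here is a minimal, hypothetical sketch of the gate quoted at the end of this comment (trigger compaction with a max instant time that is smaller than or equal to the earliest pending DT instant). `CompactionGateSketch` and `maxCompactionInstant` are illustrative names, not Hudi APIs, and instant times are simplified to plain `long`s instead of Hudi's timestamp strings.

```java
import java.util.List;
import java.util.Optional;

public class CompactionGateSketch {

    /**
     * Hypothetical gate: returns the largest MDT delta-commit time that is smaller than
     * or equal to the earliest pending DT instant. With no pending DT instants, the
     * latest MDT delta commit is returned. Instant times are simplified to longs.
     */
    static Optional<Long> maxCompactionInstant(List<Long> mdtDeltaCommits,
                                               List<Long> dtPendingInstants) {
        Optional<Long> earliestPending = dtPendingInstants.stream().min(Long::compare);
        return mdtDeltaCommits.stream()
                .filter(t -> earliestPending.map(p -> t <= p).orElse(true))
                .max(Long::compare);
    }

    public static void main(String[] args) {
        List<Long> mdtDeltaCommits = List.of(90L, 92L, 94L, 96L, 98L, 100L);
        List<Long> dtPending = List.of(97L);
        // The MDT has delta commits up to t100, but the pending DT instant t97 caps compaction at t96.
        System.out.println(maxCompactionInstant(mdtDeltaCommits, dtPending)); // Optional[96]
    }
}
```

   Under this gate, a single long-running or failed DT instant holds MDT compaction back no matter how many delta commits pile up after it, which is the blocking behavior described above.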
   
   tl;dr: it should be doable to relax this.
   
   Let's walk through the challenges and talk them through.
   
   Say we have delta commits t1, t2, ..., t100 in the MDT.
   The latest file slice spans t90 to t100,
   and t97 failed in the DT but succeeded in the MDT.
   
   - When we trigger compaction planning at this juncture, the planner should ignore the log file written by t97 (which succeeded in the MDT but failed in the DT), so that the new HFile created by compaction contains no data from that log file. This is already in place, so no additional fixes are needed.
   - Once compaction planning completes, any new log files are added to the new file slice that compaction may eventually create.
   - Rolling back t97 in the data table also triggers a rollback in the MDT. In table version 8, we simply delete the affected log files, so no issues here.
   - When t97 is re-attempted in the data table, it results in a delta commit (DC) in the MDT, which adds a log file. Its requested time may be smaller than the compaction instant time, but its completion time will definitely be larger.
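   The four steps above can be walked through with a small, hypothetical model. It assumes that compaction planning only picks log files that completed before the compaction instant and skips those written by failed DT instants, and that a later log file is assigned to a file slice by its completion time rather than its requested time. `MdtCompactionScenarioSketch`, `LogFile`, and the instant values (compaction at 101, retry completing at 105) are made up for illustration; this is not Hudi's actual planner or file-slicing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class MdtCompactionScenarioSketch {

    /** Hypothetical MDT log file: the DT instant that wrote it and when it completed. */
    static final class LogFile {
        final long requestedTime;
        final long completionTime;

        LogFile(long requestedTime, long completionTime) {
            this.requestedTime = requestedTime;
            this.completionTime = completionTime;
        }

        @Override
        public String toString() {
            return "log@t" + requestedTime;
        }
    }

    /**
     * Planning (assumed): keep logs that completed before the compaction instant,
     * skipping logs written by DT instants that failed (e.g. t97).
     */
    static List<LogFile> planCompaction(List<LogFile> slice, long compactionInstant,
                                        Set<Long> failedDtInstants) {
        return slice.stream()
                .filter(l -> l.completionTime < compactionInstant)
                .filter(l -> !failedDtInstants.contains(l.requestedTime))
                .collect(Collectors.toList());
    }

    /** Slice assignment (assumed): compare by completion time, not requested time. */
    static boolean belongsToNewSlice(LogFile log, long compactionInstant) {
        return log.completionTime > compactionInstant;
    }

    public static void main(String[] args) {
        // Latest MDT file slice: delta commits t90..t100, each completing at its own time.
        List<LogFile> slice = new ArrayList<>();
        for (long t = 90; t <= 100; t++) {
            slice.add(new LogFile(t, t));
        }
        long compactionInstant = 101;        // hypothetical compaction instant
        Set<Long> failedInDt = Set.of(97L);  // t97 succeeded in the MDT but failed in the DT

        List<LogFile> plan = planCompaction(slice, compactionInstant, failedInDt);
        System.out.println(plan.size());     // 10: t90..t100 without t97

        // Rollback of t97 (table version 8) simply deletes its log file.
        slice.removeIf(l -> l.requestedTime == 97);

        // Re-attempt: requested before the compaction instant, completed after it.
        LogFile retry = new LogFile(97, 105);
        System.out.println(belongsToNewSlice(retry, compactionInstant)); // true
    }
}
```

   Because slice assignment in this model keys off completion time, the retried t97 lands in the new file slice even though its requested time is below the compaction instant, which is the property the last bullet relies on.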
   
   So, from what I can gauge, we can relax this.
   
   @danny0405: did you notice any gaps specifically around this? If I'm not mistaken, this code snippet ("Trigger compaction with max instant time that is smaller than (or equals) the earliest pending instant from DT") was authored by you.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.