nsivabalan commented on code in PR #13340:
URL: https://github.com/apache/hudi/pull/13340#discussion_r2112875057
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1583,6 +1583,7 @@ static HoodieActiveTimeline runPendingTableServicesOperationsAndRefreshTimeline(
* deltacommit.
*/
void compactIfNecessary(BaseHoodieWriteClient<?,I,?,O> writeClient,
Option<String> latestDeltaCommitTimeOpt) {
+ // TODO how to handle this case where compaction needs to be written in the past
Review Comment:
This entire method, `compactIfNecessary`, is overridden in
`HoodieBackedTableMetadataWriterTableVersionSix`,
so we can set table version 6 aside for now.
I also noticed that even in table version 8, MDT compaction gets blocked on a
pending instant in the DT.
tl;dr: this should be doable to relax.
Let's walk through the challenges.
Say we have delta commits t1, t2, ..., t100 in the MDT.
The latest file slice spans t90 through t100,
and t97 failed in the DT but succeeded in the MDT.
- When we trigger compaction planning at this juncture, the planner should
ignore the log file written by t97 (which succeeded in the MDT but failed in
the DT). This is already in place, so no additional fix is needed: the new
HFile that compaction creates should not contain any data from the log file
created by t97.
- Once compaction planning completes, any new log files are added to the new
file slice that the compaction will eventually create.
- A rollback of t97 in the data table also triggers a rollback in the MDT. In
table version 8, we just delete the log files of interest, so no issues here.
- When t97 is re-attempted in the data table, it results in a delta commit (DC)
in the MDT, which adds a log file. Its requested time might be less than the
compaction instant time, but its completion time will definitely be larger.
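
To make the walk-through concrete, here is a toy model of the t97 case. It is only a sketch, not Hudi code: `MdtCompactionSketch`, `LogFile`, `selectForCompaction`, and `belongsToNewSlice` are hypothetical names, the timestamps are made up, and it assumes that a log file's file slice is decided by its completion time rather than its requested time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the t97 walk-through. Not Hudi code; every name is hypothetical.
final class MdtCompactionSketch {

  // A log file in one MDT file group, tagged with the delta commit that wrote it.
  static final class LogFile {
    final long requestedTime;
    final long completionTime;

    LogFile(long requestedTime, long completionTime) {
      this.requestedTime = requestedTime;
      this.completionTime = completionTime;
    }
  }

  // Planning: keep only log files whose delta commit is not failed or pending in the DT.
  static List<LogFile> selectForCompaction(List<LogFile> sliceLogs, Set<Long> notCompletedInDt) {
    List<LogFile> selected = new ArrayList<>();
    for (LogFile f : sliceLogs) {
      if (!notCompletedInDt.contains(f.requestedTime)) {
        selected.add(f);
      }
    }
    return selected;
  }

  // Assumption: a log file joins the post-compaction slice when it completes
  // after the compaction instant, regardless of its requested time.
  static boolean belongsToNewSlice(LogFile f, long compactionInstantTime) {
    return f.completionTime > compactionInstantTime;
  }

  public static void main(String[] args) {
    List<LogFile> latestSlice = Arrays.asList(
        new LogFile(95, 96), new LogFile(97, 98), new LogFile(100, 101));
    Set<Long> failedInDt = new HashSet<>(Arrays.asList(97L));
    long compactionInstantTime = 102;

    // t97's log file is left out of the plan; t95 and t100 are compacted.
    System.out.println(selectForCompaction(latestSlice, failedInDt).size());

    // Re-attempted t97: requested before the compaction, completed after it.
    LogFile retried = new LogFile(97, 105);
    System.out.println(belongsToNewSlice(retried, compactionInstantTime));
  }
}
```

Under those assumptions, the retried t97 lands in the new file slice even though its requested time is smaller than the compaction instant time, which is why the ordering in the last bullet does not look like a problem.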
So, from what I can gauge, we can relax this.
@danny0405: did you notice any gaps specifically around this? If I am not
wrong, this code snippet (Trigger compaction with max instant time that is
smaller than(or equals) the earliest pending instant from DT) was authored by
you.