nsivabalan commented on code in PR #12236:
URL: https://github.com/apache/hudi/pull/12236#discussion_r1992435832
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##########
@@ -298,12 +314,56 @@ protected HoodieWriteMetadata<O> compact(String compactionInstantTime, boolean s
table.getMetaClient().reloadActiveTimeline();
}
compactionTimer = metrics.getCompactionCtx();
+ // start commit in MDT if enabled
+ Option<HoodieTableMetadataWriter> metadataWriterOpt = getMetadataWriterFunc.apply(compactionInstantTime, table.getMetaClient());
+ if (metadataWriterOpt.isPresent()) {
Review Comment:
Table services are done this way (which is different from ingestion commits)
because the scheduling and execution can happen separately. With MDT, if we
start the commit during compaction scheduling in the data table and defer the
execution until later, some other thread in MDT could detect failed heartbeats
for the corresponding delta commit (DC) in MDT and trigger a rollback. So we
defer starting the DC in MDT for data table table services until the execution
of the table service actually starts. That way we know the heartbeats will be
continuous, and if anything fails midway, it will get lazily rolled back.
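To make the timing concern concrete, here is a rough toy model in plain Java.
It is not Hudi code: `HeartbeatSketch`, `Monitor`, `beat`, and `looksFailed` are
made-up names, the timeout values are arbitrary, and it only assumes that an
instant looks failed once its last heartbeat is older than some timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are Hudi classes or methods.
final class HeartbeatSketch {

    /** Remembers the last heartbeat per instant and flags stale ones. */
    static final class Monitor {
        private final long timeoutMs;
        private final Map<String, Long> lastBeat = new HashMap<>();

        Monitor(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        void beat(String instant, long nowMs) {
            lastBeat.put(instant, nowMs);
        }

        // To another writer, an instant looks failed once its heartbeat is older than the timeout.
        boolean looksFailed(String instant, long nowMs) {
            Long last = lastBeat.get(instant);
            return last != null && nowMs - last > timeoutMs;
        }
    }

    // DC started at scheduling time (t = 0), nothing beats until execution at t = gapMs.
    static boolean rolledBackWhenStartedAtScheduling(long timeoutMs, long gapMs) {
        Monitor monitor = new Monitor(timeoutMs);
        monitor.beat("c1", 0);
        return monitor.looksFailed("c1", gapMs);
    }

    // DC started at execution time (t = gapMs), checked half a timeout later.
    static boolean rolledBackWhenStartedAtExecution(long timeoutMs, long gapMs) {
        Monitor monitor = new Monitor(timeoutMs);
        monitor.beat("c1", gapMs);
        return monitor.looksFailed("c1", gapMs + timeoutMs / 2);
    }

    public static void main(String[] args) {
        System.out.println(rolledBackWhenStartedAtScheduling(1000, 5000)); // true
        System.out.println(rolledBackWhenStartedAtExecution(1000, 5000));  // false
    }
}
```

In this sketch, the gap between scheduling and execution is what makes the DC
look dead; starting it at execution time removes that gap.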
But I wanted to jam on something here. Can we completely disable auto
rollbacks in MDT? The data table writer would then be the only one that can
trigger rollbacks for the commit it is currently dealing with.
What this means is:
When an ingestion commit in DT fails midway in MDT:
- The respective DC in MDT will stay inflight until the rollback in the data
table kicks in. When that rollback reaches the MDT layer, it can roll back the
DC as usual.
For compaction and clustering:
- Compaction in DT fails midway while writing to the respective DC in MDT. That
DC will stay inflight until the next attempt of the DT compaction resumes. On
resuming, Hudi triggers a rollback of the compaction commit in DT, which then
gets applied to MDT as well, i.e. it rolls back the corresponding DC there. The
compaction in DT then goes through its second attempt, which in turn gets
applied as a DC in MDT.
So, if any table service or ingestion commit stays inflight in the data table
for a long duration, the corresponding inflight DC will also hang around in
MDT for that long.
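
A rough sketch of the semantics I have in mind, again as a toy model rather
than Hudi code. Every name here (`DtDrivenRollbackSketch`, `startExecution`,
`rollbackFromDataTable`, `mdtAutoRollback`, `retryCompaction`) is hypothetical;
the only assumption is that MDT never rolls anything back by itself and that a
DT rollback is propagated to MDT.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes and methods are illustrative, not Hudi APIs.
final class DtDrivenRollbackSketch {
    enum State { INFLIGHT, COMPLETED }

    private final Map<String, State> dataTable = new HashMap<>();
    private final Map<String, State> metadataTable = new HashMap<>();
    private final List<String> mdtEvents = new ArrayList<>();

    // Execution starts: the instant goes inflight in DT and its DC is started in MDT.
    void startExecution(String instant) {
        dataTable.put(instant, State.INFLIGHT);
        metadataTable.put(instant, State.INFLIGHT);
        mdtEvents.add("start " + instant);
    }

    // The DC in MDT completes first, then the instant completes in DT.
    void complete(String instant) {
        metadataTable.put(instant, State.COMPLETED);
        dataTable.put(instant, State.COMPLETED);
        mdtEvents.add("complete " + instant);
    }

    // The only rollback entry point: a DT rollback that is propagated to MDT.
    void rollbackFromDataTable(String instant) {
        dataTable.remove(instant);
        if (metadataTable.remove(instant) != null) {
            mdtEvents.add("rollback " + instant);
        }
    }

    // With auto rollbacks disabled, MDT leaves inflight instants alone.
    void mdtAutoRollback() {
        // intentionally a no-op
    }

    // Second attempt of a failed compaction: roll back the first attempt, then redo it.
    void retryCompaction(String instant) {
        rollbackFromDataTable(instant);
        startExecution(instant);
        complete(instant);
    }

    State mdtState(String instant) {
        return metadataTable.get(instant);
    }

    List<String> mdtEvents() {
        return mdtEvents;
    }

    public static void main(String[] args) {
        DtDrivenRollbackSketch sketch = new DtDrivenRollbackSketch();
        sketch.startExecution("c1");   // first attempt fails midway, so no complete()
        sketch.mdtAutoRollback();      // MDT does nothing
        sketch.retryCompaction("c1");  // DT drives the rollback and the second attempt
        System.out.println(sketch.mdtEvents());
    }
}
```

The point of the sketch is that the inflight DC in MDT lives exactly as long as
the inflight instant in DT, since nothing other than the DT rollback removes it.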