kbuci commented on PR #18012:
URL: https://github.com/apache/hudi/pull/18012#issuecomment-3846043607

   @nsivabalan While checking the failing UTs, I noticed a complication. In some 
places the client calls `compact` with auto commit set to false, and then later 
calls one of the "commit compaction" APIs. At first I thought this was only done 
in UTs, but I realized that it is actually a legitimate way for users to execute 
a compaction.
   In Spark I see usages such as
   `org.apache.hudi.utilities.HoodieCompactor#doCompact`
   `org.apache.spark.sql.hudi.command.procedures.RunCompactionProcedure#call`
   And in Flink as well:
   `org.apache.hudi.sink.v2.compact.CompactionCommitSinkV2#commitIfNecessary`
   
   Because of this, for now I am thinking that we could apply this heartbeat 
guard only when auto-complete/commit is enabled?
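
   To make the first option concrete, here is a minimal sketch of a guard that 
only applies when auto-commit is enabled. Everything in it is hypothetical: the 
class `AutoCommitHeartbeatGuard`, its in-memory heartbeat set, and the `compact` 
signature are stand-ins for illustration and are not Hudi's actual heartbeat 
client or write client API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AutoCommitHeartbeatGuard {
    // Hypothetical in-memory stand-in for a heartbeat store (not Hudi's heartbeat client).
    private final Set<String> activeHeartbeats = ConcurrentHashMap.newKeySet();

    /**
     * Runs the compaction work for the given instant.
     * Returns false if the attempt was rejected because another attempt
     * already holds a heartbeat for the same instant.
     */
    public boolean compact(String instantTime, boolean autoCommit, Runnable work) {
        if (!autoCommit) {
            // Deferred-commit callers are left unguarded under this option.
            work.run();
            return true;
        }
        // Set.add returns false when a heartbeat for this instant already exists.
        if (!activeHeartbeats.add(instantTime)) {
            return false;
        }
        try {
            work.run(); // execute and auto-commit the compaction
            return true;
        } finally {
            // Always release the heartbeat, even if the work throws.
            activeHeartbeats.remove(instantTime);
        }
    }
}
```

   In this sketch a second attempt on the same instant is rejected only on the 
auto-commit path. Callers that pass auto-commit as false get no protection.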
   The alternative is to change the heartbeating logic so that it starts 
and stops the heartbeat twice:
   - When the `compact` API is called, it starts a heartbeat (within a 
transaction) at the beginning, and then stops it before exiting the function (or 
auto-committing)
   - When any of the APIs to commit a compaction is called, we again start a 
heartbeat (within a transaction) and then stop it after completing the 
compaction commit
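
   The two steps above could be sketched roughly as follows. This is only an 
illustration under stated assumptions: `TwoPhaseCompactionHeartbeat`, the 
`ReentrantLock` standing in for the table transaction, and the in-memory 
heartbeat set are all hypothetical and do not mirror the real Hudi classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class TwoPhaseCompactionHeartbeat {
    // Hypothetical stand-in for the table-level transaction.
    private final ReentrantLock txnLock = new ReentrantLock();
    // Hypothetical stand-in for the heartbeat store.
    private final Set<String> activeHeartbeats = ConcurrentHashMap.newKeySet();

    // Checks for an existing heartbeat and starts a new one inside the "transaction".
    private void startHeartbeat(String instantTime) {
        txnLock.lock();
        try {
            if (!activeHeartbeats.add(instantTime)) {
                throw new IllegalStateException(
                    "Heartbeat already active for instant " + instantTime);
            }
        } finally {
            txnLock.unlock();
        }
    }

    private void stopHeartbeat(String instantTime) {
        activeHeartbeats.remove(instantTime);
    }

    public boolean isHeartbeatActive(String instantTime) {
        return activeHeartbeats.contains(instantTime);
    }

    // Phase 1: the heartbeat covers the execution of the compaction.
    public void compact(String instantTime, Runnable executeCompaction) {
        startHeartbeat(instantTime);
        try {
            executeCompaction.run();
        } finally {
            stopHeartbeat(instantTime);
        }
    }

    // Phase 2: the heartbeat covers the completion of the compaction commit.
    public void commitCompaction(String instantTime, Runnable completeCommit) {
        startHeartbeat(instantTime);
        try {
            completeCommit.run();
        } finally {
            stopHeartbeat(instantTime);
        }
    }
}
```

   Note that in this sketch no heartbeat is held between the return of `compact` 
and the later call to `commitCompaction`, so each phase is only guarded on its own.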
   The issue, though, is that this adds some code complexity, since there seem to 
be multiple APIs that users can call directly to commit a compaction:
   - `org.apache.hudi.client.BaseHoodieWriteClient#completeCompaction` (mainly 
used by Flink)
   - `org.apache.hudi.client.BaseHoodieTableServiceClient#commitCompaction` 
(technically engine agnostic, but based on the comment my hunch is that it is 
mainly intended for the Spark use case)
   We could add the checks to both of those APIs, but at first glance I'm not 
sure if that would actually catch all "concurrent compact attempt" edge cases. 
For example, if 
   What are your thoughts?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
