kbuci commented on PR #18012: URL: https://github.com/apache/hudi/pull/18012#issuecomment-3846043607
@nsivabalan While checking failing UTs, I noticed a complication. In some places the client calls `compact` with auto commit as false, and then later calls one of the "commit compaction" APIs. At first I thought this was just for UTs, but I realized that this is actually a legitimate way users can execute a compaction.

In Spark I see some usages like
- `org.apache.hudi.utilities.HoodieCompactor#doCompact`
- `org.apache.spark.sql.hudi.command.procedures.RunCompactionProcedure#call`

And for Flink specifically as well
- `org.apache.hudi.sink.v2.compact.CompactionCommitSinkV2#commitIfNecessary`

Because of this, for now I am thinking that we can just have this heartbeat guard if auto-complete/commit is enabled?

The alternative is that we change the heartbeating logic such that it starts and closes the heartbeat twice:
- When calling the `compact` API, it starts a heartbeat (within a transaction) at the beginning, and then stops it before it exits the function (or auto-commits)
- When calling any of the APIs to commit a compaction, we again start a heartbeat (within a transaction) and then stop it after completing the compaction commit

The issue though is that this adds some code complexity, since there seem to be multiple APIs that users can directly call for committing a compaction:
- `org.apache.hudi.client.BaseHoodieWriteClient#completeCompaction` (mainly used by Flink)
- `org.apache.hudi.client.BaseHoodieTableServiceClient#commitCompaction` (technically engine agnostic, but based on the comment my hunch is that it's mainly intended for the Spark use case)

We could add the checks to both of those APIs, but at first glance I'm not sure if that would actually catch all "concurrent compact attempt" edge cases. For example, if

What are your thoughts?

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
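To make the "start and stop the heartbeat twice" alternative from the comment concrete, here is a minimal sketch. Everything in it is hypothetical and is not Hudi code: `HeartbeatGuardSketch`, `InMemoryHeartbeats`, `runGuarded`, and the instant time `"001"` are made-up names, and a `synchronized` in-memory set stands in for starting a heartbeat within a transaction. It only illustrates the shape of guarding the execute phase and the commit phase separately.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in Hudi.
class HeartbeatGuardSketch {

    /** In-memory stand-in for a heartbeat store (no expiry handling). */
    static final class InMemoryHeartbeats {
        private final Set<String> active = new HashSet<>();

        // synchronized stands in for "within a transaction":
        // checking for and starting the heartbeat happen atomically.
        synchronized boolean tryStart(String instantTime) {
            return active.add(instantTime);
        }

        synchronized void stop(String instantTime) {
            active.remove(instantTime);
        }

        synchronized boolean isActive(String instantTime) {
            return active.contains(instantTime);
        }
    }

    private final InMemoryHeartbeats heartbeats;
    private final Set<String> committed = new HashSet<>();

    HeartbeatGuardSketch(InMemoryHeartbeats heartbeats) {
        this.heartbeats = heartbeats;
    }

    /** Phase 1: run the compaction; commit inline only when autoCommit is true. */
    void compact(String instantTime, boolean autoCommit, Runnable work) {
        runGuarded(instantTime, () -> {
            work.run();
            if (autoCommit) {
                committed.add(instantTime);
            }
        });
    }

    /** Phase 2: a separate commit call takes its own heartbeat. */
    void commitCompaction(String instantTime) {
        runGuarded(instantTime, () -> committed.add(instantTime));
    }

    boolean isCommitted(String instantTime) {
        return committed.contains(instantTime);
    }

    // Start a heartbeat, run the action, and always stop the heartbeat we started.
    private void runGuarded(String instantTime, Runnable action) {
        if (!heartbeats.tryStart(instantTime)) {
            throw new IllegalStateException(
                "Another writer holds a heartbeat for compaction " + instantTime);
        }
        try {
            action.run();
        } finally {
            heartbeats.stop(instantTime);
        }
    }

    public static void main(String[] args) {
        InMemoryHeartbeats hb = new InMemoryHeartbeats();
        HeartbeatGuardSketch client = new HeartbeatGuardSketch(hb);
        client.compact("001", false, () -> { });  // execute without committing
        System.out.println("committed after compact: " + client.isCommitted("001"));
        client.commitCompaction("001");           // commit as a separate call
        System.out.println("committed after commit: " + client.isCommitted("001"));
    }
}
```

In this sketch each commit entry point would have to route through `runGuarded`, which is the added complexity the comment mentions. Note also that the sketch holds no heartbeat between the `compact` call and the later `commitCompaction` call.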
