[ https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-7503:
----------------------------
Component/s: compaction, table-service
> concurrent executions of compaction plan should not corrupt dataset
> -------------------------------------------------------------------
>
> Key: HUDI-7503
> URL: https://issues.apache.org/jira/browse/HUDI-7503
> Project: Apache Hudi
> Issue Type: Improvement
> Components: compaction, table-service
> Reporter: Krishen Bhan
> Priority: Minor
>
> Currently it is not safe for 2+ writers to concurrently call
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same
> compaction instant. This is because one writer might execute the instant and
> create an inflight instant, while the other writer sees that inflight instant
> and tries to roll it back before re-attempting to execute it (since it assumes
> the inflight instant was left behind by a previously failed compaction attempt).
> This logic should be updated such that only one writer will actually execute
> the compaction plan at a time (and the others will fail/abort).
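The interleaving described above can be reproduced with a small, single-threaded toy model. The class, its states, and the naive `begin`/`finish` methods are hypothetical stand-ins, not Hudi's timeline or client API; the sketch only shows why "roll back any inflight instant you find" is unsafe when that inflight instant belongs to a writer that is still running.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the unsafe flow (hypothetical names, not Hudi's API):
// any writer that finds the compaction instant INFLIGHT assumes an earlier
// attempt failed and rolls it back before executing the plan itself.
public class UnsafeCompactionRace {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static State state = State.REQUESTED;
    static final List<String> log = new ArrayList<>();

    // First half of a naive compact(): roll back any inflight, then mark inflight.
    static void begin(String writer) {
        if (state == State.INFLIGHT) {
            log.add(writer + " rolls back inflight");
            state = State.REQUESTED;
        }
        state = State.INFLIGHT;
        log.add(writer + " starts executing");
    }

    // Second half: finish executing and complete the instant.
    static void finish(String writer) {
        state = State.COMPLETED;
        log.add(writer + " completes");
    }

    public static void main(String[] args) {
        // One possible interleaving of two writers on the same instant.
        begin("A");  // A creates the inflight instant
        begin("B");  // B sees A's inflight, treats it as failed, and rolls it back
        finish("A"); // A was never told; it still completes its execution
        finish("B"); // B completes too: the same plan was executed twice
        log.forEach(System.out::println);
    }
}
```

Running `main` prints the log in order: A starts executing, B rolls back the inflight instant and starts executing, then A and B both complete.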
> One approach is to use a transaction (base table lock) in conjunction with
> heartbeating, to ensure that the writer starts a heartbeat before executing
> compaction, and any concurrent writers will use the heartbeat to check whether
> the compaction is currently being executed by another writer. Specifically,
> the compact API should execute the following steps:
> # Get the instant to compact, C (as usual)
> # Start a transaction
> # Check whether C has an active heartbeat; if so, finish the transaction and
> throw an exception
> # Start a heartbeat for C (this will implicitly restart the heartbeat if it
> was started before by another job)
> # Finish the transaction
> # Run the existing compact API logic on C
> # If execution succeeds, clean up the heartbeat file. If it fails, do nothing
> (as the heartbeat will expire automatically later).
> Note that this approach holds the table lock only briefly, while
> checking/starting the heartbeat.
> Also, this flow can be applied to the execution of clean plans and other table
> services.
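The proposed steps can be sketched as follows, assuming hypothetical stand-ins for Hudi's machinery: a `ReentrantLock` plays the base table lock, an in-memory map of last-heartbeat timestamps plays the heartbeat files, and a heartbeat counts as active while it is younger than a configurable tolerance. `GuardedTableService`, `execute`, and `hasActiveHeartbeat` are illustrative names, not Hudi APIs. The lock covers only the check-and-start of the heartbeat, and because the work is passed in as a `Runnable`, the same guard could wrap a clean plan or another table service. A real implementation would also refresh the heartbeat periodically while the action runs; that is omitted here, so an action running longer than the tolerance would let a second writer in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Sketch only. Assumptions (not Hudi APIs): ReentrantLock stands in for the
// base table lock; the map of last-heartbeat timestamps stands in for heartbeat files.
public class GuardedTableService {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    private final long toleranceMs;
    private final LongSupplier clock; // injectable so expiry can be tested

    public GuardedTableService(long toleranceMs, LongSupplier clock) {
        this.toleranceMs = toleranceMs;
        this.clock = clock;
    }

    // A heartbeat is active if one exists and it is not older than the tolerance.
    public boolean hasActiveHeartbeat(String instant) {
        Long last = heartbeats.get(instant);
        return last != null && clock.getAsLong() - last <= toleranceMs;
    }

    // Step 1 (choosing instant C) is done by the caller.
    public void execute(String instant, Runnable action) {
        tableLock.lock();                                   // 2. start a transaction
        try {
            if (hasActiveHeartbeat(instant)) {              // 3. another writer is executing C
                throw new IllegalStateException(
                        "Instant " + instant + " is being executed by another writer");
            }
            heartbeats.put(instant, clock.getAsLong());     // 4. (re)start the heartbeat for C
        } finally {
            tableLock.unlock();                             // 5. finish the transaction
        }
        action.run();                                       // 6. run the work outside the lock
        heartbeats.remove(instant);                         // 7. success: clean up the heartbeat
        // If action.run() throws, step 7 is skipped and the heartbeat is left to expire.
    }
}
```

With an injected clock (for example `long[] now = {0}; new GuardedTableService(1000, () -> now[0])`), a second `execute` call on the same instant made while the first is still running throws `IllegalStateException`, and a heartbeat left behind by a failed run stops blocking other writers once the clock advances past the tolerance.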
--
This message was sent by Atlassian Jira
(v8.20.10#820010)