[ https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kate Huber updated HUDI-7503:
-----------------------------
Fix Version/s: 1.1.0
(was: 1.0.0)
> Concurrent executions of table service plan should not corrupt dataset
> ----------------------------------------------------------------------
>
> Key: HUDI-7503
> URL: https://issues.apache.org/jira/browse/HUDI-7503
> Project: Apache Hudi
> Issue Type: Improvement
> Components: compaction, table-service
> Reporter: Krishen Bhan
> Assignee: sivabalan narayanan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.16.0, 1.1.0
>
>
> Some external workflow schedulers can misbehave and accidentally schedule
> duplicate executions of the same compaction plan. We need a way to guard
> against this inside Hudi (rather than relying on the user to take a lock
> externally). In such a scenario, two instances of the job concurrently call
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same
> compaction instant.
> This can corrupt the dataset, because one writer might execute the instant and
> create an inflight, while the other writer sees the inflight and tries to roll
> it back before re-attempting to execute it (since it will assume said inflight
> was a previously failed compaction attempt).
> This logic should be updated such that only one writer will actually execute
> the compaction plan at a time (and the others will fail/abort).
> One approach is to use a transaction (base table lock) in conjunction with
> heartbeating, to ensure that the writer triggers a heartbeat before executing
> compaction, and any concurrent writers will use the heartbeat to check whether
> the compaction is currently being executed by another writer. Specifically,
> the compact API should execute the following steps:
> # Get the instant to compact C (as usual)
> # Start a transaction
> # Check if C has an active heartbeat; if so, finish the transaction and throw
> an exception
> # Start a heartbeat for C (this will implicitly re-start the heartbeat if it
> has been started before by another job)
> # Finish transaction
> # Run the existing compact API logic on C
> # If execution succeeds, clean up the heartbeat file. If it fails, do nothing
> (as the heartbeat will automatically expire later anyway).
> Note that this approach only holds the table lock temporarily, while
> checking/starting the heartbeat.
> Also, this flow can be applied to the execution of clean plans and other table
> services.
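
The steps above can be sketched roughly as follows. This is only an in-memory
illustration and not Hudi code: the class `HeartbeatGuardSketch`, its methods,
and the `expiryMs` parameter are all hypothetical names. A `ReentrantLock`
stands in for the base table lock, a map of timestamps stands in for heartbeat
files, and the current time is passed in explicitly so the behaviour is easy to
check. A real heartbeat would also be refreshed periodically while the
compaction runs, which is omitted here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-memory sketch of the proposed guard; none of these names are Hudi APIs. */
class HeartbeatGuardSketch {
    // Stand-in for the base table lock that backs the transaction.
    private final ReentrantLock tableLock = new ReentrantLock();
    // Stand-in for heartbeat files: instant time -> timestamp (ms) of the last heartbeat.
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    // Assumed expiry interval after which a heartbeat counts as dead.
    private final long expiryMs;

    HeartbeatGuardSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    private boolean hasActiveHeartbeat(String instant, long nowMs) {
        Long last = heartbeats.get(instant);
        return last != null && nowMs - last < expiryMs;
    }

    /** Steps 2-5: claim the instant under the lock, or throw if another writer holds it. */
    void claim(String instant, long nowMs) {
        tableLock.lock();                               // step 2: start transaction
        try {
            if (hasActiveHeartbeat(instant, nowMs)) {   // step 3: active heartbeat -> abort
                throw new IllegalStateException(
                    "Instant " + instant + " is being executed by another writer");
            }
            heartbeats.put(instant, nowMs);             // step 4: (re)start the heartbeat
        } finally {
            tableLock.unlock();                         // step 5: finish transaction
        }
    }

    /** Steps 2-7 for an instant the caller has already chosen (step 1). */
    void compact(String instant, long nowMs, Runnable existingCompactLogic) {
        claim(instant, nowMs);
        // Step 6: run the existing logic outside the lock. If it throws, the
        // heartbeat entry is left in place and simply expires later.
        existingCompactLogic.run();
        heartbeats.remove(instant);                     // step 7: clean up on success
    }
}
```

In this sketch the lock is held only around the check and the heartbeat start,
so a second writer fails fast with an exception instead of rolling back the
first writer's inflight. A writer that crashes leaves a stale entry, which a
later attempt can take over once `expiryMs` has elapsed.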
--
This message was sent by Atlassian Jira
(v8.20.10#820010)