[ https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishen Bhan updated HUDI-7503:
-------------------------------
    Summary: concurrent executions of table service plan should not corrupt dataset  (was: concurrent executions of compaction plan should not corrupt dataset)

> concurrent executions of table service plan should not corrupt dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7503
>                 URL: https://issues.apache.org/jira/browse/HUDI-7503
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction, table-service
>            Reporter: Krishen Bhan
>            Priority: Minor
>
> Currently it is not safe for two or more writers to concurrently call
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same
> compaction instant. One writer might start executing the instant and create
> an inflight, while another writer sees that inflight and tries to roll it back
> before re-attempting to execute it, because it assumes the inflight is left
> over from a previously failed compaction attempt.
> This logic should be updated so that only one writer actually executes a
> given compaction plan at a time, and the others fail/abort.
> One approach is to use a transaction (base table lock) in conjunction with
> heartbeating: the writer starts a heartbeat before executing the compaction,
> and any concurrent writer uses that heartbeat to check whether the compaction
> is currently being executed by another writer. Specifically, the compact API
> should execute the following steps:
>  # Get the instant to compact, C (as usual)
>  # Start a transaction
>  # Check whether C has an active heartbeat; if so, finish the transaction and throw an exception
>  # Start a heartbeat for C (this implicitly restarts the heartbeat if another job started one before)
>  # Finish the transaction
>  # Run the existing compact API logic on C
>  # If execution succeeds, clean up the heartbeat file. If it fails, do nothing (the heartbeat will expire automatically later).
> Note that this approach holds the table lock only briefly, while checking and
> starting the heartbeat, not for the duration of the compaction itself.
> The same flow can also be applied to executing clean plans and other table
> services.
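The proposed flow can be sketched as a small, self-contained guard around any table service execution. This is a minimal sketch under stated assumptions, not Hudi code: `TableServiceGuard`, `ConcurrentExecutionException` and `hasActiveHeartbeat` are hypothetical names, an in-process `ReentrantLock` stands in for the base table lock, and an in-memory map of instant to last-heartbeat time stands in for Hudi's heartbeat files. The key point it shows is that the "check for an active heartbeat" and "start a heartbeat" steps happen under the same lock, so two writers cannot both pass the check, while the plan itself runs with the lock released.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/**
 * Hypothetical guard illustrating the proposed flow (not a Hudi API).
 * A ReentrantLock stands in for the base table lock; an in-memory map of
 * instant -> last heartbeat time stands in for heartbeat files.
 */
class TableServiceGuard {

    /** Thrown when another writer already holds a live heartbeat for the instant. */
    public static class ConcurrentExecutionException extends RuntimeException {
        public ConcurrentExecutionException(String instant) {
            super("Instant " + instant + " is being executed by another writer");
        }
    }

    private final Lock tableLock = new ReentrantLock();
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    private final long expiryMillis;
    private final LongSupplier clock;

    TableServiceGuard(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    /** A heartbeat counts as active if it was recorded within the expiry window. */
    public boolean hasActiveHeartbeat(String instant) {
        Long last = heartbeats.get(instant);
        return last != null && clock.getAsLong() - last < expiryMillis;
    }

    /** Executes a table service plan (compaction, clean, ...) for one instant. */
    public void execute(String instant, Runnable plan) {
        tableLock.lock();                                // start transaction
        try {
            if (hasActiveHeartbeat(instant)) {           // another writer is running it
                throw new ConcurrentExecutionException(instant);
            }
            heartbeats.put(instant, clock.getAsLong());  // (re)start heartbeat
        } finally {
            tableLock.unlock();                          // finish transaction
        }
        // Run the existing execution logic without holding the lock. A real
        // implementation would keep refreshing the heartbeat while this runs.
        plan.run();
        // Only reached on success: clean up the heartbeat. If plan.run() threw,
        // the heartbeat is left in place and simply expires later.
        heartbeats.remove(instant);
    }
}
```

In this sketch a second writer that calls `execute` for the same instant while the first is still running gets a `ConcurrentExecutionException` instead of rolling back the first writer's inflight, and a failed attempt blocks retries only until its heartbeat expires. Injecting the clock as a `LongSupplier` is purely a convenience for making expiry deterministic in tests.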



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
