[jira] [Updated] (HUDI-7503) Concurrent executions of table service plan should not corrupt dataset

ASF GitHub Bot (Jira) Fri, 05 Apr 2024 12:31:08 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HUDI-7503:
---------------------------------
    Labels: pull-request-available  (was: )

> Concurrent executions of table service plan should not corrupt dataset
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7503
>                 URL: https://issues.apache.org/jira/browse/HUDI-7503
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: compaction, table-service
>            Reporter: Krishen Bhan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.15.0, 1.0.0
>
>
> Some external workflow schedulers can accidentally (or through misbehavior)
> schedule duplicate executions of the same compaction plan. We need a way to
> guard against this inside Hudi (vs. the user taking a lock externally). In
> such a scenario, two instances of the job concurrently call
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same
> compaction instant.
> This can corrupt the dataset, since one writer might execute the instant and
> create an inflight, while the other writer sees the inflight and tries to
> roll it back before re-attempting to execute it (since it will assume said
> inflight was a previously failed compaction attempt).
> This logic should be updated such that only one writer will actually execute 
> the compaction plan at a time (and the others will fail/abort).
> One approach is to use a transaction (base table lock) in conjunction with
> heartbeating, so that the writer starts a heartbeat before executing
> compaction, and any concurrent writers use the heartbeat to check whether
> the compaction is currently being executed by another writer. Specifically,
> the compact API should execute the following steps:
>  # Get the instant to compact C (as usual)
>  # Start a transaction
>  # Check whether C has an active heartbeat; if so, finish the transaction
> and throw an exception
>  # Start a heartbeat for C (this will implicitly restart the heartbeat if it
> has been started before by another job)
>  # Finish the transaction
>  # Run the existing compact API logic on C
>  # If execution succeeds, clean up the heartbeat file. If it fails, do
> nothing (as the heartbeat will automatically expire later anyway).
> Note that this approach only holds the table lock temporarily, while
> checking/starting the heartbeat.
> Also, this flow can be applied to the execution of clean plans and other
> table services.
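
The steps proposed in the description can be sketched roughly as follows. This is only an in-memory illustration under stated assumptions, not Hudi code: the class `GuardedTableService`, its methods, the `ReentrantLock` standing in for the base table lock, and the map standing in for heartbeat files are all hypothetical names introduced here. A real heartbeat would also be refreshed periodically while the job runs; the sketch records a single timestamp and takes the current time as a parameter to keep it deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the proposed flow; none of these names are Hudi APIs. */
class GuardedTableService {
    // Stand-in for the base table lock taken by a transaction.
    private final ReentrantLock tableLock = new ReentrantLock();
    // Stand-in for heartbeat files: instant time -> last heartbeat (millis).
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    private final long heartbeatTimeoutMs;

    GuardedTableService(long heartbeatTimeoutMs) {
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
    }

    /** A heartbeat is active if one exists and has not yet expired. */
    boolean hasActiveHeartbeat(String instant, long nowMs) {
        Long last = heartbeats.get(instant);
        return last != null && nowMs - last < heartbeatTimeoutMs;
    }

    /** Steps 2-5: check and start the heartbeat while holding the lock. */
    void claim(String instant, long nowMs) {
        tableLock.lock();                        // start transaction
        try {
            if (hasActiveHeartbeat(instant, nowMs)) {
                throw new IllegalStateException(
                    "Instant " + instant + " is being executed by another writer");
            }
            heartbeats.put(instant, nowMs);      // start (or restart) heartbeat
        } finally {
            tableLock.unlock();                  // finish transaction
        }
    }

    /** Steps 6-7: run the existing logic, then clean up on success only. */
    void execute(String instant, long nowMs, Runnable existingLogic) {
        claim(instant, nowMs);
        // If this throws, the heartbeat is left in place to expire on its own.
        existingLogic.run();
        heartbeats.remove(instant);
    }
}
```

In this sketch the lock is held only inside `claim`, matching the note that the table lock is taken just while checking and starting the heartbeat, not for the duration of the table service execution.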



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
