[jira] [Updated] (HUDI-7503) concurrent executions of compaction plan should not corrupt dataset

Krishen Bhan (Jira) Thu, 14 Mar 2024 17:52:40 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Krishen Bhan updated HUDI-7503:
-------------------------------
    Description: 
Currently it is not safe for two or more writers to concurrently call 
`org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same 
compaction instant. One writer might execute the instant and create an 
inflight, while the other writer sees the inflight and tries to roll it back 
before re-attempting to execute it (since it assumes that the inflight was 
left behind by a previously failed compaction attempt).

This logic should be updated such that only one writer will actually execute 
the compaction plan at a time (and the others will fail/abort).

One approach is to use a transaction (base table lock) in conjunction with 
heartbeating: the writer starts a heartbeat before executing compaction, and 
any concurrent writer uses the heartbeat to check whether the compaction is 
currently being executed by another writer. Specifically, the compact API 
should execute the following steps:
 # Get the instant to compact, C (as usual).
 # Start a transaction.
 # Check whether C has an active heartbeat; if so, finish the transaction and 
throw an exception.
 # Start a heartbeat for C (this will implicitly restart the heartbeat if it 
was started before by another job).
 # Finish the transaction.
 # Run the existing compact API logic on C.
 # If execution succeeds, clean up the heartbeat file. If it fails, do nothing 
(the heartbeat will expire automatically later anyway).
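
The steps above can be sketched as follows. This is a minimal, self-contained 
model and not Hudi code: `GuardedCompactor`, `hasActiveHeartbeat`, and 
`expiryMs` are hypothetical names, a `ReentrantLock` stands in for the table 
lock, and an in-memory map of timestamps stands in for heartbeat files. A real 
heartbeat would be refreshed periodically while the compaction runs; here it 
is written once, which is enough to show the ordering of the steps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the proposed flow; none of these names are Hudi APIs.
class GuardedCompactor {
    // Stands in for the base table lock (the "transaction").
    private final ReentrantLock tableLock = new ReentrantLock();
    // Stands in for heartbeat files: instant -> last heartbeat time (ms).
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    private final long expiryMs;

    GuardedCompactor(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // A heartbeat counts as active if it was written within expiryMs.
    boolean hasActiveHeartbeat(String instant) {
        Long last = heartbeats.get(instant);
        return last != null && System.currentTimeMillis() - last < expiryMs;
    }

    void compact(String instant, Runnable existingCompactLogic) {
        tableLock.lock();                          // step 2: start transaction
        try {
            if (hasActiveHeartbeat(instant)) {     // step 3: another writer owns C
                throw new IllegalStateException(
                    "Compaction " + instant + " is already being executed");
            }
            // Step 4: start (or restart) the heartbeat for C.
            heartbeats.put(instant, System.currentTimeMillis());
        } finally {
            tableLock.unlock();                    // step 5: finish transaction
        }
        existingCompactLogic.run();                // step 6: runs outside the lock
        heartbeats.remove(instant);                // step 7: clean up on success only
    }
}
```

In this sketch, a second call to `compact` for the same instant while the 
first is still running fails with an exception instead of rolling back the 
inflight. If the compaction logic throws, the heartbeat entry is left in place 
and simply ages out after `expiryMs`.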

Note that this approach holds the table lock only briefly, while checking and 
starting the heartbeat.

Also, this flow can be applied to the execution of clean plans and other table 
services.



> concurrent executions of compaction plan should not corrupt dataset
> -------------------------------------------------------------------
>
>                 Key: HUDI-7503
>                 URL: https://issues.apache.org/jira/browse/HUDI-7503
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Krishen Bhan
>            Priority: Minor
>
> Currently it is not safe for two or more writers to concurrently call 
> `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same 
> compaction instant. One writer might execute the instant and create an 
> inflight, while the other writer sees the inflight and tries to roll it back 
> before re-attempting to execute it (since it assumes that the inflight was 
> left behind by a previously failed compaction attempt).
> This logic should be updated such that only one writer will actually execute 
> the compaction plan at a time (and the others will fail/abort).
> One approach is to use a transaction (base table lock) in conjunction with 
> heartbeating: the writer starts a heartbeat before executing compaction, and 
> any concurrent writer uses the heartbeat to check whether the compaction is 
> currently being executed by another writer. Specifically, the compact API 
> should execute the following steps:
>  # Get the instant to compact, C (as usual).
>  # Start a transaction.
>  # Check whether C has an active heartbeat; if so, finish the transaction and 
> throw an exception.
>  # Start a heartbeat for C (this will implicitly restart the heartbeat if it 
> was started before by another job).
>  # Finish the transaction.
>  # Run the existing compact API logic on C.
>  # If execution succeeds, clean up the heartbeat file. If it fails, do 
> nothing (the heartbeat will expire automatically later anyway).
> Note that this approach holds the table lock only briefly, while checking and 
> starting the heartbeat.
> Also, this flow can be applied to the execution of clean plans and other 
> table services.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
