[jira] [Updated] (HUDI-7361) Fix a concurrency issue caused by rollbackFailedWrites

eric (Jira) Tue, 30 Jan 2024 23:40:09 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


eric updated HUDI-7361:
-----------------------
    Description: 
{quote}CREATE TABLE tbl (
......
) WITH (
'connector' = 'hudi',
'path' = '/tblpath',
'table.type' = 'COPY_ON_WRITE',
'write.bucket_assign.tasks'='5',
'write.operation'='insert',
'write.tasks'='5', 
'clustering.schedule.enabled'='true',
'clustering.async.enabled'='true',
'clustering.delta_commits'='3',
'clustering.tasks'='5',
'hoodie.cleaner.policy.failed.writes'='LAZY'
);
{quote}
*Table parameters are as above*

 

*From the jobmanager and taskmanager logs, we can summarize how the failure 
is triggered:* 


Before the write client completes commit 20240126154725671, the clean table 
service starts to run. As part of the clean process, it must check for failed 
writes and roll them back (rollbackFailedWrites). 

This method checks whether the heartbeat of each inflight instant has timed 
out and rolls back the instants whose heartbeats have expired. At the same 
time, the write client completes commit 20240126154725671 and deletes the 
heartbeat file of this instant. 

The clean table service client therefore reads a last heartbeat of 0 for this 
instant, treats the heartbeat as expired, and rolls back the instant even 
though it has already been committed.
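
The interleaving above can be replayed with a small in-memory model. This is 
only a sketch: the class, its fields and its methods are hypothetical and are 
not Hudi APIs, the timeout value is an assumption, and the guarded check is 
one possible way to avoid the race, not necessarily the fix adopted for this 
issue. It replays the steps sequentially instead of using real threads.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model; none of these names are Hudi APIs.
class HeartbeatRaceSketch {
    // Instant time -> last heartbeat in epoch millis. A deleted entry reads as 0.
    final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    // Instants whose commit has completed on the timeline.
    final Set<String> completed = ConcurrentHashMap.newKeySet();

    long lastHeartbeat(String instant) {
        return heartbeats.getOrDefault(instant, 0L);
    }

    boolean isHeartbeatExpired(String instant, long nowMs, long timeoutMs) {
        return nowMs - lastHeartbeat(instant) > timeoutMs;
    }

    // Writer side: mark the commit complete, then delete its heartbeat.
    void completeCommit(String instant) {
        completed.add(instant);
        heartbeats.remove(instant);
    }

    // Unguarded check: trusts the heartbeat alone, so a deleted heartbeat
    // (read as 0) looks like an expired one.
    boolean naiveShouldRollback(String instant, long nowMs, long timeoutMs) {
        return isHeartbeatExpired(instant, nowMs, timeoutMs);
    }

    // Guarded check: after the heartbeat looks expired, re-check that the
    // instant has not completed in the meantime before rolling it back.
    boolean guardedShouldRollback(String instant, long nowMs, long timeoutMs) {
        return isHeartbeatExpired(instant, nowMs, timeoutMs)
                && !completed.contains(instant);
    }

    public static void main(String[] args) {
        HeartbeatRaceSketch s = new HeartbeatRaceSketch();
        String instant = "20240126154725671";
        long now = 1_706_255_300_000L; // arbitrary "current" time for the demo
        long timeout = 120_000L;       // assumed heartbeat timeout

        // The writer is alive: its last heartbeat is one second old.
        s.heartbeats.put(instant, now - 1_000L);

        // The cleaner has already listed `instant` as inflight.
        // The writer now completes the commit and deletes the heartbeat.
        s.completeCommit(instant);

        System.out.println("naive rollback:   "
                + s.naiveShouldRollback(instant, now, timeout));   // true
        System.out.println("guarded rollback: "
                + s.guardedShouldRollback(instant, now, timeout)); // false
    }
}
```

In this model the writer marks the commit complete before it deletes the 
heartbeat, so a cleaner that sees the heartbeat missing will also see the 
instant as completed when it re-checks.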

> Fix a concurrency issue caused by rollbackFailedWrites
> ------------------------------------------------------
>
>                 Key: HUDI-7361
>                 URL: https://issues.apache.org/jira/browse/HUDI-7361
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: writer-core
>            Reporter: eric
>            Priority: Major
>         Attachments: jobmanager_log.txt, taskmanager_log.txt
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
