[ 
https://issues.apache.org/jira/browse/HUDI-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-6719:
------------------------------
    Fix Version/s: 1.0.2

> Fix data inconsistency issues caused by concurrent clustering and delete 
> partition.
> -----------------------------------------------------------------------------------
>
>                 Key: HUDI-6719
>                 URL: https://issues.apache.org/jira/browse/HUDI-6719
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: clustering, table-service
>            Reporter: Ma Jian
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.2
>
>
> Related issue: https://issues.apache.org/jira/browse/HUDI-5553
> The specific problem is that when concurrent replace commit operations are 
> executed, two replace commits may point to the same file ID, resulting in a 
> duplicate key error. The related issue prevents scheduling a delete 
> partition while there are pending clustering or compaction operations. 
> However, that solution is incomplete and can still cause data inconsistency 
> if a clustering plan is scheduled before the delete partition is committed, 
> because the validation is only one-way. In that case, both replace commits 
> will still contain duplicate keys, and the table becomes inconsistent once 
> both plans are committed. This is a critical problem, and there are other 
> similar scenarios that can bypass the validation added by the existing 
> issue. Moreover, the existing check operates at the partition level and is 
> not precise enough.
> Here is my solution:
> !https://intranetproxy.alipay.com/skylark/lark/0/2023/png/62256341/1692328998008-f9dc6530-e44e-43e7-9b75-d760b55b3dfa.png|width=335,id=WXCCX!
> As shown in the figure, both drop partition and clustering go through a 
> period of time during which they are not yet registered to the timeline, 
> which is the scenario the previous issue did not solve. Here, I register 
> the replace file IDs involved in each replace commit to the active timeline 
> (replace commits that have already completed store 
> partitionToReplaceFileIds on the timeline, so only pending ones need to be 
> handled). Since, in the Spark SQL case, delete partition creates a 
> requested commit in advance during the write, which is inconvenient to 
> handle, I save the pending replace commit's partitionToReplaceFileIds to 
> the inflight commit's extra metadata. Then, whenever drop partition or 
> clustering executes, it only needs to read the partitionToReplaceFileIds 
> from the timeline, after ensuring its own inflight commit information has 
> been saved to the timeline, to verify that there are no duplicate file IDs 
> and prevent this kind of error from occurring.
> In simple terms, every replace commit registers its replace file ID 
> information to the timeline whether or not it has completed, and at the 
> same time every commit checks this information to ensure no file ID is 
> repeated. Any replace commit that targets an already-claimed file ID is 
> blocked, guaranteeing that there are no duplicate keys.
> Once this idea is also applied to compaction commits, the modification 
> introduced by the related issue can be removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
