[jira] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

yongzhi.shao (Jira) Sat, 23 May 2026 08:54:07 -0700


    [ https://issues.apache.org/jira/browse/HIVE-28366 ]



    yongzhi.shao deleted comment on HIVE-28366:
    -------------------------------------

was (Author: lisoda):
[~dkuzmenko] 

Sir, although this is how we currently proceed, I strongly suspect that we may
also need `exclusive write` locks when MERGE INTO statements run in parallel.
If two tasks execute updates concurrently, one task's results might overwrite
the other's, and MERGE INTO makes the situation even more complex. Should we
consider expanding the scope of `exclusive write` locks?


For example, Client A updates (via MERGE INTO or another statement) the record
with ID=1 to 2. Because we use COW (Copy-on-Write) mode, the data file is
rewritten from D1 to D2.
In parallel, Client B also reads D1, updates (via MERGE INTO or another
statement) the record with ID=3 to 31, and likewise rewrites D1 into a new file
D2'.
If D2' is committed last, then from Client A's perspective its update of ID=1
succeeded, but after a while the data for ID=1 is inexplicably restored to its
original value, because D2' was written from the original D1 and does not
contain Client A's change.
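
A minimal HiveQL sketch of this scenario, under stated assumptions: the table
`ice_cow`, its columns `id`/`val`, and the use of the Iceberg table property
`write.update.mode`='copy-on-write' are hypothetical and chosen only for
illustration. The two UPDATE statements are meant to run in separate sessions
at the same time, starting from the same snapshot, and whether both rows end
up in a single data file (D1) depends on how the insert is written.

{code}
-- Hypothetical table; assumes Hive honors the Iceberg COW update mode.
create table ice_cow (id int, val int) stored by iceberg
tblproperties ('format-version'='2', 'write.update.mode'='copy-on-write');

-- Both rows in one statement, so they likely (not guaranteed) share file D1.
insert into ice_cow values (1, 1), (3, 3);

-- Session A (Client A): rewrites D1 into D2 containing (1, 2), (3, 3).
update ice_cow set val = 2 where id = 1;

-- Session B (Client B), run concurrently from the same snapshot:
-- rewrites D1 into D2' containing (1, 1), (3, 31).
update ice_cow set val = 31 where id = 3;

-- Expected after both commits: (1, 2), (3, 31).
-- If D2' wins without a conflict check, id=1 is back to val=1.
select * from ice_cow order by id;
{code}

Whether the second commit is rejected or silently replaces D2 depends on the
conflict validation performed at commit time, which is exactly the question
about widening the `exclusive write` lock scope.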

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> ------------------------------------------------------------
>
>                 Key: HIVE-28366
>                 URL: https://issues.apache.org/jira/browse/HIVE-28366
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>    Affects Versions: 4.0.0
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds, and afterwards the table
> contains the following:
> {code}
> 2      2
> 3      3
> 11     11
> 21     21
> 31     31
> 41     41
> 100    100
> 200    200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
