[
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083085#comment-18083085
]
yongzhi.shao edited comment on HIVE-28366 at 5/23/26 3:47 PM:
--------------------------------------------------------------
[~dkuzmenko]
Sir, although this is how we currently proceed, I strongly suspect that we
also need `exclusive write` locks when UPDATE, DELETE, or MERGE INTO
statements run in parallel. If two tasks execute updates concurrently, the
result of one might overwrite the result of the other. The situation becomes
even more complex with MERGE INTO. Do we need to consider expanding the scope
of `exclusive write` locks?
For example, Client A updates the record with ID=1 to 2. Since we use COW
(Copy-on-Write) mode, the data file D1 is rewritten as D2.
Client B reads D1 in parallel, updates the record with ID=3 to 31, and
similarly rewrites D1 as a new file D2'.
Assuming D2 is committed first and D2' afterwards, D2' (which was derived from
D1) still carries the original value for ID=1. From Client A's perspective, the
update to ID=1 appears to have succeeded, but after a while the data for ID=1
is inexplicably restored to its original value.
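To make the interleaving concrete, here is a minimal toy model in Python. It is not Hive or Iceberg code: `CowTable`, `run`, and the `validate` flag are hypothetical names, the table is reduced to a single data file, and the starting values (ID=1 holds 1, ID=3 holds 3) are assumed for illustration. The `validate` flag stands in for whatever would serialize the writers, for example an `exclusive write` lock or a commit-time conflict check. In this sketch Client A commits first, and whichever rewritten file is committed last wins.

```python
class CowTable:
    """Hypothetical single-file table; each commit swaps in a rewritten file."""

    def __init__(self, rows):
        self.version = 0
        self.rows = dict(rows)  # contents of the current data file (D1)

    def read(self):
        # A writer remembers the version it read plus a private copy of the file.
        return self.version, dict(self.rows)

    def commit(self, base_version, new_rows, validate):
        # With validation, reject a commit built on a stale version.
        if validate and base_version != self.version:
            return False
        self.rows = dict(new_rows)
        self.version += 1
        return True


def run(validate):
    table = CowTable({1: 1, 3: 3})   # assumed original values
    base_a, d2 = table.read()        # Client A reads D1
    base_b, d2_prime = table.read()  # Client B reads D1 in parallel
    d2[1] = 2                        # Client A: ID=1 -> 2, rewritten as D2
    d2_prime[3] = 31                 # Client B: ID=3 -> 31, rewritten as D2'
    ok_a = table.commit(base_a, d2, validate)
    ok_b = table.commit(base_b, d2_prime, validate)
    return ok_a, ok_b, table.rows


print(run(validate=False))  # both commits succeed; ID=1 reverts to 1
print(run(validate=True))   # Client B's stale commit is rejected
```

Without validation both commits report success and Client A's change is lost. With validation Client B would have to re-read the table and retry, which is the behaviour the question about wider `exclusive write` locks is aiming for.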
was (Author: lisoda):
[~dkuzmenko]
Sir, although this is how we currently proceed, I strongly suspect that we may
also need `exclusive write` locks when executing update/delete/merge into
semantics in parallel. If two tasks were to execute updates concurrently, their
results might overwrite each other. The situation becomes even more complex
with MERGE INTO. Do we need to consider expanding the scope of `exclusive
write` locks?
> Iceberg: Concurrent Insert and IOW produce incorrect result
> ------------------------------------------------------------
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i))
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the
> table content will be the following:
> {code}
> 2 2
> 3 3
> 11 11
> 21 21
> 31 31
> 41 41
> 100 100
> 200 200
> 1000 1000
> 2000 2000
> 3000 3000
> 4000 4000
> {code}