[jira] [Comment Edited] (HIVE-28822) Race condition in Hive.mvFile

Urmas Tamassy (Jira) Thu, 12 Mar 2026 05:29:08 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-28822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065366#comment-18065366
 ]


Urmas Tamassy edited comment on HIVE-28822 at 3/12/26 12:28 PM:
----------------------------------------------------------------

For reference, as I understand it, this issue was raised for a race condition
that occurs when the move tasks of multiple concurrent statements (inserts) try
to rename their files to the same target file. One of the move tasks succeeds,
while the others fail because the target file already exists, as logged by
HiveServer2:
{code:java}
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: Failed to rename {source file URI} to {target file URI}; destination file exists {code}
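To illustrate the mechanism, here is a minimal local sketch of a check-then-rename race. It assumes (not verified against the actual {{Hive.mvFile}} code) that the destination name is chosen by probing for the first free name ({{000000_0}}, {{000000_0_copy_1}}, ...) and only then renaming. {{pickFreeName}} and {{replayInterleaving}} are hypothetical helpers, and {{java.nio.file}} on a local temp directory stands in for the Hadoop FileSystem. Both "statements" probe before either renames, so they pick the same target, and the second {{Files.move}} fails with {{java.nio.file.FileAlreadyExistsException}}, analogous to the Hadoop exception above.

{code:java}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MvFileRaceDemo {

    // Hypothetical naming scheme (an assumption, not Hive's code): return the
    // first name among base, base_copy_1, base_copy_2, ... that does not exist.
    static Path pickFreeName(Path dir, String base) {
        Path candidate = dir.resolve(base);
        int counter = 1;
        while (Files.exists(candidate)) {
            candidate = dir.resolve(base + "_copy_" + counter++);
        }
        return candidate;
    }

    // Replays the unlucky interleaving deterministically: both statements probe
    // for a free name before either renames, so both choose the same target.
    // Returns the number of renames that failed because the target existed.
    public static int replayInterleaving() throws IOException {
        Path dir = Files.createTempDirectory("mvfile-race");
        Path srcA = Files.createFile(dir.resolve("staging_a"));
        Path srcB = Files.createFile(dir.resolve("staging_b"));

        Path targetA = pickFreeName(dir, "000000_0"); // statement A checks
        Path targetB = pickFreeName(dir, "000000_0"); // statement B checks

        int failures = 0;
        for (Path[] move : new Path[][] {{srcA, targetA}, {srcB, targetB}}) {
            try {
                // Without REPLACE_EXISTING, Files.move refuses to overwrite.
                Files.move(move[0], move[1]);
            } catch (FileAlreadyExistsException e) {
                failures++;
                System.out.println("Failed to rename " + move[0] + " to "
                        + move[1] + "; destination file exists");
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("failed renames: " + replayInterleaving());
    }
}
{code}

Running {{main}} reports exactly one failed rename: statement A wins the shared target and statement B hits the existing file.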
Example reproduction (whether it triggers depends on the filesystem used, likely
because of how slow its renames are; this was encountered with AWS S3):

Repro table:
{code:java}
create external table p_test (a int) partitioned by (b int) stored as parquet 
TBLPROPERTIES ("external.table.purge"="true");{code}
One-liner script that runs 30 concurrent beeline sessions inserting into the
same partition and collects their outputs (with 30 sessions, this produced
about 0-4 failures per run on AWS S3):
{code:java}
for i in {1..30}; do beeline {connection details} -e "insert into p_test values (1,2);" > $i.out 2>&1 & done {code}
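The beeline loop can be approximated in-process with the following sketch, under the same hypothetical naming assumption as before: 30 threads are released together by a {{CountDownLatch}}, and each one probes a shared directory for a free name and renames its own staging file into it. {{ConcurrentMoveDemo}}, {{run}} and {{renameDelayMillis}} are illustrative names; the sleep is an assumption standing in for slow S3 renames, i.e. a wider window between the existence check and the rename. With the delay set to 0 on a local disk, failures may be rare, which fits the observation that the reproduction depends on the filesystem.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentMoveDemo {

    // Hypothetical naming scheme (assumed, not taken from Hive): first free
    // name among base, base_copy_1, base_copy_2, ...
    static Path pickFreeName(Path dir, String base) {
        Path candidate = dir.resolve(base);
        for (int i = 1; Files.exists(candidate); i++) {
            candidate = dir.resolve(base + "_copy_" + i);
        }
        return candidate;
    }

    // Runs `sessions` concurrent check-then-rename moves into one directory.
    // Returns {moves that completed, moves that hit an existing target}.
    public static int[] run(int sessions, long renameDelayMillis) throws Exception {
        Path staging = Files.createTempDirectory("staging");
        Path partition = Files.createTempDirectory("partition");
        AtomicInteger successes = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(sessions);
        List<Future<?>> futures = new ArrayList<>();

        try {
            for (int i = 0; i < sessions; i++) {
                Path src = Files.createFile(staging.resolve("session_" + i));
                futures.add(pool.submit(() -> {
                    start.await();                    // release all sessions together
                    try {
                        Path target = pickFreeName(partition, "000000_0");
                        Thread.sleep(renameDelayMillis); // stand-in for a slow rename
                        Files.move(src, target);      // no REPLACE_EXISTING
                        successes.incrementAndGet();
                    } catch (FileAlreadyExistsException e) {
                        failures.incrementAndGet();   // lost the race for the target
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    return null;
                }));
            }
            start.countDown();
            for (Future<?> f : futures) {
                f.get();                              // surface unexpected errors
            }
        } finally {
            start.countDown();                        // never leave workers blocked
            pool.shutdown();
        }
        return new int[] {successes.get(), failures.get()};
    }

    public static void main(String[] args) throws Exception {
        int[] r = run(30, 50);
        System.out.println("succeeded: " + r[0] + ", failed: " + r[1]);
    }
}
{code}

The failure count varies between runs. Every session either completes its move or fails, and the first move to complete always succeeds, because no other session has renamed anything into the directory yet.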
For reference, the error from beeline is usually:
{code:java}
ERROR : FAILED: Execution Error, return code 40000 from 
org.apache.hadoop.hive.ql.exec.MoveTask. Exception when loading ... {code}



> Race condition in Hive.mvFile
> -----------------------------
>
>                 Key: HIVE-28822
>                 URL: https://issues.apache.org/jira/browse/HIVE-28822
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>
> TODO: add description later



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
