Csaba Ringhofer created IMPALA-14096:
----------------------------------------

             Summary: Writing non-UTF8 partition values can lead to dirty writes
                 Key: IMPALA-14096
                 URL: https://issues.apache.org/jira/browse/IMPALA-14096
             Project: IMPALA
          Issue Type: Bug
            Reporter: Csaba Ringhofer


{code}
create table tspart (s string) partitioned by (p string);
insert into tspart partition (p="a") values ("a");
insert into tspart partition (p="aa") values ("aa");
-- s is not valid utf8
insert into tspart partition (p="a") values (unhex("aa"));
-- insert the table's rows again but swap p and s, so one partition value will be concat(unhex("aa"), "a")
insert into tspart partition (p) select p s_, concat(s, "a")  p_ from tspart;
-- leads to error:
2025-05-26 11:47:03 [Exception]  ERROR: Query da440f13f21ab301:79918f1100000000 
failed:
Error(s) moving partition files. First error (of 1) was: Hdfs op (RENAME 
hdfs://localhost:20500/test-warehouse/tspart/_impala_insert_staging/da440f13f21ab301_79918f1100000000/.da440f13f21ab301-79918f1100000002_588063374_dir/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt
 TO 
hdfs://localhost:20500/test-warehouse/tspart/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt)
 failed, error was: 
hdfs://localhost:20500/test-warehouse/tspart/_impala_insert_staging/da440f13f21ab301_79918f1100000000/.da440f13f21ab301-79918f1100000002_588063374_dir/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt
Error(5): Input/output error

select count(*) from tspart;
-- result: 3, the table looks unchanged
refresh tspart;
select count(*) from tspart;
-- result: 4, because an extra file was found by refresh
{code}

While dirty writes are a known issue in non-transactional tables, they should 
not be this easy to reproduce. The problem in this case is that the error 
occurs while moving the files, so some files may already have been moved to 
their final destination by the time the query fails. Detecting the problematic 
partition names earlier could ensure that files written for other partitions 
are not moved out of the staging dir.
https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/be/src/runtime/dml-exec-state.cc#L341
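
To illustrate the "detect earlier" idea, here is a minimal, language-neutral sketch in Python. It is not Impala code: PendingMove, finalize_moves and the rename callback are hypothetical names, and treating "problematic" as "not strictly valid UTF-8" is an assumption based on the summary above. The point is only the ordering: validate every partition value first, and start renaming only if all of them pass.

```python
from dataclasses import dataclass
from typing import Callable, List


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


@dataclass
class PendingMove:
    # Hypothetical record: one staged file and the partition value it belongs to.
    partition_value: bytes
    file_name: str


def finalize_moves(moves: List[PendingMove],
                   rename: Callable[[PendingMove], None]) -> None:
    """Validate all partition values, then perform the renames.

    `rename` is a caller-supplied stand-in for the file system rename.
    If any value is not valid UTF-8, ValueError is raised before the
    first rename, so nothing leaves the staging directory.
    """
    bad = [m.partition_value for m in moves
           if not is_valid_utf8(m.partition_value)]
    if bad:
        raise ValueError(f"non-UTF-8 partition values: {bad!r}")
    for m in moves:
        rename(m)
```

With the partition values from the reproduction above (b"aa", b"aaa" and the byte 0xAA followed by "a"), this sketch would reject the whole batch up front instead of failing after some files have already been moved.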


