Csaba Ringhofer created IMPALA-14096:
----------------------------------------

             Summary: Writing non-UTF8 partition values can lead to dirty writes
                 Key: IMPALA-14096
                 URL: https://issues.apache.org/jira/browse/IMPALA-14096
             Project: IMPALA
          Issue Type: Bug
            Reporter: Csaba Ringhofer

{code}
create table tspart (s string) partitioned by (p string);
insert into tspart partition (p="a") values ("a");
insert into tspart partition (p="aa") values ("aa");
-- s is not valid UTF-8
insert into tspart partition (p="a") values (unhex("aa"));
-- insert into the table again but swap p and s, so one partition value will be
-- unhex("aa") followed by "a", which is not valid UTF-8
insert into tspart partition (p) select p s_, concat(s, "a") p_ from tspart;
-- leads to error:
-- 2025-05-26 11:47:03 [Exception] ERROR: Query da440f13f21ab301:79918f1100000000 failed:
-- Error(s) moving partition files. First error (of 1) was:
-- Hdfs op (RENAME hdfs://localhost:20500/test-warehouse/tspart/_impala_insert_staging/da440f13f21ab301_79918f1100000000/.da440f13f21ab301-79918f1100000002_588063374_dir/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt
--   TO hdfs://localhost:20500/test-warehouse/tspart/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt) failed,
-- error was: hdfs://localhost:20500/test-warehouse/tspart/_impala_insert_staging/da440f13f21ab301_79918f1100000000/.da440f13f21ab301-79918f1100000002_588063374_dir/p=�a/da440f13f21ab301-79918f1100000002_782687841_data.0.txt
-- Error(5): Input/output error
select count(*) from tspart;
-- result: 3, the table looks unchanged
refresh tspart;
select count(*) from tspart;
-- result: 4, because an extra file was found by refresh
{code}

While dirty writes are a known issue in non-transactional tables, it should not be this easy to trigger one. The problem here is that the error only occurs while the files are being moved, by which point some files may already have been moved to their final destination. Detecting the problematic partition names earlier could ensure that files written for other partitions are not moved out of the staging directory.
https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/be/src/runtime/dml-exec-state.cc#L341
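One way to detect the problematic partition names early is to validate every partition directory name produced by the insert before the first file is renamed out of the staging directory, so a bad name fails the query while all files are still staged. The sketch below assumes (this is not confirmed in the issue) that the rename fails because the partition path is not valid UTF-8. `IsValidUtf8` and `FindNonUtf8PartitionNames` are hypothetical helpers for illustration, not functions from the Impala code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns true if s is well-formed UTF-8 (RFC 3629),
// rejecting stray continuation bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) {  // ASCII byte
      ++i;
      continue;
    }
    std::size_t len;
    uint32_t cp;
    if ((c & 0xE0) == 0xC0) {
      len = 2;
      cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
      len = 3;
      cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
      len = 4;
      cp = c & 0x07;
    } else {
      return false;  // continuation byte (e.g. 0xAA) or invalid lead byte
    }
    if (i + len > n) return false;  // sequence truncated at end of string
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // expected continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000)) {
      return false;  // overlong encoding
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate half
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += len;
  }
  return true;
}

// Hypothetical pre-check: returns every partition directory name (e.g. "p=aa")
// that is not valid UTF-8. A finalization step could call this for all
// partitions of the insert before any rename and fail the query if the
// result is non-empty, leaving every file in the staging directory.
std::vector<std::string> FindNonUtf8PartitionNames(
    const std::vector<std::string>& names) {
  std::vector<std::string> bad;
  for (const std::string& name : names) {
    if (!IsValidUtf8(name)) bad.push_back(name);
  }
  return bad;
}
```

For the partitions created by the swapped insert in the repro, `FindNonUtf8PartitionNames({"p=aa", "p=aaa", std::string("p=\xaa") + "a"})` returns only the last name. The `"\xaa"` byte is kept in its own literal because a hex escape would otherwise also consume the following `a`.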