Marta Kuczora created HIVE-22969:
------------------------------------

             Summary: Union remove optimisation results incorrect data when inserting to ACID table
                 Key: HIVE-22969
                 URL: https://issues.apache.org/jira/browse/HIVE-22969
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Marta Kuczora
            Assignee: Marta Kuczora

Steps to reproduce the issue:
{noformat}
create table input_text(key string, val string) stored as textfile location '/Users/martakuczora/work/hive/warehouse/external/input_text';

create table output_acid(key string, val string) stored as orc tblproperties('transactional'='true');

insert into input_text values ('1','1'), ('2','2'),('3','3');
{noformat}

{noformat}
set hive.mapred.mode=nonstrict;
set hive.stats.autogather=false;
set hive.optimize.union.remove=true;
set hive.auto.convert.join=true;
set hive.exec.submitviachild=false;
set hive.exec.submit.local.task.via.child=false;

SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on a.key=b.key) c;

The result of the select:
1 1
2 2
3 3
1 1
2 2
3 3
{noformat}

{noformat}
insert into table output_acid
SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on a.key=b.key) c;

select * from output_acid;

The result:
1 1
2 2
3 3
{noformat}

The folder of the output_acid table contained the following delta directories:
{noformat}
drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000000_0000000
drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000001_0000001_0001
{noformat}

The name of the first directory is missing the statement ID, so this directory is ignored when a select statement runs on the table. That is why only half of the data was returned by the select on the output_acid table.

If either hive.stats.autogather is set to true or hive.optimize.union.remove is set to false, the result of the insert is correct. In that case there is only one delta directory in the table's folder.
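
To make the directory-name observation easy to check, here is a minimal Python sketch that flags delta directories whose names carry no statement ID. It is not Hive's own parsing code. The pattern delta_<minWriteId>_<maxWriteId>[_<statementId>] is an assumption inferred from the two names in the listing above, and parse_delta and missing_statement_id are hypothetical helper names.

```python
import re

# Assumed naming pattern, inferred from the listing in this report:
#   delta_<minWriteId>_<maxWriteId>[_<statementId>]
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def parse_delta(name):
    """Return (min_write_id, max_write_id, statement_id or None),
    or None if the name does not look like a delta directory."""
    m = DELTA_RE.match(name)
    if m is None:
        return None
    min_id, max_id, stmt = m.groups()
    return int(min_id), int(max_id), (int(stmt) if stmt is not None else None)


def missing_statement_id(names):
    """List the delta directory names that have no statement ID suffix."""
    result = []
    for n in names:
        parsed = parse_delta(n)
        if parsed is not None and parsed[2] is None:
            result.append(n)
    return result


# The two directory names reported in the output_acid table folder.
dirs = ["delta_0000000_0000000", "delta_0000001_0000001_0001"]
print(missing_statement_id(dirs))  # prints ['delta_0000000_0000000']
```

Running this over the names in a table folder gives a quick check for directories like the first one in the listing.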