Marta Kuczora created HIVE-22969:
------------------------------------
Summary: Union remove optimisation results incorrect data when
inserting to ACID table
Key: HIVE-22969
URL: https://issues.apache.org/jira/browse/HIVE-22969
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Marta Kuczora
Assignee: Marta Kuczora
Steps to reproduce the issue:
{noformat}
create table input_text(key string, val string) stored as textfile location
'/Users/martakuczora/work/hive/warehouse/external/input_text';
create table output_acid(key string, val string) stored as orc
tblproperties('transactional'='true');
insert into input_text values ('1','1'), ('2','2'),('3','3');
{noformat}
{noformat}
set hive.mapred.mode=nonstrict;
set hive.stats.autogather=false;
set hive.optimize.union.remove=true;
set hive.auto.convert.join=true;
set hive.exec.submitviachild=false;
set hive.exec.submit.local.task.via.child=false;
SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on
a.key=b.key) c;
The result of the select:
1 1
2 2
3 3
1 1
2 2
3 3
{noformat}
{noformat}
insert into table output_acid
SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on
a.key=b.key) c;
select * from output_acid;
The result:
1 1
2 2
3 3
{noformat}
The folder of the output_acid table contained the following delta directories:
{noformat}
drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000000_0000000
drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000001_0000001_0001
{noformat}
It can be seen that the statement ID from the first directory is missing and
when the select statements runs on the table, this directory will be ignored.
That's why only half of the data got returned when running the select on the
output_acid table.
If either hive.stats.autogather is set to true or hive.optimize.union.remove is
set to false the result of the insert will be correct. In this case there will
be only 1 delta directory in the table's folder.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)