[ 
https://issues.apache.org/jira/browse/HIVE-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-23114:
---------------------------------
    Description: 
This is a follow-up Jira for the 
[conversation|https://issues.apache.org/jira/browse/HIVE-21164?focusedCommentId=17059280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17059280]
 in HIVE-21164
 Doing an insert overwrite from a multi-insert statement with dynamic 
partitioning will give wrong results for ACID tables when 
'hive.acid.direct.insert.enabled' is true or for insert-only tables.

Reproduction:
{noformat}
set hive.acid.direct.insert.enabled=true;
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.vectorized.execution.enabled=false;
set hive.stats.autogather=false;

create external table multiinsert_test_text (a int, b int, c int) stored as 
textfile;
insert into multiinsert_test_text values (1111, 11, 1111), (2222, 22, 1111), 
(3333, 33, 2222), (4444, 44, NULL), (5555, 55, NULL);

create table multiinsert_test_acid (a int, b int) partitioned by (c int) stored 
as orc tblproperties('transactional'='true');
create table multiinsert_test_mm (a int, b int) partitioned by (c int) stored 
as orc tblproperties('transactional'='true', 
'transactional_properties'='insert_only');

from multiinsert_test_text a
insert overwrite table multiinsert_test_acid partition (c)
select
 a.a,
 a.b,
 a.c
 where a.c is not null
insert overwrite table multiinsert_test_acid partition (c)
select
 a.a,
 a.b,
 a.c
where a.c is null;
select * from multiinsert_test_acid;

from multiinsert_test_text a
insert overwrite table multiinsert_test_mm partition (c)
select
 a.a,
 a.b,
 a.c
 where a.c is not null
insert overwrite table multiinsert_test_mm partition (c)
select
 a.a,
 a.b,
 a.c
where a.c is null;
select * from multiinsert_test_mm;
{noformat}
The result of these steps can be different, it depends on the execution order 
of the FileSinkOperators of the insert overwrite statements. It can happen that 
an error occurs due to manifest file collision, it can happen that no error 
occurs but the result will be incorrect.
 Running the same insert query with an external table of with and ACID table 
with 'hive.acid.direct.insert.enabled=false' will give the follwing result:
{noformat}
1111    11      1111
2222    22      1111
3333    33      2222
4444    44      NULL
5555    55      NULL
{noformat}

> Insert overwrite with dynamic partitioning is not working correctly with 
> direct insert
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-23114
>                 URL: https://issues.apache.org/jira/browse/HIVE-23114
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>
> This is a follow-up Jira for the 
> [conversation|https://issues.apache.org/jira/browse/HIVE-21164?focusedCommentId=17059280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17059280]
>  in HIVE-21164
>  Doing an insert overwrite from a multi-insert statement with dynamic 
> partitioning will give wrong results for ACID tables when 
> 'hive.acid.direct.insert.enabled' is true or for insert-only tables.
> Reproduction:
> {noformat}
> set hive.acid.direct.insert.enabled=true;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.vectorized.execution.enabled=false;
> set hive.stats.autogather=false;
> create external table multiinsert_test_text (a int, b int, c int) stored as 
> textfile;
> insert into multiinsert_test_text values (1111, 11, 1111), (2222, 22, 1111), 
> (3333, 33, 2222), (4444, 44, NULL), (5555, 55, NULL);
> create table multiinsert_test_acid (a int, b int) partitioned by (c int) 
> stored as orc tblproperties('transactional'='true');
> create table multiinsert_test_mm (a int, b int) partitioned by (c int) stored 
> as orc tblproperties('transactional'='true', 
> 'transactional_properties'='insert_only');
> from multiinsert_test_text a
> insert overwrite table multiinsert_test_acid partition (c)
> select
>  a.a,
>  a.b,
>  a.c
>  where a.c is not null
> insert overwrite table multiinsert_test_acid partition (c)
> select
>  a.a,
>  a.b,
>  a.c
> where a.c is null;
> select * from multiinsert_test_acid;
> from multiinsert_test_text a
> insert overwrite table multiinsert_test_mm partition (c)
> select
>  a.a,
>  a.b,
>  a.c
>  where a.c is not null
> insert overwrite table multiinsert_test_mm partition (c)
> select
>  a.a,
>  a.b,
>  a.c
> where a.c is null;
> select * from multiinsert_test_mm;
> {noformat}
> The result of these steps can be different, it depends on the execution order 
> of the FileSinkOperators of the insert overwrite statements. It can happen 
> that an error occurs due to manifest file collision, it can happen that no 
> error occurs but the result will be incorrect.
>  Running the same insert query with an external table of with and ACID table 
> with 'hive.acid.direct.insert.enabled=false' will give the follwing result:
> {noformat}
> 1111    11      1111
> 2222    22      1111
> 3333    33      2222
> 4444    44      NULL
> 5555    55      NULL
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to