[ 
https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060373#comment-17060373
 ] 

Marta Kuczora commented on HIVE-21164:
--------------------------------------

Hi [~glapark],

I was trying to reproduce the issue. I can reproduce it only if the result of 
the select in the last "INSERT OVERWRITE" part is empty. With your example 
query:
{noformat}
from tpcds_text_2.store_sales ss
insert overwrite table store_sales partition (ss_sold_date_sk)
select
 ss.ss_sold_time_sk,
...
 ss.ss_net_profit,
 ss.ss_sold_date_sk
 where ss.ss_sold_date_sk is not null
insert overwrite table store_sales partition (ss_sold_date_sk)
select
 ss.ss_sold_time_sk,
...
 ss.ss_net_profit,
 ss.ss_sold_date_sk
 where ss.ss_sold_date_sk is null
 sort by ss.ss_sold_date_sk
;
{noformat}
If the result of the
{noformat}
select
 ss.ss_sold_time_sk,
 ...
 ss.ss_net_profit,
 ss.ss_sold_date_sk
 where ss.ss_sold_date_sk is null
 sort by ss.ss_sold_date_sk
{noformat}
is empty, then store_sales will also be empty after the query has finished.
Do you know the result of this last part in your environment? Is it possible 
that your table contains no rows with ss_sold_date_sk=null?
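A quick way to check this is to count the rows that feed the last INSERT OVERWRITE branch. This is just a sketch against the source table from your query. The SORT BY does not change the row count, so it is left out. A result of 0 means that branch selects nothing:
{noformat}
-- Count the rows the second INSERT OVERWRITE branch would select.
-- 0 here corresponds to the case where store_sales ends up empty.
select count(*)
from tpcds_text_2.store_sales ss
where ss.ss_sold_date_sk is null;
{noformat}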


What I see is that with this patch, an insert overwrite replaces the data in 
the table even with dynamic partitioning. Without this patch, this didn't 
happen with dynamic partitioning: when I run two insert overwrites without 
this patch and dynamic partitioning is in use, the result contains the data 
from both inserts. The second one won't overwrite the result of the first one 
as I would expect. So this behaviour seems to have changed with this patch.


This is a good finding actually, thanks a lot for bringing it to my attention. 
I will keep investigating this and will find a fix.
Until then, you can turn off this feature by setting the 
'hive.acid.direct.insert.enabled' config parameter to false. With this, the 
insert will happen just as it did before this patch. Alternatively, you can 
try an INSERT instead of the INSERT OVERWRITE.
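For example, the two workarounds could look like this in a Hive CLI or Beeline session. This is a sketch. Only the config key comes from this patch, and the query itself stays as in your example, with its column list still elided:
{noformat}
-- Option 1: turn off direct insert for this session, then rerun the
-- original query unchanged.
set hive.acid.direct.insert.enabled=false;

-- Option 2: keep the feature on, but change each
--   insert overwrite table store_sales partition (ss_sold_date_sk)
-- in the query to
--   insert into table store_sales partition (ss_sold_date_sk)
-- INSERT appends, so rows already in the target partitions are kept.
{noformat}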

> ACID: explore how we can avoid a move step during inserts/compaction
> --------------------------------------------------------------------
>
>                 Key: HIVE-21164
>                 URL: https://issues.apache.org/jira/browse/HIVE-21164
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch, 
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch, 
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch, 
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch, 
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch, 
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch, 
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch, 
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the 
> files to a final location, which is an expensive operation on some cloud file 
> systems. Since HIVE-20823 is already in, it can control the visibility of 
> compacted data for the readers. Therefore, we can perhaps avoid writing data 
> to a temporary location and directly write compacted data to the intended 
> final path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
