[ 
https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064788#comment-17064788
 ] 

Sungwoo commented on HIVE-21164:
--------------------------------

I tested with Hive 4 on Tez and can confirm that the same phenomenon occurs.
Here is a summary of the setup:

- Hive 4 commit: ffee30e6267e85f00a22767262192abb9681cfb7 (HIVE-21164: ACID: ...), Fri Feb 21
- Tez commit: fd19ce6c93bc1f899ccca7161b0c0407f850bd77 (TEZ-4123. ...), Wed Feb 12
- hive.acid.direct.insert.enabled set to true
- The warehouse directories reside on S3 (simulated with MinIO), not on HDFS.
- Minor changes to tez/pom.xml and hive/pom.xml to fix compilation issues

Result:

0: jdbc:hive2://indigo1:9842/> select * from web_sales limit 100;
No rows selected (99.906 seconds)
0: jdbc:hive2://indigo1:9842/> select count(*) from web_sales;
+----------+
|   _c0    |
+----------+
| 1438883  |
+----------+
1 row selected (0.613 seconds)

If the table is not transactional, the result is correct. If we add the
following line to the table definition, the resulting table is empty:

TBLPROPERTIES('transactional'='true', 'transactional_properties'='default');

From the log of HiveServer2, it seems that HiveServer2 deletes the output
directories because Utilities.handleDirectInsertTableFinalPath() is called
twice:

20/03/23 12:38:20 INFO FileOperations: Deleting s3a://hivemr3/warehouse/tpcds_bin_partitioned_orc_2.db/web_sales/ws_sold_date_sk=2451145/base_0000001/bucket_00000_0 that was not committed
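
To illustrate how a second call could lead to this kind of deletion, here is a
toy model in Python. It is not Hive's code. It assumes, purely for
illustration, that the cleanup step keeps only files listed in a manifest and
discards the manifest once it has been processed; the function name
finalize_direct_insert and the file names are hypothetical.

```python
# Toy model only: this is NOT Hive's implementation.
# Assumption: the cleanup step treats files listed in a manifest as committed
# and consumes the manifest once it has been processed.

def finalize_direct_insert(files, manifest):
    """Delete every file not listed in the manifest, then clear the manifest.

    `files` and `manifest` are sets of path strings and are mutated in place.
    Returns the sorted list of deleted paths.
    """
    deleted = sorted(f for f in files if f not in manifest)
    for path in deleted:
        print(f"Deleting {path} that was not committed")
        files.discard(path)
    manifest.clear()  # hypothetical: the first call consumes the manifest
    return deleted


# Hypothetical file names; every written file is committed.
files = {"base_0000001/bucket_a", "base_0000001/bucket_b"}
manifest = set(files)

first = finalize_direct_insert(files, manifest)   # manifest intact: deletes nothing
second = finalize_direct_insert(files, manifest)  # manifest empty: deletes everything
```

In this model the first call is harmless, while the second call finds no
manifest entries and removes all committed files. Whether Hive behaves this
way would need to be checked against the actual implementation.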


> ACID: explore how we can avoid a move step during inserts/compaction
> --------------------------------------------------------------------
>
>                 Key: HIVE-21164
>                 URL: https://issues.apache.org/jira/browse/HIVE-21164
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Assignee: Marta Kuczora
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch, 
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch, 
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch, 
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch, 
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch, 
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch, 
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch, 
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the 
> files to a final location, which is an expensive operation on some cloud file 
> systems. Since HIVE-20823 is already in, it can control the visibility of 
> compacted data for the readers. Therefore, we can perhaps avoid writing data 
> to a temporary location and directly write compacted data to the intended 
> final path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
