[ https://issues.apache.org/jira/browse/HIVE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064788#comment-17064788 ]
Sungwoo commented on HIVE-21164:
--------------------------------
I tested with Hive 4 on Tez and can confirm that the same problem occurs. Here
is a summary of the setup:
- Hive 4 commit: ffee30e6267e85f00a22767262192abb9681cfb7 (HIVE-21164: ACID: ...), Fri Feb 21
- Tez commit: fd19ce6c93bc1f899ccca7161b0c0407f850bd77 (TEZ-4123. ...), Wed Feb 12
- hive.acid.direct.insert.enabled set to true
- The warehouse directories reside on S3 (simulated with MinIO), not on HDFS.
- minor changes to tez/pom.xml and hive/pom.xml to fix compilation issues
Result:
0: jdbc:hive2://indigo1:9842/> select * from web_sales limit 100;
No rows selected (99.906 seconds)
0: jdbc:hive2://indigo1:9842/> select count(*) from web_sales;
+----------+
|   _c0    |
+----------+
| 1438883  |
+----------+
1 row selected (0.613 seconds)
If the table is not transactional, the result is correct. If we add the
following line to the CREATE TABLE statement, the resulting table is empty:
TBLPROPERTIES('transactional'='true', 'transactional_properties'='default');
From the HiveServer2 log, it seems that HiveServer2 deletes the output
directories because Utilities.handleDirectInsertTableFinalPath() is called
twice:
20/03/23 12:38:20 INFO FileOperations: Deleting s3a://hivemr3/warehouse/tpcds_bin_partitioned_orc_2.db/web_sales/ws_sold_date_sk=2451145/base_0000001/bucket_00000_0 that was not committed
> ACID: explore how we can avoid a move step during inserts/compaction
> --------------------------------------------------------------------
>
> Key: HIVE-21164
> URL: https://issues.apache.org/jira/browse/HIVE-21164
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 3.1.1
> Reporter: Vaibhav Gumashta
> Assignee: Marta Kuczora
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21164.1.patch, HIVE-21164.10.patch,
> HIVE-21164.11.patch, HIVE-21164.11.patch, HIVE-21164.12.patch,
> HIVE-21164.13.patch, HIVE-21164.14.patch, HIVE-21164.14.patch,
> HIVE-21164.15.patch, HIVE-21164.16.patch, HIVE-21164.17.patch,
> HIVE-21164.18.patch, HIVE-21164.19.patch, HIVE-21164.2.patch,
> HIVE-21164.20.patch, HIVE-21164.21.patch, HIVE-21164.22.patch,
> HIVE-21164.3.patch, HIVE-21164.4.patch, HIVE-21164.5.patch,
> HIVE-21164.6.patch, HIVE-21164.7.patch, HIVE-21164.8.patch, HIVE-21164.9.patch
>
>
> Currently, we write compacted data to a temporary location and then move the
> files to the final location, which is an expensive operation on some cloud file
> systems. HIVE-20823, which is already in, can control the visibility of
> compacted data for readers. Therefore, we can perhaps avoid writing data
> to a temporary location and write compacted data directly to the intended
> final path.
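The idea in the issue description (write data directly to its final path and rely on visibility control instead of a move step) can be illustrated with a small toy model on a local filesystem. Everything in it is hypothetical and does not reproduce Hive's code: the function names, the single bucket_00000 file per directory, and the `committed` set, which merely stands in for transaction-based visibility. The base_<writeId> directory naming is borrowed from the log path above. On HDFS the move is a cheap rename; on S3 a rename is implemented as copying and deleting objects, which is the cost the issue wants to avoid. The sketch also shows why cleanup of a direct write is delicate: output sits in the final location before commit, so removing uncommitted data is only safe if the committed set is accurate.

```python
import os
import shutil
import tempfile

# Hypothetical toy model, not Hive code. Directory names mimic the
# base_<writeId> layout seen in the log; "committed" stands in for
# transaction-based visibility.


def _write_bucket(directory: str, rows: list[str]) -> None:
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "bucket_00000"), "w") as f:
        f.write("\n".join(rows))


def write_with_move(table_dir: str, write_id: int, rows: list[str]) -> str:
    """Current approach: write to a staging location, then move into place."""
    name = f"base_{write_id:07d}"
    staging = os.path.join(table_dir, "_tmp", name)
    _write_bucket(staging, rows)
    final = os.path.join(table_dir, name)
    # Cheap rename on HDFS; on S3 this becomes copy + delete per object.
    shutil.move(staging, final)
    return final


def write_direct(table_dir: str, write_id: int, rows: list[str]) -> str:
    """Proposed approach: write straight into the final directory."""
    final = os.path.join(table_dir, f"base_{write_id:07d}")
    _write_bucket(final, rows)
    return final


def _base_dirs(table_dir: str):
    """Yield (name, write_id) for every base_<writeId> directory."""
    for name in sorted(os.listdir(table_dir)):
        if name.startswith("base_"):
            yield name, int(name[len("base_"):])


def read_table(table_dir: str, committed: set[int]) -> list[str]:
    """Readers skip directories whose write ID is not committed."""
    rows: list[str] = []
    for name, write_id in _base_dirs(table_dir):
        if write_id in committed:
            with open(os.path.join(table_dir, name, "bucket_00000")) as f:
                rows.extend(f.read().splitlines())
    return rows


def cleanup_uncommitted(table_dir: str, committed: set[int]) -> list[str]:
    """Remove output of writes that never committed (e.g. aborted ones).

    Anything missing from `committed` is deleted, so the set must be accurate.
    """
    removed = []
    for name, write_id in list(_base_dirs(table_dir)):
        if write_id not in committed:
            shutil.rmtree(os.path.join(table_dir, name))
            removed.append(name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        committed: set[int] = set()
        write_direct(table, 1, ["a", "b"])
        print(read_table(table, committed))  # [] (written, not committed)
        committed.add(1)
        print(read_table(table, committed))  # ['a', 'b']
```

In this model the direct write needs no move, but the data becomes visible only through the commit record, and any cleanup pass that consults a wrong or incomplete committed set will delete output that readers should see.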
--
This message was sent by Atlassian Jira
(v8.3.4#803005)