[ 
https://issues.apache.org/jira/browse/HIVE-23410?focusedWorklogId=521711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521711
 ]

ASF GitHub Bot logged work on HIVE-23410:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Dec/20 14:40
            Start Date: 08/Dec/20 14:40
    Worklog Time Spent: 10m 
      Work Description: kuczoram commented on a change in pull request #1660:
URL: https://github.com/apache/hive/pull/1660#discussion_r538445196



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java
##########
@@ -1747,15 +1747,15 @@ public void testMultiInsertOnDynamicallyPartitionedMmTable() throws Exception {
     final String completedTxnComponentsContents =
         TxnDbUtil.queryToString(conf, "select * from \"COMPLETED_TXN_COMPONENTS\"");
     Assert.assertEquals(completedTxnComponentsContents,
-        2, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\""));
+        4, TxnDbUtil.countQueryAgent(conf, "select count(*) from \"COMPLETED_TXN_COMPONENTS\""));

Review comment:
       Those records are duplicates. They are a side effect of fixing the 
FileSinkOperator-MoveTask assignment.
   For ACID tables, an insert like the one in this test created 4 records even 
before direct insert was introduced. At that time the FSO-MoveTask assignment 
was based on the staging directories, and an insert like this had 2 FSOs and 2 
MoveTasks. Each MoveTask called the metastore method that creates an entry in 
the TXN_COMPONENTS table for each partition, so there were 4 records at the end 
of the insert. For MM tables (and later for direct insert), however, there is 
no staging directory, so all MoveTasks and all FSOs contain the table 
directory. Every FSO therefore finds the same MoveTask (the first in the list), 
and only that one is executed. This is not correct, but it didn't cause any 
issues, so it went undetected until direct delete and update came in. To make 
them work properly, I had to fix the FSO-MoveTask assignment, but as a result 
MM tables and direct insert now produce duplicate records, just like ACID 
tables without direct insert. The Javadoc of the 
TxnHandler.addDynamicPartitions method says that duplicates won't cause any 
trouble, but if you know of any issues with that, please share them with me.
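
   To illustrate the directory-based matching described above, here is a 
minimal sketch. It is a simplified model, not Hive's actual code: the class 
MoveTaskAssignmentSketch, the Move type, the example paths, and the assumption 
of 2 partitions are all hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the FSO-MoveTask matching; not Hive's real classes.
final class MoveTaskAssignmentSketch {

    // Stand-in for a MoveTask: only an id and its source directory matter here.
    static final class Move {
        final int id;
        final String sourceDir;

        Move(int id, String sourceDir) {
            this.id = id;
            this.sourceDir = sourceDir;
        }
    }

    // Directory-based matching: return the first Move whose source directory
    // equals the directory the FSO writes to, or null if there is none.
    static Move findByDirectory(List<Move> moves, String fsoDir) {
        for (Move m : moves) {
            if (m.sourceDir.equals(fsoDir)) {
                return m;
            }
        }
        return null;
    }

    // Count how many distinct Moves end up being selected (and so executed)
    // when every FSO is matched purely by directory.
    static int distinctMovesExecuted(List<Move> moves, List<String> fsoDirs) {
        Set<Integer> selected = new LinkedHashSet<>();
        for (String dir : fsoDirs) {
            Move m = findByDirectory(moves, dir);
            if (m != null) {
                selected.add(m.id);
            }
        }
        return selected.size();
    }

    // Assumption: each executed Move registers one record per partition.
    static int componentRecords(int executedMoves, int partitions) {
        return executedMoves * partitions;
    }

    public static void main(String[] args) {
        int partitions = 2; // assumed number of dynamic partitions in the insert

        // With staging directories, every FSO has its own directory.
        List<Move> staged = Arrays.asList(new Move(1, "/staging/fso1"), new Move(2, "/staging/fso2"));
        List<String> stagedFsoDirs = Arrays.asList("/staging/fso1", "/staging/fso2");

        // Without staging directories, everything points at the table directory.
        List<Move> direct = Arrays.asList(new Move(1, "/warehouse/t"), new Move(2, "/warehouse/t"));
        List<String> directFsoDirs = Arrays.asList("/warehouse/t", "/warehouse/t");

        System.out.println("staging dirs: "
            + componentRecords(distinctMovesExecuted(staged, stagedFsoDirs), partitions) + " records");
        System.out.println("table dir only: "
            + componentRecords(distinctMovesExecuted(direct, directFsoDirs), partitions) + " records");
    }
}
```

   Under these assumptions the staging-directory case selects both MoveTasks 
(2 x 2 = 4 records), while the shared table directory selects only the first 
one (1 x 2 = 2 records). Once every FSO is paired with its own MoveTask again, 
both are executed and the count goes back to 4, which is the change in the 
assertion above.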




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 521711)
    Time Spent: 1h 20m  (was: 1h 10m)

> ACID: Improve the delete and update operations to avoid the move step
> ---------------------------------------------------------------------
>
>                 Key: HIVE-23410
>                 URL: https://issues.apache.org/jira/browse/HIVE-23410
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23410.1.patch
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a follow-up task for 
> [HIVE-21164|https://issues.apache.org/jira/browse/HIVE-21164], where the 
> insert operation has been modified to write directly to the table locations 
> instead of the staging directory. The same improvement should be done for the 
> ACID update and delete operations as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
