[ 
https://issues.apache.org/jira/browse/IMPALA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-9664:
----------------------------------------
    Description: 
According to the Hive source code, insert events for transactional tables are 
fired through a different API, {{addWriteNotificationLog}}. Currently Impala 
calls {{fireListenerEvent}} for both transactional and non-transactional 
tables. We should determine how the two APIs differ and whether Impala needs 
to handle transactional tables differently.

References:
https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2402

https://github.com/apache/hive/blob/c3afb57bdb1041f566fbbd896f625328fc9656a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2236

Tools like Hive replication use these insert events to replicate changes to 
ACID tables. Now that Impala can insert data into ACID tables, it should also 
generate the insert events appropriately so that replication works seamlessly. 
Additionally, the {{truncate table}} command should use the HMS API to truncate 
the table instead of deleting the files directly from the filesystem, because 
the HMS API moves the files to a replication change management directory so 
that replication can still access the dropped data files.

Note that for external tables, Hive replication doesn't need to keep track of 
the files. It only replicates the table metadata based on events and the data 
files are "distcped" to the target cluster.
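
As a rough illustration of the dispatch discussed above, here is a minimal, self-contained sketch of picking the insert-event API per table. It is not Impala's or Hive's actual code: the class {{InsertEventRouter}}, the enum, and both method names are hypothetical, and it assumes a table is identified as transactional by the table parameter {{transactional}} being set to {{true}}. The two enum constants only name the APIs mentioned above; the sketch makes no HMS calls.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: decides which HMS insert-event API applies to a table. */
final class InsertEventRouter {

  /** Names the two APIs from the description; no HMS call is made here. */
  enum InsertEventApi {
    ADD_WRITE_NOTIFICATION_LOG, // transactional (ACID) tables
    FIRE_LISTENER_EVENT         // non-transactional tables
  }

  private InsertEventRouter() {}

  /**
   * Assumption: a table is transactional when its "transactional" table
   * parameter equals "true", compared case-insensitively.
   */
  static boolean isTransactional(Map<String, String> tableParams) {
    String value = tableParams.get("transactional");
    return value != null && value.trim().toLowerCase(Locale.ROOT).equals("true");
  }

  /** Picks the insert-event API based on the table parameters. */
  static InsertEventApi chooseApi(Map<String, String> tableParams) {
    return isTransactional(tableParams)
        ? InsertEventApi.ADD_WRITE_NOTIFICATION_LOG
        : InsertEventApi.FIRE_LISTENER_EVENT;
  }

  public static void main(String[] args) {
    // An ACID table and a table with no transactional parameter.
    System.out.println(chooseApi(Map.of("transactional", "true")));
    System.out.println(chooseApi(Map.of()));
  }
}
```

Keeping the decision in one place would make it easy to adjust once the actual difference between the two APIs is understood.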


> Support Hive replication for ACID tables
> ----------------------------------------
>
>                 Key: IMPALA-9664
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9664
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Critical
>             Fix For: Impala 4.0


