[ 
https://issues.apache.org/jira/browse/HIVE-21451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793385#comment-16793385
 ] 

Vaibhav Gumashta commented on HIVE-21451:
-----------------------------------------

[~pvary] would you be interested in looking at this?

cc [~gopalv]

> ACID: Avoid using hive.acid.key.index to determine if the file is original or 
> not
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-21451
>                 URL: https://issues.apache.org/jira/browse/HIVE-21451
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Priority: Major
>
> Transactional files written by Hive have each row decorated with a ROW__ID
> column. However, files brought into transactional tables with the
> LOAD DATA... command do not have these metadata columns (in Hive ACID
> parlance, these are called original files). Original files are decorated
> with an inferred ROW__ID that is generated when they are read. After they
> are compacted, however, the ROW__ID metadata column becomes part of the
> file itself.
> To determine whether a file is original, we currently check for the
> presence of hive.acid.key.index. Query-based compaction currently does
> not write hive.acid.key.index (HIVE-21165). This means that files may
> still be treated as original even after compaction.
> Irrespective of HIVE-21165, we should avoid using hive.acid.key.index to
> decide whether a file is original, and instead look for the presence of
> ROW__ID. hive.acid.key.index should be treated as a performance
> optimization, as it was seemingly meant to be.
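
To illustrate the difference between the two checks described in the issue,
here is a minimal, self-contained sketch. It is not Hive's actual code: the
class and method names are hypothetical, a file is modelled only as a list of
top-level column names plus a set of user metadata keys, and the ACID column
names used to stand in for "ROW__ID is present" are an assumption about the
on-disk layout.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Hive code base.
public class OriginalFileCheck {
    // Metadata key named in the issue.
    static final String KEY_INDEX = "hive.acid.key.index";

    // Assumption: top-level columns of an ACID file, i.e. the fields that
    // make up ROW__ID plus the wrapped user row.
    static final List<String> ACID_COLUMNS = Arrays.asList(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    // Current behaviour as described in the issue: a file is considered
    // original when the key index metadata entry is absent.
    static boolean isOriginalByKeyIndex(Set<String> userMetadataKeys) {
        return !userMetadataKeys.contains(KEY_INDEX);
    }

    // Proposed behaviour: a file is considered original when the ROW__ID
    // metadata columns are not part of the file schema.
    static boolean isOriginalBySchema(List<String> topLevelColumns) {
        return !topLevelColumns.containsAll(ACID_COLUMNS);
    }

    public static void main(String[] args) {
        // A file produced by query-based compaction: it has the ACID
        // columns but no hive.acid.key.index entry (HIVE-21165).
        List<String> compactedColumns = ACID_COLUMNS;
        Set<String> compactedMetadata = Collections.emptySet();

        System.out.println("by key index: original = "
            + isOriginalByKeyIndex(compactedMetadata));
        System.out.println("by schema:    original = "
            + isOriginalBySchema(compactedColumns));
    }
}
```

In this model the key-index check classifies the compacted file as original,
while the schema check does not, which is the mismatch the issue describes.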



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
