[jira] [Commented] (HIVE-14980) Minor compaction when triggered simultaneously on the same table/partition deletes data

Eugene Koifman (JIRA) Mon, 17 Oct 2016 15:27:12 -0700

    [ https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583692#comment-15583692 ]


Eugene Koifman commented on HIVE-14980:
---------------------------------------

Checking "show compactions" and then starting the compaction is not atomic, 
so it's not a complete fix: two Workers can both pass the check before either 
one shows up as active.
It should use locks of some kind, but not in the current lock manager.  
MutexAPI.acquireLock(String) was meant to support the kind of locking that this 
needs, but it's not quite complete.  If you use </db/table/partition> for the 
key, and call it from Worker, it will achieve the proper synchronization 
atomically, and the "lock" will be released if the process dies.
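
To make the suggestion concrete, here is a minimal sketch of the locking 
pattern, not Hive's actual code. Only MutexAPI.acquireLock(String) and the 
/db/table/partition key come from the comment above; InMemoryMutex, LockHandle, 
releaseLocks(), lockKey() and compactExclusively() are hypothetical names for 
illustration. The stand-in mutex only coordinates threads inside one JVM, 
whereas the lock described above would have to be shared across metastore 
instances and released if the holder dies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: a hypothetical in-memory stand-in, NOT Hive's implementation.
class CompactionMutexSketch {

    /** Hypothetical handle; releasing it lets the next caller for the same key proceed. */
    interface LockHandle {
        void releaseLocks();
    }

    /** String-keyed mutex, loosely modeled on the acquireLock(String) call named above. */
    static final class InMemoryMutex {
        private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

        LockHandle acquireLock(String key) {
            ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
            lock.lock(); // blocks while another thread holds the same key
            return lock::unlock; // must be released by the thread that acquired it
        }
    }

    /** Builds a key of the form /db/table or /db/table/partition. */
    static String lockKey(String db, String table, String partition) {
        String key = "/" + db + "/" + table;
        return partition == null ? key : key + "/" + partition;
    }

    /** Runs the compaction while holding the per-table/partition lock. */
    static void compactExclusively(InMemoryMutex mutex, String key, Runnable compaction) {
        LockHandle handle = mutex.acquireLock(key);
        try {
            compaction.run();
        } finally {
            handle.releaseLocks(); // always release, even if the compaction throws
        }
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryMutex mutex = new InMemoryMutex();
        String key = lockKey("mj", "tableb", null);
        Runnable compaction = () ->
            System.out.println(Thread.currentThread().getName() + " compacting " + key);
        Thread w1 = new Thread(() -> compactExclusively(mutex, key, compaction), "worker-1");
        Thread w2 = new Thread(() -> compactExclusively(mutex, key, compaction), "worker-2");
        w1.start();
        w2.start();
        w1.join();
        w2.join();
    }
}
```

Unlike a "show compactions" check, acquiring the lock is itself the test for 
"is anyone else compacting this table/partition", so there is no window between 
checking and acting.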


> Minor compaction when triggered simultaneously on the same table/partition 
> deletes data
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-14980
>                 URL: https://issues.apache.org/jira/browse/HIVE-14980
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 2.1.0
>            Reporter: Mahipal Jupalli
>            Assignee: Mahipal Jupalli
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger compaction after 
> each INSERT into TABLEB from TABLEA, the compactions run asynchronously on 
> random metastore instances and step on each other, which causes data to be 
> deleted.
> Example here: 
> TABLEA - has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should ideally see 20k rows in 
> TABLEB. But I see only 10k rows: only the rows INSERTED before the last 
> compaction persist, and the older rows are deleted (I believe the old delta 
> files are deleted). 
> To further confirm the bug, if I do only one compaction after two inserts, I 
> see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code; it requires an additional check in the 
> org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
> compactions on the table/partition. I will share the details of the fix once 
> I test it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
