Sergey Shelukhin commented on HIVE-14980:

cc [~ekoifman] 

Should the compactor just use locks?

> Minor compaction when triggered simultaneously on the same table/partition 
> deletes data
> ---------------------------------------------------------------------------------------
>                 Key: HIVE-14980
>                 URL: https://issues.apache.org/jira/browse/HIVE-14980
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 2.1.0
>            Reporter: Mahipal Jupalli
>            Assignee: Mahipal Jupalli
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
> I have two tables (TABLEA, TABLEB). If I manually trigger compaction after 
> each INSERT into TABLEB from TABLEA, the compactions are picked up 
> asynchronously by arbitrary metastore instances and step on each other, which 
> causes data to be deleted.
> Example here: 
> TABLEA - has 10k rows. 
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I should ideally see 20k rows in 
> TABLEB, but I see only 10k rows. Only the rows inserted before the last 
> compaction persist; the older rows are deleted (I believe the old delta files 
> are deleted).
> To further confirm the bug, if I do only one compaction after two inserts, I 
> see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code. It requires an additional check in the 
> org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active 
> compactions on the table/partition. I will share the details of the fix once 
> I test it.
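
For illustration only, the kind of check described in the proposed fix might look roughly like the sketch below. CompactionGuard and its methods are hypothetical names, not part of Hive's compactor API, and the actual patch has not been shared yet. The sketch tracks active compactions in memory, so it only guards workers inside one JVM; since the report involves several metastore instances, a real check would presumably have to go through shared state.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard; NOT part of Hive's real compactor API. */
final class CompactionGuard {
    // Keys of table/partitions that currently have a compaction running.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Build one key per table/partition; a null partition means an unpartitioned table.
    private static String key(String db, String table, String partition) {
        return db + "." + table + "/" + (partition == null ? "" : partition);
    }

    /** Returns true if the caller may compact; false if a compaction is already active. */
    boolean tryStart(String db, String table, String partition) {
        // Set.add is atomic here: only one caller sees true for a given key.
        return active.add(key(db, table, partition));
    }

    /** Marks the compaction as finished so a later request can proceed. */
    void finish(String db, String table, String partition) {
        active.remove(key(db, table, partition));
    }
}
```

A worker would call tryStart before compacting mj.tableb and skip or requeue the request when it returns false, then call finish in a finally block once the compaction ends.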

This message was sent by Atlassian JIRA
