[
https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583187#comment-15583187
]
Sergey Shelukhin commented on HIVE-14980:
-----------------------------------------
cc [~ekoifman]
Should the compactor just use locks?
> Minor compaction when triggered simultaneously on the same table/partition
> deletes data
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-14980
> URL: https://issues.apache.org/jira/browse/HIVE-14980
> Project: Hive
> Issue Type: Bug
> Components: Metastore, Transactions
> Affects Versions: 2.1.0
> Reporter: Mahipal Jupalli
> Assignee: Mahipal Jupalli
> Priority: Critical
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after
> each INSERT into TABLEB from TABLEA, the compactions run asynchronously on
> random metastore instances and step on each other, which causes data to be
> deleted.
> Example:
> TABLEA - has 10k rows.
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> Once all the compactions are complete, I would expect to see 20k rows in
> TABLEB, but I see only 10k rows. Only the rows INSERTED before the last
> compaction persist; the older rows are deleted (I believe the old delta files
> are deleted).
> To further confirm the bug: if I run only one compaction after both inserts,
> I see 20k rows in TABLEB.
> Proposed Fix:
> I have identified the bug in the code: it requires an additional check in the
> org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active
> compactions on the table/partition. I will share the details of the fix once
> I have tested it.
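The sketch below is a toy Java model, not Hive's Worker or Cleaner code. It assumes a partition is just a map from hypothetical delta names to row counts, and that both compactions write to the same output name. That shared output name is one assumed way (not confirmed by the report) for a later compaction to discard an earlier one's result. It replays the reported INSERT / compact sequence with the two compactions overlapping. It then reruns the sequence with a hypothetical in-process CompactionGuard that admits at most one active compaction per table/partition key, in the spirit of the proposed Worker check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy replay of the report; every name here is hypothetical.
class CompactionRaceDemo {
    static int run(boolean guarded) {
        ToyPartition p = new ToyPartition();
        CompactionGuard guard = new CompactionGuard();
        String key = "mj.tableb";
        ToyMinorCompaction a = new ToyMinorCompaction();
        ToyMinorCompaction b = new ToyMinorCompaction();

        p.deltas.put("delta_1", 10_000);   // first INSERT ... SELECT
        guard.tryStart(key);               // A claims the key (nothing else is active yet)
        a.start(p);                        // A reads 10k rows
        p.deltas.put("delta_2", 10_000);   // second INSERT ... SELECT

        // Unguarded: B runs while A is still active. Guarded: B is refused.
        boolean bOverlaps = !guarded || guard.tryStart(key);
        if (bOverlaps) {
            b.start(p);                    // B reads delta_1 + delta_2 = 20k rows
            b.finish(p);                   // partition now holds compacted = 20k
        }
        a.finish(p);                       // A writes compacted = 10k, replacing B's result if B ran
        guard.finish(key);
        if (!bOverlaps) {
            guard.tryStart(key);           // B retries after A released the key
            b.start(p);                    // B reads delta_2 + compacted = 20k rows
            b.finish(p);
            guard.finish(key);
        }
        return p.totalRows();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + run(false) + " rows");
        System.out.println("guarded:   " + run(true) + " rows");
    }
}

// Toy model only: NOT Hive's directory layout, naming, or cleanup logic.
class ToyPartition {
    final Map<String, Integer> deltas = new LinkedHashMap<>(); // delta name -> row count

    int totalRows() {
        return deltas.values().stream().mapToInt(Integer::intValue).sum();
    }
}

// Assumed minor compaction: snapshot inputs, drop them, write one merged output.
class ToyMinorCompaction {
    // Assumption: both compactions target the same output name, so the
    // later writer silently replaces the earlier writer's result.
    static final String OUTPUT = "compacted";
    private List<String> inputs = new ArrayList<>();
    private int rows;

    void start(ToyPartition p) {
        inputs = new ArrayList<>(p.deltas.keySet());
        rows = p.totalRows();
    }

    void finish(ToyPartition p) {
        for (String name : inputs) {
            p.deltas.remove(name);
        }
        p.deltas.put(OUTPUT, rows);
    }
}

// Hypothetical guard: at most one active compaction per table/partition key.
class CompactionGuard {
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // add() on a concurrent key set is atomic, so "check" and "claim" are one step.
    boolean tryStart(String key) {
        return active.add(key);
    }

    void finish(String key) {
        active.remove(key);
    }
}
```

Because the guard's check and claim are a single atomic `add`, no second compaction can slip in between them. That makes the guard essentially the lock asked about in the comment above. In this sketch the refused compaction is retried rather than dropped; otherwise delta_2 would remain uncompacted. The guard only covers one JVM. The report says the compactions run on different metastore instances, so a real fix would presumably need to perform the check against shared metastore state.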
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)