[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

Jihong Liu (JIRA) Sat, 06 Dec 2014 20:43:09 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237046#comment-14237046
 ]


Jihong Liu commented on HIVE-8966:
----------------------------------

A scenario in which data is lost:
Assume that when compaction starts there are two deltas, delta_00011_00020 and 
delta_00021_00030. The transaction batch of the first one is closed, while the 
second one still has an open transaction batch. After compaction finishes, the 
status in compaction_queue becomes “ready_for_clean”, which triggers the clean 
process. The cleaner removes every delta whose transaction id is less than that 
of the base that was just created, provided there is no lock on it. In the 
meantime, we are still loading data into the second delta. When loading 
finishes and the transaction batch is closed, the cleaner detects no lock on 
that delta and deletes it. So the new data added after compaction is lost. 
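
The sequence above can be replayed with a small toy model. This is only a 
sketch of the scenario as described, not Hive code: the Delta class, the 
compact() and clean() functions, and the row counts are all hypothetical, and 
it assumes that a delta is locked exactly while its transaction batch is open 
and that the new base covers the transaction ids of both deltas.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    """Hypothetical stand-in for a delta directory such as delta_00021_00030."""
    min_txn: int
    max_txn: int
    batch_open: bool  # is the streaming transaction batch still open?
    rows: int = 0

    @property
    def locked(self) -> bool:
        # Assumption: a lock is held only while the transaction batch is open.
        return self.batch_open


def compact(deltas):
    """Fold the rows currently present in all deltas into a new base.

    Returns (highest transaction id covered by the base, rows in the base).
    """
    return max(d.max_txn for d in deltas), sum(d.rows for d in deltas)


def clean(deltas, base_txn):
    """Keep only deltas that are locked or not covered by the base."""
    return [d for d in deltas if d.locked or d.max_txn > base_txn]


def run_scenario():
    d1 = Delta(11, 20, batch_open=False, rows=100)  # batch closed
    d2 = Delta(21, 30, batch_open=True, rows=40)    # batch still open
    deltas = [d1, d2]

    base_txn, base_rows = compact(deltas)  # base now holds 140 rows
    deltas = clean(deltas, base_txn)       # d1 removed, d2 kept (locked)

    d2.rows += 60                          # streaming keeps writing to d2
    d2.batch_open = False                  # batch closed, lock released
    deltas = clean(deltas, base_txn)       # d2 removed with its 60 new rows

    written = 100 + 40 + 60
    readable = base_rows + sum(d.rows for d in deltas)
    return written, readable


if __name__ == "__main__":
    written, readable = run_scenario()
    print(f"written={written} readable={readable} lost={written - readable}")
```

In this model the rows written to the second delta after compaction exist only 
in that delta, so the second clean pass is the step that loses them.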


> Delta files created by hive hcatalog streaming cannot be compacted
> ------------------------------------------------------------------
>
>                 Key: HIVE-8966
>                 URL: https://issues.apache.org/jira/browse/HIVE-8966
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.14.0
>         Environment: hive
>            Reporter: Jihong Liu
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.1
>
>         Attachments: HIVE-8966.patch
>
>
> hive hcatalog streaming also creates a file named bucket_n_flush_length in 
> each delta directory, where "n" is the bucket number. compactor.CompactorMR 
> treats this file as one that also needs to be compacted. The file of course 
> cannot be compacted, so compactor.CompactorMR does not continue with the 
> compaction. 
> In a test, after the bucket_n_flush_length file was removed, "alter table 
> partition compact" finished successfully. If the file is not deleted, 
> nothing is compacted. 
> This is probably a very severe bug. Both 0.13 and 0.14 have this issue.
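
One way a compactor could avoid tripping over the side file is to select only 
the bucket data files when it lists a delta directory. The sketch below 
illustrates that idea only; it is not the attached patch and not the 
CompactorMR code. The compactable_files() helper is hypothetical, and it 
assumes data files are named bucket_<n> while the side file is 
bucket_<n>_flush_length.

```python
import re

# Assumption: a bucket data file is named "bucket_" followed by digits only,
# so "bucket_<n>_flush_length" does not match.
BUCKET_FILE = re.compile(r"bucket_\d+")


def compactable_files(names):
    """Return only the names that look like bucket data files."""
    return [name for name in names if BUCKET_FILE.fullmatch(name)]


if __name__ == "__main__":
    listing = ["bucket_00000", "bucket_00000_flush_length", "bucket_00001"]
    print(compactable_files(listing))
```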



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
