[
https://issues.apache.org/jira/browse/HBASE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu resolved HBASE-10370.
----------------------------
Resolution: Fixed
> Compaction in out-of-date Store causes region split failure
> -----------------------------------------------------------
>
> Key: HBASE-10370
> URL: https://issues.apache.org/jira/browse/HBASE-10370
> Project: HBase
> Issue Type: Bug
> Components: Compaction
> Affects Versions: 0.94.3, 0.98.0, 0.99.0
> Reporter: Liu Shaohui
> Assignee: Liu Shaohui
> Priority: Critical
> Fix For: 0.98.0, 0.96.2, 0.99.0
>
> Attachments: 10370-v3.patch, 10370-v4.patch, 10370v2.096.txt,
> HBASE-10370-v1.diff, HBASE-10370-v2.diff
>
>
> In our production cluster, we encountered a problem where two daughter
> regions could not be opened because of a FileNotFoundException.
> {quote}
> 2014-01-14,20:12:46,927 INFO
> org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup
> of failed split of
> user_profile,xxxxxxxxx,1389671863815.99e016485b0bc142d67ae07a884f6966.;
> Failed
> lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
> java.io.IOException: Failed
> lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98
> at
> org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467)
> at
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: java.io.IOException:
> java.io.FileNotFoundException: File does not exist:
> /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf
> ....
> {quote}
> The cause is that a compaction running in an out-of-date Store removes
> hfiles that the daughter regions reference after the split. As a result,
> the daughter regions can never be opened.
> The timeline is as follows.
> Assumption: there are two hfiles, a and b, in Store A of Region R.
> t0: A compaction request for Store A(a+b) in Region R is submitted.
> t1: The first split of Region R times out and is rolled back. During the
> rollback, the region reinitializes all of its store objects (see
> SplitTransaction #824). The store in Region R is now A'(a+b).
> t2: The compaction requested at t0 runs against the old store object
> (hfiles a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived.
> t3: A second split of Region R. R splits into two regions, R.0 and R.1,
> which create hfile references to hfiles a and b from Store A'(a+b).
> t4: Because hfiles a and b have been deleted, opening regions R.0 and R.1
> fails with a FileNotFoundException.
> I have added a test to identify this problem.
> After searching JIRA, HBASE-8502 may be the same problem. [~goldin]
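The t0..t4 timeline in the description can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `StaleStoreDemo`, `Store`, `compact`, and `missingAfterSplit` are hypothetical names invented for illustration, the "disk" is a plain in-memory set, and none of this is HBase code or the fix in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the race; not HBase code.
class StaleStoreDemo {

    /** Toy store: an in-memory list of hfile names over a shared "disk". */
    static final class Store {
        final Set<String> disk;   // hfiles that actually exist
        final List<String> files; // this store object's view of its hfiles

        Store(Set<String> disk) {
            this.disk = disk;
            this.files = new ArrayList<>(disk); // snapshot taken at creation
        }

        /** Compacts every file in this store's view into one output file. */
        void compact(String output) {
            disk.removeAll(files); // inputs are archived (gone from the store dir)
            disk.add(output);
            files.clear();
            files.add(output);
        }
    }

    /** Replays t0..t4 and returns the referenced hfiles that no longer exist. */
    static List<String> missingAfterSplit() {
        Set<String> disk = new TreeSet<>(Arrays.asList("a", "b"));

        Store a = new Store(disk);      // t0: the compaction request holds Store A
        Store aPrime = new Store(disk); // t1: rollback reinitializes the store as A'

        a.compact("c");                 // t2: queued compaction runs on stale Store A

        // t3: the second split builds references from A', which still lists a and b.
        List<String> references = new ArrayList<>(aPrime.files);

        // t4: opening the daughters fails for every reference missing on disk.
        List<String> missing = new ArrayList<>();
        for (String ref : references) {
            if (!disk.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingAfterSplit()); // prints [a, b]
    }
}
```

In this model the stale object A and the fresh object A' share the same disk but not the same file list, which is why A' keeps handing out references to files that the compaction on A has already removed.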
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)