[
https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeffrey Zhong resolved HBASE-12319.
-----------------------------------
Resolution: Fixed
Fix Version/s: 0.99.2, 0.98.8
Hadoop Flags: Reviewed
Thanks [~jxiang] for the review! I've integrated the fix into 0.98 & branch-1.
> Inconsistencies during region recovery due to close/open of a region during
> recovery
> ------------------------------------------------------------------------------------
>
> Key: HBASE-12319
> URL: https://issues.apache.org/jira/browse/HBASE-12319
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7, 0.99.1
> Reporter: Devaraj Das
> Assignee: Jeffrey Zhong
> Fix For: 0.98.8, 0.99.2
>
> Attachments: HBASE-12319.patch
>
>
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG
> [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04,
> isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1]
> regionserver.HRegion: Found 3 recovered edits file(s) under
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .............
> .............
> 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1]
> regionserver.HRegion: Null or non-existent edits file:
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
> {noformat}
> The above logs are from a regionserver, say RS2. From the initial analysis, it
> seems the master asked a certain regionserver (let's say RS1) to open the
> region and, for some reason, asked it to close the region soon after. The open
> was still proceeding on RS1 when the master reassigned the region to RS2. RS2
> also started the recovery, but it ended up seeing an inconsistent view of the
> recovered-edits files (it reports missing files, as per the logs above) since
> RS1 deleted some files after it completed its own recovery. When RS2 actually
> opens the region, it might not see the recent data that was written by flushes
> on hor9n10 during the recovery process. Reads of that data would then be
> inconsistent.
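The interleaving described in the issue can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions, not HBase code: the function names, the temporary directory layout, and the first two file names are hypothetical, and the two "regionservers" are simply sequential steps in one process. It shows how a listing taken by RS2 goes stale once RS1 deletes the recovered-edits files it has finished replaying.

```python
import os
import tempfile


def list_recovered_edits(region_dir):
    """Return the sorted paths of the files under <region_dir>/recovered.edits."""
    edits_dir = os.path.join(region_dir, "recovered.edits")
    return sorted(os.path.join(edits_dir, name) for name in os.listdir(edits_dir))


def replay_edits(paths):
    """Split a previously taken listing into files still present and files gone."""
    replayed, missing = [], []
    for path in paths:
        name = os.path.basename(path)
        if os.path.exists(path):
            replayed.append(name)   # a real server would apply the edits here
        else:
            missing.append(name)    # corresponds to "Null or non-existent edits file"
    return replayed, missing


def simulate_race():
    """Run the RS1/RS2 interleaving sequentially and return RS2's result."""
    with tempfile.TemporaryDirectory() as region_dir:
        edits_dir = os.path.join(region_dir, "recovered.edits")
        os.mkdir(edits_dir)
        # Hypothetical sequence ids; only the last one appears in the log above.
        for seqid in (198060, 198070, 198080):
            open(os.path.join(edits_dir, f"{seqid:019d}"), "w").close()

        # Step 1: RS2 starts opening the region and lists the files to replay.
        rs2_view = list_recovered_edits(region_dir)

        # Step 2: RS1, whose open was still in progress, completes its own
        # recovery and deletes the files it replayed.
        for path in list_recovered_edits(region_dir):
            os.remove(path)

        # Step 3: RS2 replays from its now-stale listing.
        return replay_edits(rs2_view)


if __name__ == "__main__":
    replayed, missing = simulate_race()
    print("replayed:", replayed)
    print("missing:", missing)
```

In this sketch every file in RS2's listing is reported missing because RS1 deletes all of them. In the reported run only some files were deleted, so RS2 would replay a subset and skip the rest.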