[ 
https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197056#comment-14197056
 ] 

stack commented on HBASE-12319:
-------------------------------

It's looking like this patch indeed made branch-1 unstable (branch-1 is back to 
blue again ... except for a hiccup brought on by preemptive fail).

> Inconsistencies during region recovery due to close/open of a region during 
> recovery
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-12319
>                 URL: https://issues.apache.org/jira/browse/HBASE-12319
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.7, 0.99.1
>            Reporter: Devaraj Das
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.8, 0.99.2
>
>         Attachments: HBASE-12319.patch
>
>
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG 
> [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04,
>  isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] 
> regionserver.HRegion: Found 3 recovered edits file(s) under 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .............
> .............
> 2014-10-14 13:45:31,916 WARN  [RS_OPEN_REGION-hor9n01:60020-1] 
> regionserver.HRegion: Null or non-existent edits file: 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
> {noformat}
> The above logs are from a regionserver, say RS2. From the initial analysis it 
> seemed like the master asked a certain regionserver to open the region (let's 
> say RS1) and for some reason asked it to close soon after. The open was still 
> proceeding on RS1 but the master reassigned the region to RS2. RS2 also 
> started the recovery but ended up seeing an inconsistent view of the 
> recovered-edits files (it reports missing files as per the logs above) since 
> the first regionserver (RS1) deleted some files after it completed the 
> recovery. When RS2 really opens the region, it might not see the recent data 
> that was written by flushes on hor9n10 during the recovery process. Reads of 
> that data would have inconsistencies.
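
The sequence described above can be illustrated with a small toy model. This is only a sketch, not HBase code: the function names, the temporary directory layout, and the sequence ids are all made up for illustration, and the two "regionservers" are just ordinary function calls run in a fixed order so the interleaving is deterministic. It assumes the failure mode is simply "one server lists the recovered-edits files, the other deletes them after finishing its own replay, and the first server then works from a stale listing".

```python
import os
import tempfile


def list_recovered_edits(region_dir):
    """Return the sorted paths currently under <region_dir>/recovered.edits."""
    edits_dir = os.path.join(region_dir, "recovered.edits")
    return sorted(os.path.join(edits_dir, name) for name in os.listdir(edits_dir))


def replay_and_delete(paths):
    """Toy stand-in for RS1: 'replay' each file, then delete it."""
    for path in paths:
        os.remove(path)


def replay_from_listing(paths):
    """Toy stand-in for RS2: replay from an earlier listing.

    Returns (replayed, missing), where missing holds paths that were in the
    listing but no longer exist on disk.
    """
    replayed, missing = [], []
    for path in paths:
        if os.path.exists(path):
            replayed.append(path)
        else:
            missing.append(path)  # analogous to "Null or non-existent edits file"
    return replayed, missing


def simulate_race():
    """Run the interleaving: RS2 lists, RS1 replays and deletes, RS2 replays."""
    with tempfile.TemporaryDirectory() as region_dir:
        edits_dir = os.path.join(region_dir, "recovered.edits")
        os.makedirs(edits_dir)
        # Hypothetical sequence ids, zero-padded to look like edits file names.
        for seqid in (101, 102, 103):
            with open(os.path.join(edits_dir, f"{seqid:019d}"), "w") as f:
                f.write("edit")

        rs2_listing = list_recovered_edits(region_dir)       # RS2 sees 3 files
        replay_and_delete(list_recovered_edits(region_dir))  # RS1 finishes first
        return replay_from_listing(rs2_listing)              # RS2 uses stale list


if __name__ == "__main__":
    replayed, missing = simulate_race()
    print("replayed:", len(replayed), "missing:", len(missing))
```

In this model RS2 replays nothing and reports every file from its listing as missing, which mirrors the WARN line in the log. It says nothing about how the actual patch addresses the problem.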



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
