[
https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181756#comment-14181756
]
Jeffrey Zhong commented on HBASE-12319:
---------------------------------------
This issue occurs because a region open is cancelled, but the AM
(AssignmentManager) does not wait for the cancel to complete and reassigns the
region immediately, as shown in the following log lines. As a result, the
previous region open operation may overlap with the new region assignment.
This issue happens in 0.98 and branch-1.
{noformat}
hbase-hbase-master-hor9n01.gq1.ygridcore.net.log:Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
org.apache.hadoop.hbase.NotServingRegionException: The region
51af4bd23dc32a940ad2dd5435f00e1d was opening but not yet served. Opening is
cancelled.
hbase-hbase-master-hor9n01.gq1.ygridcore.net.log:2014-10-14 13:45:30,564 INFO
[AM.-pool1-t8] master.RegionStates: Transitioned
{51af4bd23dc32a940ad2dd5435f00e1d state=OPENING, ts=1413294330350,
server=hor9n10.gq1.ygridcore.net,60020,1413293516978} to
{51af4bd23dc32a940ad2dd5435f00e1d state=OFFLINE, ts=1413294330564,
server=hor9n10.gq1.ygridcore.net,60020,1413293516978}
hbase-hbase-master-hor9n01.gq1.ygridcore.net.log:2014-10-14 13:45:30,566 DEBUG
[AM.-pool1-t8] master.AssignmentManager: No previous transition plan found (or
ignoring an existing plan) for
IntegrationTestIngest,59999994,1413293958381.51af4bd23dc32a940ad2dd5435f00e1d.;
generated random
plan=hri=IntegrationTestIngest,59999994,1413293958381.51af4bd23dc32a940ad2dd5435f00e1d.,
src=, dest=hor9n01.gq1.ygridcore.net,60020,1413294323616; 4 (online=4,
available=4) available servers, forceNewPlan=true
hbase-hbase-master-hor9n01.gq1.ygridcore.net.log:2014-10-14 13:45:30,566 DEBUG
[AM.-pool1-t8] zookeeper.ZKAssign: master:60000-0x3490b3b07a1085e,
quorum=hor9n08.gq1.ygridcore.net:2181,hor9n01.gq1.ygridcore.net:2181,hor9n10.gq1.ygridcore.net:2181,
baseZNode=/hbase Creating (or updating) unassigned node
51af4bd23dc32a940ad2dd5435f00e1d with OFFLINE state
hbase-hbase-master-hor9n01.gq1.ygridcore.net.log:2014-10-14 13:45:30,589 INFO
[AM.-pool1-t8] master.AssignmentManager: Assigning
IntegrationTestIngest,59999994,1413293958381.51af4bd23dc32a940ad2dd5435f00e1d.
to hor9n01.gq1.ygridcore.net,60020,1413294323616
{noformat}
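To make the race concrete, here is a minimal Java sketch of the ordering that
would avoid the overlap: cancel the previous open, wait until it has really
stopped, and only then reassign. This is not HBase code and not the actual
patch. OpenTask and reassignAfterCancel are hypothetical names used only for
illustration, and the polling loop merely stands in for the region open work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative model only; none of these names exist in HBase.
class ReassignSketch {

    /** Hypothetical stand-in for an in-flight region open on a regionserver. */
    static final class OpenTask implements Runnable {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);
        private final CountDownLatch finished = new CountDownLatch(1);

        @Override
        public void run() {
            try {
                // Simulated open work: keep going until a cancel is observed.
                while (!cancelled.get()) {
                    Thread.sleep(5);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // Signal that the open has fully stopped touching the region.
                finished.countDown();
            }
        }

        /** Requests cancellation; returns before the open has actually stopped. */
        void cancel() {
            cancelled.set(true);
        }

        boolean awaitFinished(long timeoutMs) throws InterruptedException {
            return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
        }

        boolean isFinished() {
            return finished.getCount() == 0;
        }
    }

    /**
     * Cancels the previous open and reports whether it is safe to reassign.
     * Returns true only once the previous open has completely stopped; false
     * means it may still be running, so reassigning now could overlap with it.
     */
    static boolean reassignAfterCancel(OpenTask previousOpen, long timeoutMs)
            throws InterruptedException {
        previousOpen.cancel();
        return previousOpen.awaitFinished(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        OpenTask open = new OpenTask();
        Thread worker = new Thread(open);
        worker.start();
        System.out.println("safe to reassign: " + reassignAfterCancel(open, 5000));
        worker.join();
    }
}
```

The behaviour in the log above corresponds to calling cancel() and then
assigning right away, without the awaitFinished() step.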
> Inconsistencies during region recovery due to close/open of a region during
> recovery
> ------------------------------------------------------------------------------------
>
> Key: HBASE-12319
> URL: https://issues.apache.org/jira/browse/HBASE-12319
> Project: HBase
> Issue Type: Bug
> Reporter: Devaraj Das
> Assignee: Jeffrey Zhong
>
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG
> [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04,
> isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1]
> regionserver.HRegion: Found 3 recovered edits file(s) under
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .............
> .............
> 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1]
> regionserver.HRegion: Null or non-existent edits file:
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0000000000000198080
> {noformat}
> The above logs are from a regionserver, say RS2. From the initial analysis,
> it seemed that the master asked a certain regionserver (let's say RS1) to
> open the region and, for some reason, asked it to close soon after. The open
> was still proceeding on RS1 when the master reassigned the region to RS2.
> RS2 also started the recovery, but it ended up seeing an inconsistent view of
> the recovered-edits files (it reports missing files, as per the logs above)
> because the first regionserver (RS1) deleted some files after it completed
> the recovery. When RS2 actually opens the region, it might not see the recent
> data that was written by flushes on RS1 (hor9n10) during the recovery
> process. Reads of that data would then be inconsistent.