[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754885#comment-13754885 ]
stack commented on HBASE-9387:
------------------------------
[~jxiang] Regarding the extra step of confirming the znode is still there: that
would be tough given the master is going to remove it. I'd imagine we'd see
many cases of the region getting closed by the RS because it could not do this
last (new) step. I say that on the edges of transitions, we just abort for now.
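
To make the race concrete, here is a minimal in-memory sketch. It is not HBase or ZooKeeper code: the class, the map standing in for znodes, and every method name are hypothetical. It assumes the RS moves the znode from OPENING to OPENED and the master deletes the znode once it sees OPENED. A follow-up "is the znode still there" check then fails on a perfectly healthy open, while a failed transition itself is a clean signal to abort.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model only; none of these names exist in HBase.
public class TransitionEdgeSketch {
    // Stand-in for ZooKeeper: encoded region name -> transition state.
    public static final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    // RS side: atomically move the znode from OPENING to OPENED.
    // Returns false if the znode is missing or in any other state.
    public static boolean transitionToOpened(String region) {
        return znodes.replace(region, "OPENING", "OPENED");
    }

    // Master side: once it sees OPENED, it removes the znode.
    public static void masterHandleOpened(String region) {
        znodes.remove(region, "OPENED");
    }

    // The proposed extra step: confirm the znode is still there.
    public static boolean znodeStillThere(String region) {
        return znodes.containsKey(region);
    }

    // Abort-on-edge policy: if the transition itself fails, abort
    // instead of trying to recover; otherwise the region is online.
    public static String openRegion(String region) {
        return transitionToOpened(region) ? "ONLINE" : "ABORT";
    }

    public static void main(String[] args) {
        String region = "1588230740";
        znodes.put(region, "OPENING");

        System.out.println(openRegion(region));      // prints ONLINE
        masterHandleOpened(region);                  // master wins the race
        System.out.println(znodeStillThere(region)); // prints false on a healthy open
        System.out.println(openRegion("no-znode"));  // prints ABORT
    }
}
```

The second line of output is the problem case: the open succeeded, yet the extra check would make the RS close the region just because the master was quick.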
> Region could get lost during assignment
> ---------------------------------------
>
> Key: HBASE-9387
> URL: https://issues.apache.org/jira/browse/HBASE-9387
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 0.95.2
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Critical
> Attachments: 9387-v1.txt, hbase-9387.patch,
> org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed a test timeout running against hadoop 2.1.0 with distributed log
> replay turned on.
> It looks like the region state for 1588230740 became inconsistent between the
> master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>   at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}