[
https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754906#comment-13754906
]
stack commented on HBASE-9387:
------------------------------
Oh, this needs a test too.
[~jeffreyz] Agreed that we should pick out the explicit scenarios where
we can lose region accountability. Since this is the only one we know of
currently, perhaps just do a patch for this case for now, and try to figure
out other holes in the state machine outside of this issue.
> Region could get lost during assignment
> ---------------------------------------
>
> Key: HBASE-9387
> URL: https://issues.apache.org/jira/browse/HBASE-9387
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 0.95.2
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Critical
> Attachments: 9387-v1.txt, 9387-v3.txt, hbase-9387.patch,
> org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed a test timeout when running against Hadoop 2.1.0 with distributed
> log replay turned on.
> It looks like the region state for 1588230740 became inconsistent between the
> master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
> at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}