[ https://issues.apache.org/jira/browse/HBASE-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190360#comment-14190360 ]
Jimmy Xiang commented on HBASE-12380:
-------------------------------------
I have discussed it with Esteban. We agree that it is better not to abort. We
can log a warning/error message instead and let it go.
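As a rough illustration of that proposal, here is a minimal Java sketch. It assumes a simplified RegionServer that tracks "opening" and "online" regions in two in-memory sets. `DuplicateOpenGuard`, its `OpenResult` enum, and `markOpened` are hypothetical names, not HBase APIs. According to the stack trace, the real abort is triggered from `RSRpcServices.openRegion`, and that code is not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch, NOT HBase's RSRpcServices code: a RegionServer-side guard
// that logs and ignores a duplicate OPEN instead of aborting the process.
class DuplicateOpenGuard {
    private static final Logger LOG = Logger.getLogger(DuplicateOpenGuard.class.getName());

    /** Hypothetical outcomes of handling one OPEN request. */
    enum OpenResult { SUBMITTED, ALREADY_OPENING, ALREADY_OPENED }

    // Encoded names of regions whose open is still in progress.
    private final Set<String> opening = new HashSet<>();
    // Encoded names of regions that are already online on this server.
    private final Set<String> online = new HashSet<>();

    // synchronized so the check-then-act across both sets is atomic.
    synchronized OpenResult openRegion(String encodedName) {
        if (online.contains(encodedName)) {
            // The behaviour in the issue aborts the RegionServer here; the sketch only warns.
            LOG.warning("Received OPEN for region " + encodedName
                + ", which is already online; ignoring");
            return OpenResult.ALREADY_OPENED;
        }
        if (!opening.add(encodedName)) {
            // Mirrors the existing "already trying to OPEN - ignoring" path.
            LOG.info("Already opening region " + encodedName + "; ignoring new request");
            return OpenResult.ALREADY_OPENING;
        }
        // A real server would hand the open off to an executor at this point.
        return OpenResult.SUBMITTED;
    }

    /** Called when the (asynchronous) open completes. */
    synchronized void markOpened(String encodedName) {
        opening.remove(encodedName);
        online.add(encodedName);
    }

    public static void main(String[] args) {
        DuplicateOpenGuard guard = new DuplicateOpenGuard();
        String region = "025198143197ea68803e49819eae27ca";
        System.out.println(guard.openRegion(region)); // SUBMITTED
        System.out.println(guard.openRegion(region)); // ALREADY_OPENING
        guard.markOpened(region);
        System.out.println(guard.openRegion(region)); // ALREADY_OPENED (a WARNING also goes to stderr)
    }
}
```

The trade-off is the one discussed in this comment. The duplicate request becomes a logged, non-fatal event, so the server keeps serving its other regions. However, the only signal that something upstream went wrong is the log line.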
The reason for aborting is that this scenario should never happen naturally.
The master has a state machine and won't send the open call again if the region
is already open.
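The master-side guarantee can be modeled, very loosely, as a per-region state check performed before an OPEN is sent. This sketch assumes only three states (`OFFLINE`, `PENDING_OPEN`, `OPEN`), while the real master tracks more than this. `AssignmentSketch`, `assign`, `reportOpened`, and `stateOf` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of a master-side assignment state
// machine; not HBase's AssignmentManager.
class AssignmentSketch {
    enum State { OFFLINE, PENDING_OPEN, OPEN }

    private final Map<String, State> states = new HashMap<>();

    /** Returns true if an OPEN would be sent, false if the current state suppresses it. */
    synchronized boolean assign(String region) {
        State s = states.getOrDefault(region, State.OFFLINE);
        if (s != State.OFFLINE) {
            // Already pending or open: the master does not send OPEN again.
            return false;
        }
        states.put(region, State.PENDING_OPEN);
        return true;
    }

    /** The RegionServer reported that the region finished opening. */
    synchronized void reportOpened(String region) {
        if (states.get(region) == State.PENDING_OPEN) {
            states.put(region, State.OPEN);
        }
    }

    synchronized State stateOf(String region) {
        return states.getOrDefault(region, State.OFFLINE);
    }

    public static void main(String[] args) {
        AssignmentSketch master = new AssignmentSketch();
        System.out.println(master.assign("r1"));  // true: OPEN sent
        System.out.println(master.assign("r1"));  // false: already PENDING_OPEN
        master.reportOpened("r1");
        System.out.println(master.assign("r1"));  // false: already OPEN
        System.out.println(master.stateOf("r1")); // OPEN
    }
}
```

Note that `TestRegionServerNoMaster.testMultipleOpen` calls `RSRpcServices.openRegion` directly (see the stack trace below). As a result, no master-side check of this kind stands between the repeated requests and the RegionServer.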
My concern with not aborting is that we may hide a serious bug in the master if
this does happen.
This is an old test. I suggest removing it.
> Too many attempts to open a region can crash the RegionServer
> -------------------------------------------------------------
>
> Key: HBASE-12380
> URL: https://issues.apache.org/jira/browse/HBASE-12380
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Esteban Gutierrez
> Priority: Critical
>
> Noticed this while trying to fix a faulty test as part of my work on
> HBASE-12219:
> {code}
> Tests in error:
> TestRegionServerNoMaster.testMultipleOpen:237 » Service java.io.IOException: R...
> TestRegionServerNoMaster.testCloseByRegionServer:211->closeRegionNoZK:201 » Service
> {code}
> Initially I thought the problem was in my patch for HBASE-12219, but I noticed
> that the issue was occurring on the 7th attempt to open the region. However, I
> was able to reproduce the same problem in the master branch after increasing
> the number of requests in testMultipleOpen():
> {code}
> 2014-10-29 15:03:45,043 INFO [Thread-216] regionserver.RSRpcServices(1334): Receiving OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca., which we are already trying to OPEN - ignoring this new request for this region.
> Submitting openRegion attempt: 16 <====
> 2014-10-29 15:03:45,044 INFO [Thread-216] regionserver.RSRpcServices(1311): Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
> 2014-10-29 15:03:45,044 INFO [PostOpenDeployTasks:025198143197ea68803e49819eae27ca] hbase.MetaTableAccessor(1307): Updated row TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca. with server=192.168.1.105,63082,1414620220789
> Submitting openRegion attempt: 17 <====
> 2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] handler.OpenRegionHandler(88): Region 025198143197ea68803e49819eae27ca was already online when we started processing the opening. Marking this new attempt as failed
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): ABORTING region server 192.168.1.105,63082,1414620220789: Received OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca., which is already online
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2014-10-29 15:03:45,054 WARN [Thread-216] regionserver.HRegionServer(1955): Unable to report fatal error to master
> com.google.protobuf.ServiceException: java.io.IOException: Call to /192.168.1.105:63079 failed on local exception: java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757)
> 	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:356)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> 	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277)
> 	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1308)
> 	at org.apache.hadoop.hbase.regionserver.TestRegionServerNoMaster.testMultipleOpen(TestRegionServerNoMaster.java:237)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.io.IOException: Call to /192.168.1.105:63079 failed on local exception: java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> 	at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1563)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1534)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1692)
> 	... 23 more
> Caused by: java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.cleanupCalls(RpcClient.java:1257)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.close(RpcClient.java:1063)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:791)
> {code}
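The log shows why the failure depends on the number of attempts. Requests that arrive while the first open is still in flight are ignored ("already trying to OPEN"). A request that arrives after the region is online instead hits the "already online" path, which aborts the server. The Java sketch below models that sequence deterministically. It is not the real test. It measures time in requests, and the `openTicks` parameter is an assumption for illustration, since real timing depends on thread scheduling. `MultipleOpenModel` and `firstAbort` are hypothetical names.

```java
// Hypothetical, deterministic model of the sequence in the log above; not the
// real TestRegionServerNoMaster code. One "tick" per OPEN request, and the
// asynchronous open is assumed to finish after `openTicks` further requests.
class MultipleOpenModel {

    /**
     * Returns the 1-based attempt at which the aborting behaviour would fire,
     * or -1 if every attempt arrives before the open completes.
     */
    static int firstAbort(int attempts, int openTicks) {
        boolean opening = false;
        boolean online = false;
        int startedAt = 0;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            // The background open finishes once enough requests have gone by.
            if (opening && attempt - startedAt > openTicks) {
                opening = false;
                online = true;
            }
            if (online) {
                // "Received OPEN ... which is already online" -> RegionServer abort.
                return attempt;
            }
            if (!opening) {
                // First request: submit the open.
                opening = true;
                startedAt = attempt;
            }
            // Otherwise: "already trying to OPEN - ignoring this new request".
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstAbort(5, 10));  // -1: the open is still in progress throughout
        System.out.println(firstAbort(20, 10)); // 12: first request after the open completes
    }
}
```

Under these assumptions, a small number of attempts never triggers the abort. This is consistent with the failure appearing only once the number of requests in testMultipleOpen() was increased.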
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)