[
https://issues.apache.org/jira/browse/HBASE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-8531:
-------------------------
Attachment: 8531v6.txt
Here are the commit notes:
{code}
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
Fix up a confusing log message.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
De-deprecate getCurrentNrHRS; it no longer talks of zk (implementations don't
have to get the count from zk, though the default one does).
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
Make it possible to specify an alternate HConnection implementation. At the
moment it is still hard to supply an HConnection-only implementation because
too many things -- mostly important tests -- still rely on getting an
HConnectionImplementation from HCM, but being able to insert a subclass is
enough for most purposes. Instead of getting cluster data (clusterid, meta
address, etc.) from zk directly, we now go via a 'Registry' interface so
alternate, non-zk Registry implementations can be plugged in. Added a zk
registry as the default.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
This is the actual fix: use HTable instead of the custom HConnection handling;
the custom handling was not resetting the Scanner if it got a
DoNotRetryException. Also allow passing a null Visitor, which makes testing a
little easier.
A hbase-client/src/main/java/org/apache/hadoop/hbase/client/Registry.java
New Registry interface; get cluster specifics from here.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
I changed RegionServerStoppedException to be a DoNotRetryException, so this
bit of code doing special handling of RSSE is no longer needed.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
Make the log message carry more info; it was confounding to print only the
exception message and not the exception type.
A hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java
A Registry implementation that keeps cluster data up in zk.
M hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/RegionServerStoppedException.java
Make this a DoNotRetryException.
A hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientNoCluster.java
Test that uses the new functionality to reproduce the stack trace seen in this
issue, caused by a DoNotRetryException coming up while doing a
MetaScanner.metaScan.
A hbase-client/src/test/resources/hbase-site.xml
We need one of these in here too... just has the one config for now.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Improve the message when going down; a YouAreDeadException is not
'Unhandled'... we are doing the right thing by going down.
M hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
Add more debug output and timeouts to the tests.
{code}
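To make the Registry idea above concrete, here is a rough sketch of the shape such an
interface could take. The type and method names below (ServerLocation, InMemoryRegistry,
etc.) are illustrative placeholders of my own, not the actual contents of Registry.java
or ZooKeeperRegistry.java in the attached patch; getCurrentNrHRS is the one name taken
from the notes above. The point is that a test can hand the connection a registry that
never touches zk.
{code}
// Rough sketch only: the real Registry.java in the patch may differ. The idea is
// that client setup asks a Registry for cluster specifics (cluster id, meta
// location, region server count) instead of talking to ZooKeeper directly, so a
// no-cluster test can plug in a Registry that needs no ZooKeeper at all.
import java.io.IOException;

public class RegistrySketch {

  /** Placeholder for "where a region is served"; not the HBase ServerName class. */
  static final class ServerLocation {
    final String hostname;
    final int port;
    ServerLocation(String hostname, int port) {
      this.hostname = hostname;
      this.port = port;
    }
  }

  /** Hypothetical shape of the pluggable registry described in the commit notes. */
  interface Registry {
    String getClusterId() throws IOException;
    ServerLocation getMetaRegionLocation() throws IOException;
    int getCurrentNrHRS() throws IOException;   // mirrors HConnection#getCurrentNrHRS
  }

  /** A registry a no-cluster test could use: everything is served from memory. */
  static final class InMemoryRegistry implements Registry {
    @Override public String getClusterId() { return "test-cluster-id"; }
    @Override public ServerLocation getMetaRegionLocation() {
      return new ServerLocation("localhost", 60020);
    }
    @Override public int getCurrentNrHRS() { return 1; }
  }

  public static void main(String[] args) throws IOException {
    // The default implementation would read these from zk (the ZooKeeperRegistry
    // role); here we just show the call sites a connection would use.
    Registry registry = new InMemoryRegistry();
    System.out.println("cluster id: " + registry.getClusterId());
    System.out.println("meta at: " + registry.getMetaRegionLocation().hostname
        + ":" + registry.getMetaRegionLocation().port);
    System.out.println("region servers: " + registry.getCurrentNrHRS());
  }
}
{code}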
> TestZooKeeper fails in trunk/0.95 builds
> ----------------------------------------
>
> Key: HBASE-8531
> URL: https://issues.apache.org/jira/browse/HBASE-8531
> Project: HBase
> Issue Type: Bug
> Components: Zookeeper
> Reporter: stack
> Assignee: stack
> Fix For: 0.95.1
>
> Attachments: 8531.txt, 8531v4.txt, 8531v5.txt, 8531v6.txt
>
>
> TestZooKeeper fails on occasion. I caught a good example recently; see the
> failure stack trace below.
> It took me a while. I thought the issue had to do w/ our recent ipc
> refactorings, but it looks like a problem we have always had. In short,
> MetaScanner is not handling DoNotRetryIOEs -- it is letting them out.
> DNRIOEs when scanning are supposed to force a reset of the scan. HTable#next
> catches these and does the necessary scanner reset. MetaScanner runs some
> subset of what HTable does when scanning, except the part where it catches a
> DNRIOE and redoes the scan. Odd.
> TestZooKeeper failed in this instance because the test kills a regionserver
> at the same time as we are trying to create a table. In create table we do a
> meta scan using MetaScanner, passing a Visitor. The scan starts and gets a
> RegionServerStoppedException (this is NOT a DNRIOE -- it should be -- but
> later we convert it into one up in ScannerCallable).
> DNRIOEs are thrown to the upper layers to handle...
> Let me look into having MetaScanner just use HTable scanning. It makes an
> instance just to find where to start the scan... let me try using this
> instance for the actual scanning.
> TODO: Do this conversion everywhere a DNRIOE could come out.
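> As a rough illustration of the reset-on-DNRIOE behavior described above, here
> is a small standalone sketch of the pattern. The Row/Scanner/ScannerFactory
> types below are placeholders of my own, not HBase client classes; the real
> client scanner also has to remember where it was so it can resume in the
> right place after a reset, which the sketch models with openAfter().
> {code}
> // Standalone sketch (placeholder types, not HBase classes) of the scan-reset
> // pattern: if a scan call blows up with a do-not-retry error, drop the scanner
> // and reopen it just past the last row we already returned, rather than letting
> // the exception escape the way MetaScanner.metaScan was doing.
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> public class ScanResetSketch {
>
>   interface Row { byte[] key(); }
>
>   /** next() returns null at end of scan and may throw when the serving RS dies. */
>   interface Scanner extends AutoCloseable {
>     Row next() throws IOException;
>     @Override void close();
>   }
>
>   interface ScannerFactory {
>     /** Opens a scan starting strictly after lastSeenRow (from the top if null). */
>     Scanner openAfter(byte[] lastSeenRow) throws IOException;
>   }
>
>   /** Stand-in for DoNotRetryIOException: retrying the same call is pointless. */
>   static class DoNotRetryScanException extends IOException {
>     DoNotRetryScanException(String msg, Throwable cause) { super(msg, cause); }
>   }
>
>   static List<Row> scanAll(ScannerFactory factory) throws IOException {
>     List<Row> results = new ArrayList<>();
>     byte[] lastSeen = null;             // last row key we handed back, if any
>     Scanner scanner = factory.openAfter(lastSeen);
>     try {
>       while (true) {
>         Row row;
>         try {
>           row = scanner.next();
>         } catch (DoNotRetryScanException e) {
>           // The scanner is dead (e.g. its regionserver stopped): reset it
>           // and resume just after the last row we already returned.
>           scanner.close();
>           scanner = factory.openAfter(lastSeen);
>           continue;
>         }
>         if (row == null) {
>           break;                        // scan exhausted
>         }
>         results.add(row);
>         lastSeen = row.key();
>       }
>     } finally {
>       scanner.close();
>     }
>     return results;
>   }
> }
> {code}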
> Here is the stack trace
> {code}
> org.apache.hadoop.hbase.exceptions.DoNotRetryIOException: Reset scanner
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:209)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
> at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:170)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:212)
> at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
> at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:131)
> at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:128)
> at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:398)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:128)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:81)
> at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:448)
> at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:348)
> at org.apache.hadoop.hbase.TestZooKeeper.testSanity(TestZooKeeper.java:242)
> at org.apache.hadoop.hbase.TestZooKeeper.testRegionServerSessionExpired(TestZooKeeper.java:203)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runners.Suite.runChild(Suite.java:127)
> at org.junit.runners.Suite.runChild(Suite.java:26)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:227)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:175)
> ... 43 more
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
> at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2310)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2874)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20577)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2103)
> at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1810)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:21012)
> at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
> ... 43 more
> {code}