[ https://issues.apache.org/jira/browse/HBASE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-8531:
-------------------------

    Attachment: 8531v6.txt

Here are commit notes:

{code}
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
  Fixup confusing log message

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
  Un-deprecate getCurrentNrHRS; its description no longer mentions zk (the count
  does not have to come from zk, though the default implementation gets it there)

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Make it possible to specify an alternate HConnection implementation.  At the
  moment it is still hard to supply a standalone HConnection because too many
  things (mostly some pretty important tests) still rely on getting an
  HConnectionImplementation from HCM, but being able to insert a subclass is
  enough for most purposes.  Instead of getting cluster data from zk, i.e.
  clusterid, meta address, etc., now go via a 'Registry' interface so alternate
  Registry implementations, i.e. non-zk ones, can be inserted.  Added a zk
  registry that gets the info from zk as the default.

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  This is the actual fix: use HTable instead of custom HConnection handling.  The
  custom handling was not resetting the scanner when it got a DoNotRetryIOException.
  Also allow passing a null Visitor, which makes testing a little easier.

A hbase-client/src/main/java/org/apache/hadoop/hbase/client/Registry.java
  New Registry interface; get cluster specifics from here.

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
  RegionServerStoppedException now extends DoNotRetryIOException, so the code
  here that special-cased RSSE is no longer needed.

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  Make the log message carry more info.  It was confusing to print only the
  exception message and not the exception type.

A hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java
  A registry that keeps cluster data up in zk.

M hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/RegionServerStoppedException.java
  Make this a DoNotRetryIOException.

A hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestClientNoCluster.java
  Test that uses the new functionality to reproduce the stack trace seen in this
  issue; it was caused by a DoNotRetryIOException surfacing during
  MetaScanner.metaScan.

A hbase-client/src/test/resources/hbase-site.xml
  We need one of these in here too... just has the one config for now.

M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Improve the message logged when going down; a YouAreDeadException is not
  'Unhandled'.  Going down is the right thing to do in that case.

M hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
  Add more debug and timeouts on tests.
{code}
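
To make the Registry note above concrete, here is a minimal sketch of how a pluggable registry could be selected. It is not taken from the patch: the interface name ClusterRegistry, its two methods, the config key, and both implementations are hypothetical stand-ins, and plain java.util.Properties stands in for the HBase Configuration.

```java
import java.util.Properties;

public class RegistrySketch {
  /** Hypothetical stand-in for the new Registry interface; method names are assumptions. */
  public interface ClusterRegistry {
    String getClusterId();
    String getMetaLocation();
  }

  /** Placeholder for the zk-backed default; a real one would read these values from zk. */
  public static class ZkBackedRegistry implements ClusterRegistry {
    public String getClusterId() { return "cluster-id-from-zk"; }
    public String getMetaLocation() { return "meta-host-from-zk:60020"; }
  }

  /** Non-zk registry with fixed values, e.g. for a test that runs without a cluster. */
  public static class FixedRegistry implements ClusterRegistry {
    public String getClusterId() { return "test-cluster"; }
    public String getMetaLocation() { return "localhost:0"; }
  }

  /** Hypothetical config key; not an actual HBase property name. */
  public static final String REGISTRY_IMPL_KEY = "sketch.client.registry.impl";

  /** Instantiates the configured registry class, falling back to the zk-backed default. */
  public static ClusterRegistry create(Properties conf) throws ReflectiveOperationException {
    String className = conf.getProperty(REGISTRY_IMPL_KEY, ZkBackedRegistry.class.getName());
    return Class.forName(className)
        .asSubclass(ClusterRegistry.class)
        .getDeclaredConstructor()
        .newInstance();
  }

  public static void main(String[] args) throws Exception {
    Properties conf = new Properties();
    System.out.println(create(conf).getClusterId());
    conf.setProperty(REGISTRY_IMPL_KEY, FixedRegistry.class.getName());
    System.out.println(create(conf).getClusterId());
  }
}
```

With something along these lines, a test could point the client at a fixed, in-memory registry and never touch zk.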
                
> TestZooKeeper fails in trunk/0.95 builds
> ----------------------------------------
>
>                 Key: HBASE-8531
>                 URL: https://issues.apache.org/jira/browse/HBASE-8531
>             Project: HBase
>          Issue Type: Bug
>          Components: Zookeeper
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.95.1
>
>         Attachments: 8531.txt, 8531v4.txt, 8531v5.txt, 8531v6.txt
>
>
> TestZooKeeper fails on occasion.  I caught a good example recently; see the
> failure stack trace below.
> It took me a while.  I thought the issue had to do with our recent ipc
> refactorings, but it looks like a problem we have always had.  In short,
> MetaScanner is not handling DoNotRetryIOExceptions (DNRIOEs); it lets them
> escape.  A DNRIOE thrown while scanning is supposed to force a reset of the
> scan.  HTable#next catches these and does the necessary scanner reset.
> MetaScanner runs a subset of what HTable does when scanning, but omits the
> part that catches a DNRIOE and redoes the scan.  Odd.
> TestZooKeeper failed in this instance because the test kills a regionserver
> at the same time as we are trying to create a table.  In create table we do a
> meta scan using MetaScanner, passing a Visitor.  The scan starts and gets a
> RegionServerStoppedException.  (This is NOT a DNRIOE, though it should be;
> later we convert it into one up in ScannerCallable.)
> DNRIOEs are thrown to the upper layers to handle.
> Let me look into having MetaScanner just use HTable scanning.  It already
> makes an HTable instance just to find where to start the scan; let me try
> using this instance for the actual scanning.
> TODO: Do this conversion everywhere a DNRIOE could come out.
> Here is the stack trace:
> {code}
> org.apache.hadoop.hbase.exceptions.DoNotRetryIOException: Reset scanner
>       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:209)
>       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
>       at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:170)
>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:212)
>       at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
>       at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:131)
>       at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:128)
>       at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:398)
>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:128)
>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:81)
>       at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:448)
>       at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:348)
>       at org.apache.hadoop.hbase.TestZooKeeper.testSanity(TestZooKeeper.java:242)
>       at org.apache.hadoop.hbase.TestZooKeeper.testRegionServerSessionExpired(TestZooKeeper.java:203)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>       at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>       at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>       at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>       at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>       at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>       at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>       at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>       at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>       at org.junit.runners.Suite.runChild(Suite.java:127)
>       at org.junit.runners.Suite.runChild(Suite.java:26)
>       at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
>       at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
>       at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:227)
>       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:175)
>       ... 43 more
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2310)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2874)
>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20577)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2103)
>       at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1810)
>       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
>       at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
>       at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:21012)
>       at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
>       ... 43 more
> {code}
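
The description above turns on one behavior: a DNRIOE raised mid-scan should cause the scanner to be reopened from the last row seen, not propagate to the caller. The following is a minimal, self-contained sketch of that catch-and-reset loop. It does not use HBase classes; DoNotRetrySketchException, FakeCluster, FakeScanner and scanAll are all hypothetical stand-ins for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ScanResetSketch {
  /** Stand-in for a do-not-retry exception; hypothetical, not the HBase class. */
  public static class DoNotRetrySketchException extends IOException {
    public DoNotRetrySketchException(String msg) { super(msg); }
  }

  /** Fake cluster holding rows 0..totalRows-1; a scan fails once when it reaches failAtRow. */
  public static class FakeCluster {
    final int totalRows;
    final int failAtRow;
    int failuresLeft = 1;
    public int scannersOpened = 0;

    public FakeCluster(int totalRows, int failAtRow) {
      this.totalRows = totalRows;
      this.failAtRow = failAtRow;
    }

    /** Opens a new scanner positioned at startRow. */
    public FakeScanner open(int startRow) {
      scannersOpened++;
      return new FakeScanner(this, startRow);
    }
  }

  public static class FakeScanner {
    private final FakeCluster cluster;
    private int position;

    FakeScanner(FakeCluster cluster, int startRow) {
      this.cluster = cluster;
      this.position = startRow;
    }

    /** Returns the next row, or null when the scan is exhausted. */
    public Integer next() throws IOException {
      if (position >= cluster.totalRows) return null;
      if (position == cluster.failAtRow && cluster.failuresLeft > 0) {
        cluster.failuresLeft--;
        throw new DoNotRetrySketchException("Reset scanner");
      }
      return position++;
    }
  }

  /** Scans every row, reopening the scanner after the last row seen on a do-not-retry error. */
  public static List<Integer> scanAll(FakeCluster cluster, int maxResets) throws IOException {
    List<Integer> rows = new ArrayList<>();
    int resumeFrom = 0;
    int resets = 0;
    FakeScanner scanner = cluster.open(resumeFrom);
    while (true) {
      try {
        Integer row = scanner.next();
        if (row == null) return rows;
        rows.add(row);
        resumeFrom = row + 1;
      } catch (DoNotRetrySketchException e) {
        // Give up after too many resets; otherwise reopen where we left off.
        if (++resets > maxResets) throw e;
        scanner = cluster.open(resumeFrom);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    FakeCluster cluster = new FakeCluster(5, 2);
    System.out.println(scanAll(cluster, 3));
    System.out.println("scanners opened: " + cluster.scannersOpened);
  }
}
```

Bounding the number of resets keeps a persistently failing server from turning the loop into a spin; the bound used here is an arbitrary choice for the sketch.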

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
