[jira] [Updated] (HBASE-16345) RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions

huaxiang sun (JIRA) Wed, 24 Aug 2016 12:58:46 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-16345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


huaxiang sun updated HBASE-16345:
---------------------------------
    Description: 
Update for the description. Debugged more at this front based on the comments 
from Enis. 

The cause is that for the primary replica, if its retry is exhausted too fast, 
f.get() [1] returns ExecutionException. This Exception needs to be ignored and 
continue with the replicas.
The other issue is that after adding calls for the replicas, if the first 
completed task gets ExecutionException (due to the retry exhausted), it throws 
the exception to the client[2].
In this case, it needs to loop through these tasks, waiting for the success 
one. If no one succeeds, throw exception.

Similar for the scan as well

[1] 
https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L197

[2]https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L219


  was:
We run into the following exception during read replica testing.

{code}
Caused by: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: 
org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server  not 
running
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:924)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1766)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31439)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:326)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas$ReplicaRegionServerCallable.call(RpcRetryingCallerWithReadReplicas.java:168)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas$ReplicaRegionServerCallable.call(RpcRetryingCallerWithReadReplicas.java:100)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.regionserver.RegionServerStoppedException):
 org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server not 
running
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:924)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1766)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31439)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:31865)
at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas$ReplicaRegionServerCallable.call(RpcRetryingCallerWithReadReplicas.java:162)
... 6 more
{code}

Checking the code, we need to catch a few exceptions from the primary region 
server and continue with replicas.

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L211


> RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer 
> Exceptions
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-16345
>                 URL: https://issues.apache.org/jira/browse/HBASE-16345
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>
> Update for the description. Debugged more at this front based on the comments 
> from Enis. 
> The cause is that for the primary replica, if its retry is exhausted too 
> fast, f.get() [1] returns ExecutionException. This Exception needs to be 
> ignored and continue with the replicas.
> The other issue is that after adding calls for the replicas, if the first 
> completed task gets ExecutionException (due to the retry exhausted), it 
> throws the exception to the client[2].
> In this case, it needs to loop through these tasks, waiting for the success 
> one. If no one succeeds, throw exception.
> Similar for the scan as well
> [1] 
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L197
> [2]https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L219



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-16345) RpcRetryingCallerWithReadReplicas#call() should catch some RegionServer Exceptions

Reply via email to