[ 
https://issues.apache.org/jira/browse/HBASE-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Yuan Jiang updated HBASE-12377:
---------------------------------------
    Description: 
This is the same issue that HBASE-10809 tried to address.  The fix for 
HBASE-10809 refetches the latest meta location inside the retry loop.  However, 
there are 2 problems: (1) inside the retry loop, there is another try-catch 
block that throws the exception before the retry can kick in; (2) it looks 
like HBaseAdmin::getFirstMetaServerForTable() always tries to get the meta 
data from the meta cache, which means that if the meta cache is stale, 
retrying does not solve the problem, because the retry never fetches the 
correct data.
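
To make the two problems concrete, here is a minimal, self-contained sketch of 
a retry loop that avoids both. It is not the HBaseAdmin code and does not use 
the HBase API: LocationCache, RegionNotServingException, scanMeta() and 
scanWithRetries() are hypothetical names invented for illustration. The sketch 
assumes the fix needs (1) the inner catch to record the failure and continue 
instead of rethrowing, and (2) every retry to bypass the cached location.

```java
import java.util.function.Supplier;

final class MetaRetrySketch {

    /** Hypothetical stand-in for a "region not online on this server" failure. */
    static final class RegionNotServingException extends RuntimeException {
        RegionNotServingException(String message) {
            super(message);
        }
    }

    /** Hypothetical client-side cache of the meta region's location. */
    static final class LocationCache {
        private String cached;
        private final Supplier<String> authoritative;

        LocationCache(String initial, Supplier<String> authoritative) {
            this.cached = initial;
            this.authoritative = authoritative;
        }

        /** Returns the cached location, or refetches it when reload is true. */
        String locate(boolean reload) {
            if (reload || cached == null) {
                cached = authoritative.get();
            }
            return cached;
        }
    }

    /** Simulated RPC: succeeds only if sent to the server currently hosting meta. */
    static String scanMeta(String server, String hostingServer) {
        if (!server.equals(hostingServer)) {
            throw new RegionNotServingException("meta is not online on " + server);
        }
        return "scan served by " + server;
    }

    static String scanWithRetries(LocationCache cache, String hostingServer, int maxTries) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        RegionNotServingException last = null;
        for (int tries = 0; tries < maxTries; tries++) {
            // Problem (2): after a failure, bypass the cache instead of reusing it.
            String server = cache.locate(tries > 0);
            try {
                return scanMeta(server, hostingServer);
            } catch (RegionNotServingException e) {
                // Problem (1): remember the failure and let the loop retry,
                // rather than rethrowing from inside the loop.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // The cache still points at RS1, but meta has moved to RS2.
        LocationCache cache = new LocationCache("RS1", () -> "RS2");
        System.out.println(scanWithRetries(cache, "RS2", 3));
    }
}
```

With a single attempt the stale location is never refreshed and the call 
fails, which mirrors the behavior reported here; with at least two attempts 
the second one reloads the location and succeeds.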

Here is the call stack of the issue:

{noformat}
2014-10-27 10:11:58,495|beaver.machine|INFO|18218|140065036261120|MainThread|org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-172-31-0-48.ec2.internal,60020,1414403435009
2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3156)
2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.Thread.run(Thread.java:745)
2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|
2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)
2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:699)
2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:654)
2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.IntegrationTestManyRegions.tearDown(IntegrationTestManyRegions.java:99)
{noformat}

The META region was online on RS1 when the delete table operation started and 
was moved to RS2 while the operation was still in progress, which is when the 
problem appeared.




> HBaseAdmin#deleteTable fails when META region is moved around the same 
> timeframe
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-12377
>                 URL: https://issues.apache.org/jira/browse/HBASE-12377
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.98.4
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
