[jira] [Commented] (HBASE-12377) HBaseAdmin#deleteTable fails when META region is moved around the same timeframe

Enis Soztutar (JIRA) Wed, 29 Oct 2014 14:24:07 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189017#comment-14189017
 ]


Enis Soztutar commented on HBASE-12377:
---------------------------------------

I think HBASE-12072 is related, since it unifies some code paths so that we use 
the "regular" RPC retrying mechanism instead of the custom-built ones inside 
HBaseAdmin. 

For this issue, the problem is that HBaseAdmin.deleteTable() does not use the 
regular scan RPC code (which handles retrying, the meta cache, etc. correctly) 
but instead reimplements that logic in a broken way. 

Another issue is that all of this logic lives on the client side when it should 
be on the master side, but that is a different and much more involved 
issue. 

Can we do the patch so that it uses MetaReader or MetaScanner to obtain the 
list of regions for the table in the retry loop? 
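
To make the suggestion concrete, here is a minimal sketch of the retry-loop shape I have in mind. It is not HBase code: RegionLister, listRegionsWithRetry and the pause handling are hypothetical names standing in for a fresh meta scan (MetaReader or MetaScanner in the real patch). The two points it illustrates are that the lookup is redone on every attempt instead of reusing a cached location, and that the exception is caught at the loop level so the retry can actually kick in.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingRegionLookup {

    /** Hypothetical stand-in for a fresh meta scan; not an HBase interface. */
    @FunctionalInterface
    public interface RegionLister {
        List<String> listRegions(String tableName) throws IOException;
    }

    /**
     * Calls the lister again on every attempt, so a stale location from a
     * previous attempt is never reused. Failures are caught at the loop level
     * and only rethrown once all attempts are exhausted.
     */
    public static List<String> listRegionsWithRetry(RegionLister lister, String tableName,
                                                    int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lister.listRegions(tableName);
            } catch (IOException e) {
                // Remember the failure and retry instead of rethrowing here.
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis * attempt); // simple linear backoff (assumption)
                }
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulates meta being unavailable for the first two attempts.
        RegionLister flaky = table -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("Region hbase:meta,,1 is not online");
            }
            return Collections.singletonList(table + ",,1");
        };
        List<String> regions = listRegionsWithRetry(flaky, "t1", 5, 10L);
        System.out.println(regions + " after " + calls.get() + " attempts");
    }
}
```

In the real client the pause and attempt count would presumably come from the existing client retry configuration rather than being passed in like this.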

> HBaseAdmin#deleteTable fails when META region is moved around the same 
> timeframe
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-12377
>                 URL: https://issues.apache.org/jira/browse/HBASE-12377
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.98.4
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>
> This is the same issue that HBASE-10809 tried to address.  The fix for 
> HBASE-10809 refetches the latest meta location in the retry loop.  However, 
> there are 2 problems: (1) inside the retry loop, there is another try-catch 
> block that throws the exception before the retry can kick in; (2) it looks 
> like HBaseAdmin::getFirstMetaServerForTable() always tries to get meta data 
> from the meta cache, which means that if the meta cache is stale, retries 
> never fetch the right data and so cannot solve the problem.
> Here is the call stack of the issue:
> {noformat}
> 2014-10-27 10:11:58,495|beaver.machine|INFO|18218|140065036261120|MainThread|org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on ip-172-31-0-48.ec2.internal,60020,1414403435009
> 2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
> 2014-10-27 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
> 2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3156)
> 2014-10-27 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
> 2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
> 2014-10-27 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 2014-10-27 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.Thread.run(Thread.java:745)
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 2014-10-27 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)
> 2014-10-27 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:699)
> 2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:654)
> 2014-10-27 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at org.apache.hadoop.hbase.IntegrationTestManyRegions.tearDown(IntegrationTestManyRegions.java:99)
> {noformat}
> The META region was online on RS1 when the delete table operation started 
> and was moved to RS2 during the operation, at which point the problem appeared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
