[jira] [Updated] (HBASE-12377) HBaseAdmin#deleteTable fails when META region is moved around the same timeframe

Stephen Yuan Jiang (JIRA) Thu, 30 Oct 2014 22:36:57 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-12377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stephen Yuan Jiang updated HBASE-12377:
---------------------------------------
    Attachment: HBASE-12377.v3-2.0.patch

The V3 patch is attached. This is based on Enis's feedback to make the aux 
function getTableDescriptorByTableName private, as HBaseAdmin.deleteTable is 
the only caller at this time & we have no plan to have other callers.

Enis' other comment about "this new method which returns null instead will be 
confusing to users.".  This is not the concern, because the existing function 
getTableDescriptorsByTableName we use already return null - and caller is 
internal, therefore, I did not do any update for this comment.


The description of the patch: 
"
The HBaseAdmin.deleteTable implemented its own logic to obtain all regions of a 
table, this is not robust (a few issues found from the logic, NotServingRegion 
exception was not handled for retry; stale meta cache was used, etc.)
The change in this patch is to use proven MetaScanner.listTableRegionLocations 
method to find all non-archived regions of a table. And the patch also 
simplifies the code in the function to make it more maintainable (eg. using 
HConnection.getHTableDescriptorsByTableName instead of duping same logic in the 
function).
"

> HBaseAdmin#deleteTable fails when META region is moved around the same 
> timeframe
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-12377
>                 URL: https://issues.apache.org/jira/browse/HBASE-12377
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.98.4
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>             Fix For: 2.0.0, 0.98.8, 0.99.2
>
>         Attachments: HBASE-12377.v1-2.0.patch, HBASE-12377.v2-2.0.patch, 
> HBASE-12377.v3-2.0.patch
>
>
> This is the same issue that HBASE-10809 tried to address.  The fix of 
> HBASE-10809 refetch the latest meta location in retry-loop.  However, there 
> are 2 problems: (1).  inside the retry loop, there is another try-catch block 
> that would throw the exception before retry can kick in; (2). It looks like 
> that HBaseAdmin::getFirstMetaServerForTable() always tries to get meta data 
> from meta cache, which means if the meta cache is stale and out of date, 
> retries would not solve the problem by fetching from the stale meta cache.
> Here is the call stack of the issue:
> {noformat}
> 2014-10-27 
> 10:11:58,495|beaver.machine|INFO|18218|140065036261120|MainThread|org.apache.hadoop.hbase.NotServingRegionException:
>  org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is 
> not online on ip-172-31-0-48.ec2.internal,60020,1414403435009
> 2014-10-27 
> 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
> 2014-10-27 
> 10:11:58,496|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
> 2014-10-27 
> 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3156)
> 2014-10-27 
> 10:11:58,497|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
> 2014-10-27 
> 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
> 2014-10-27 
> 10:11:58,498|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 2014-10-27 
> 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 2014-10-27 
> 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 2014-10-27 
> 10:11:58,499|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> java.lang.Thread.run(Thread.java:745)
> 2014-10-27 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|
> 2014-10-27 
> 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> sun.reflect.GeneratedConstructorAccessor12.newInstance(Unknown Source)
> 2014-10-27 
> 10:11:58,500|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 2014-10-27 
> 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 2014-10-27 
> 10:11:58,501|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 2014-10-27 
> 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> 2014-10-27 
> 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)
> 2014-10-27 
> 10:11:58,502|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:699)
> 2014-10-27 
> 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:654)
> 2014-10-27 
> 10:11:58,503|beaver.machine|INFO|18218|140065036261120|MainThread|at 
> org.apache.hadoop.hbase.IntegrationTestManyRegions.tearDown(IntegrationTestManyRegions.java:99)
> {noformat}
> The META region was Online in RS1 when the delete table starts, it was moved 
> to RS2 during the delete table operation.  And the problem appears.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HBASE-12377) HBaseAdmin#deleteTable fails when META region is moved around the same timeframe

Reply via email to