[
https://issues.apache.org/jira/browse/HBASE-10595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Feng Honghua updated HBASE-10595:
---------------------------------
Attachment: HBASE-10595-trunk_v2.patch
New patch 'fixing' the previously failed TestMasterObserver case.
The cause of the TestMasterObserver failure is similar to that of
TestAssignmentManagerOnCluster#testMoveRegionOfDeletedTable:
HBaseAdmin.deleteTable is 'synchronous' to the client in that, after asking the
master to delete a table, it returns only once the table descriptor can no
longer be retrieved from the master. But DeleteTableHandler is processed
asynchronously in the master, and steps such as 'clearing the table descriptor
cache', 'removing regions from RegionStates' and 'calling all coprocessors'
postDeleteTableHandler' are all done *after* removing the table dir. (After
this patch, it is 'removing the table dir', not clearing the table descriptor
cache, that makes the client unable to get the table descriptor and believe the
table is deleted.)
Before this patch, the client could still get a valid table descriptor after
the master removed the table dir (first rename, then remove all region data
dirs, and finally remove the table dir), until the table descriptor was removed
from the table descriptor cache. After this patch, the client can't get the
table descriptor once the master renames the table dir. The window from "client
can't get table descriptor" to "regions are removed from RegionStates" /
"coprocessors' postDeleteTableHandler are called" is therefore longer, so code
that assumes either of these and runs immediately after
HBaseAdmin.deleteTable() is much more likely to fail.
In short, we can't assume "regions are removed from RegionStates" or
"coprocessors' postDeleteTableHandler are called" after
HBaseAdmin.deleteTable() returns, though HBaseAdmin.deleteTable() is seemingly
synchronous.
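
For tests that depend on such post-delete state, one option is to poll for it
rather than assert it right after deleteTable() returns. The following is only
a minimal sketch of that idea, not code from the patch: the class
DeleteTableWaitSketch, the helpers waitFor and simulateDelete, and the two
flags are hypothetical stand-ins, and the "master" is simulated with a plain
thread instead of real HBase calls.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class DeleteTableWaitSketch {

  /** Polls the condition until it holds or the timeout expires. */
  public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /**
   * Simulated master-side handler (an assumption for illustration): the table
   * dir disappears first, the post-delete hook runs hookDelayMs later.
   * Returns true if the hook was observed within timeoutMs.
   */
  public static boolean simulateDelete(long hookDelayMs, long timeoutMs) {
    AtomicBoolean tableDirExists = new AtomicBoolean(true);
    AtomicBoolean postHookCalled = new AtomicBoolean(false);

    Thread handler = new Thread(() -> {
      tableDirExists.set(false); // the point at which the client sees "deleted"
      try {
        Thread.sleep(hookDelayMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      postHookCalled.set(true); // happens some time after the client returns
    });
    handler.start();

    // Stand-in for deleteTable(): returns once the descriptor is unreachable.
    waitFor(timeoutMs, 5, () -> !tableDirExists.get());

    // postHookCalled may still be false here, so wait for it instead of
    // asserting it immediately.
    boolean observed = waitFor(timeoutMs, 5, postHookCalled::get);
    try {
      handler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return observed;
  }

  public static void main(String[] args) {
    System.out.println("post-delete hook observed: " + simulateDelete(50, 2000));
  }
}
```

In a real test the polled condition would be whatever the test actually relies
on, e.g. a flag set by the test's own observer or a check that the regions are
gone from RegionStates.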
> HBaseAdmin.getTableDescriptor can wrongly get the previous table's
> TableDescriptor even after the table dir in hdfs is removed
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-10595
> URL: https://issues.apache.org/jira/browse/HBASE-10595
> Project: HBase
> Issue Type: Bug
> Components: master, util
> Reporter: Feng Honghua
> Assignee: Feng Honghua
> Attachments: HBASE-10595-trunk_v1.patch, HBASE-10595-trunk_v2.patch
>
>
> When a table dir (in hdfs) is removed externally, HMaster will still return
> the cached TableDescriptor to the client for a getTableDescriptor request.
> In contrast, HBaseAdmin.listTables() is handled correctly in the current
> implementation. For a table whose table dir in hdfs is removed externally,
> getTableDescriptor can still retrieve a valid (old) table descriptor,
> while listTables says the table doesn't exist. This is inconsistent.
> The reason for this bug is that HMaster (via FSTableDescriptors) doesn't
> check whether the table dir exists for a getTableDescriptor() request (while
> for a listTables() request it lists all existing table dirs, rather than
> consulting the cache first, and returns accordingly).
> When a table is deleted via deleteTable, the cache is cleared after the
> table dir and tableInfo file are removed, so the listTables/getTableDescriptor
> inconsistency should be transient (it still exists while the table dir is
> removed but the cache is not yet cleared) and harder to expose.
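
The check described in the issue above can be illustrated roughly as follows.
This is a hedged sketch only: DirCheckedDescriptorCache and its methods are
hypothetical names, descriptors are plain strings, and the local filesystem
(java.nio.file) stands in for hdfs. It does not reproduce FSTableDescriptors
or the actual patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DirCheckedDescriptorCache {
  private final Path rootDir;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  public DirCheckedDescriptorCache(Path rootDir) {
    this.rootDir = rootDir;
  }

  public void put(String table, String descriptor) {
    cache.put(table, descriptor);
  }

  /**
   * Returns the cached descriptor only while the table dir still exists.
   * If the dir is gone, the stale entry is evicted and null is returned.
   */
  public String get(String table) {
    if (!Files.isDirectory(rootDir.resolve(table))) {
      cache.remove(table);
      return null;
    }
    return cache.get(table);
  }

  /** Small self-check: descriptor is visible before the dir is removed, not after. */
  public static boolean demo() {
    try {
      Path root = Files.createTempDirectory("descriptor-cache-sketch");
      Path tableDir = Files.createDirectory(root.resolve("t1"));
      DirCheckedDescriptorCache descriptors = new DirCheckedDescriptorCache(root);
      descriptors.put("t1", "descriptor-of-t1");

      boolean visibleBefore = "descriptor-of-t1".equals(descriptors.get("t1"));
      Files.delete(tableDir); // simulate external removal of the table dir
      boolean goneAfter = descriptors.get("t1") == null;

      Files.delete(root);
      return visibleBefore && goneAfter;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("cache respects table dir: " + demo());
  }
}
```

The existence check adds one filesystem call per lookup; that is the trade-off
for keeping lookups consistent with a listing that reads the table dirs
directly.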