[ https://issues.apache.org/jira/browse/HBASE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382993#comment-14382993 ]
Mikhail Antonov commented on HBASE-9738:
----------------------------------------
Interesting. Looking at HMaster#balance(), we don't check whether the table is
in the {disabled, disabling} state at all when computing plans. In
AssignmentManager#balance we first check the table state (without grabbing any
lock) and don't execute plans for known disabled tables; after that we only
grab the region lock (locker.acquireLock(encodedName);), and it doesn't seem
that we take a TableLock there.
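
For illustration, a minimal sketch of the kind of table-state check that could be applied to computed plans. This is not the HBase code: BalancePlanFilter, Plan, TableState and dropPlansForDisabledTables are hypothetical names made up for this example, and the table states are passed in as a plain map. Note that a check like this, done without holding any lock, is still racy; it only narrows the window.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real HMaster/AssignmentManager logic.
class BalancePlanFilter {
    // Assumed set of table states for this example.
    enum TableState { ENABLED, ENABLING, DISABLED, DISABLING }

    // Hypothetical stand-in for a region move plan.
    static final class Plan {
        final String table;
        final String encodedRegionName;

        Plan(String table, String encodedRegionName) {
            this.table = table;
            this.encodedRegionName = encodedRegionName;
        }
    }

    private static final Set<TableState> SKIPPED =
        EnumSet.of(TableState.DISABLED, TableState.DISABLING);

    // Keep only plans whose table is known and not disabled/disabling.
    static List<Plan> dropPlansForDisabledTables(List<Plan> plans,
                                                 Map<String, TableState> states) {
        List<Plan> kept = new ArrayList<>();
        for (Plan plan : plans) {
            TableState state = states.get(plan.table);
            // A missing entry is treated like a deleted table: skip the plan.
            if (state == null || SKIPPED.contains(state)) {
                continue;
            }
            kept.add(plan);
        }
        return kept;
    }
}
```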
So I guess the desired behavior is that we don't allow users to disable a table
while some of its regions are being balanced?
Disabling a table requires the writeLock, so taking the readLock in the balancer
would be right (unless that is too coarse-grained?).
Alternatively, we could probably add a new table state ("BALANCING"), but that
seems like overkill.
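
To make the read-lock idea concrete, here is a rough single-process sketch. It uses java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in for the ZooKeeper-backed table lock, so it only illustrates the locking discipline, not the actual TableLock API. TableLockSketch, tryMoveRegion and tryDisable are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one instance represents the lock and state of one table.
class TableLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean disabled = false;

    // Balancer path: take the shared (read) lock, then re-check the table
    // state under the lock before executing the move.
    boolean tryMoveRegion(Runnable move) {
        lock.readLock().lock();
        try {
            if (disabled) {
                return false; // table was disabled in the meantime, skip the plan
            }
            move.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Disable path: needs the exclusive (write) lock. tryLock() fails while
    // any read lock is held, so a disable is refused during a region move.
    boolean tryDisable() {
        if (!lock.writeLock().tryLock()) {
            return false;
        }
        try {
            disabled = true;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Several balancer moves on the same table can hold the read lock at once, so they don't serialize against each other; only disable (and, presumably, delete) is excluded while a move is in flight.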
> Delete table and loadbalancer interference
> ------------------------------------------
>
> Key: HBASE-9738
> URL: https://issues.apache.org/jira/browse/HBASE-9738
> Project: HBase
> Issue Type: Bug
> Reporter: Devaraj Das
> Priority: Critical
> Fix For: 2.0.0, 1.1.0
>
>
> I have noticed that when the balancer is computing a plan for region moves,
> and a delete table is issued, there is some interference.
> 1. At time t1, a user deleted the table.
> 2. This led to the master updating the meta table to remove the row holding
> the regioninfo of region f2a9e2e9d70894c03f54ee5902bebee6.
> {noformat}
> 2013-10-04 08:42:52,495 INFO [MASTER_TABLE_OPERATIONS-hor15n05:60000-0]
> catalog.MetaEditor: Deleted [{ENCODED => f2a9e2e9d70894c03f54ee5902bebee6,
> NAME => 'usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6.',
> STARTKEY => '', ENDKEY => ''}]
> {noformat}
> 3. However, around the same time, the balancer kicked in, reassigned the
> region, and brought it online elsewhere. Neither the balancer nor anything
> else checked that the table had in fact been deleted.
> {noformat}
> 2013-10-04 08:42:53,215 INFO
> [hor15n05.gq1.ygridcore.net,60000,1380869262259-BalancerChore]
> master.HMaster: balance
> hri=usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6.,
> src=hor15n09.gq1.ygridcore.net,60020,1380869263722,
> dest=hor15n11.gq1.ygridcore.net,60020,1380869263682
> {noformat}
> .....
> {noformat}
> 2013-10-04 08:42:53,592 INFO [AM.ZK.Worker-pool2-t829] master.RegionStates:
> Onlined f2a9e2e9d70894c03f54ee5902bebee6 on
> hor15n11.gq1.ygridcore.net,60020,1380869263682
> {noformat}
> 4. From then on, every attempt to drop the table produced warnings like:
> {noformat}
> 2013-10-04 08:45:17,587 INFO [RpcServer.handler=8,port=60000]
> master.HMaster: Client=hrt_qa//68.142.246.151 delete usertable
> 2013-10-04 08:45:17,631 DEBUG [RpcServer.handler=8,port=60000]
> lock.ZKInterProcessLockBase: Acquired a lock for
> /hbase/table-lock/usertable/write-master:600000000000000
> 2013-10-04 08:45:17,637 WARN [RpcServer.handler=8,port=60000]
> catalog.MetaReader: No serialized HRegionInfo in
> keyvalues={usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:seqnumDuringOpen/1380876173509/Put/vlen=8/mvcc=0,
>
> usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:server/1380876173509/Put/vlen=32/mvcc=0,
>
> usertable,,1380876170581.f2a9e2e9d70894c03f54ee5902bebee6./info:serverstartcode/1380876173509/Put/vlen=8/mvcc=0}
> {noformat}
> 5. Creating the same table also fails, since the master still holds state
> (reincarnated, maybe) for the table.