David Manning created HBASE-23372:
-------------------------------------

             Summary: ZooKeeper Assignment can result in stale znodes in 
region-in-transition after table is dropped and hbck run
                 Key: HBASE-23372
                 URL: https://issues.apache.org/jira/browse/HBASE-23372
             Project: HBase
          Issue Type: Bug
          Components: hbck, master, Region Assignment, Zookeeper
    Affects Versions: 1.3.2
            Reporter: David Manning


It is possible for znodes under /hbase/region-in-transition to remain long 
after a table is deleted. There does not appear to be any cleanup logic for 
these.

The details are a little fuzzy, but this appears to be fallout from 
HBASE-22617. Incidents related to that bug involved regions stuck in 
transition and the use of hbck to repair the affected clusters. A temporary 
table was created and deleted once per day, which somehow led to 
{{FSLimitException$MaxDirectoryItemsExceededException}} and regions stuck in 
transition. Even weeks after fixing the bug and upgrading the cluster, the 
znodes remain under /hbase/region-in-transition. In the most impacted cluster, 
{{hbase zkcli ls /hbase/region-in-transition | wc -w}} returns almost 100,000 
entries. This causes very slow region transition times (often 80 seconds), 
likely because all of these entries are enumerated whenever the zk watch on 
this node is triggered.

Log lines for slow region transitions:
{code:java}
2019-12-05 07:02:14,714 DEBUG [K.Worker-pool3-t7344] master.AssignmentManager - 
Handling RS_ZK_REGION_CLOSED, server=<<SERVERNAME>>, region=<<REGION_HASH>>, 
which is more than 15 seconds late, current_state={<<REGION_HASH>> 
state=PENDING_CLOSE, ts=1575529254635, server=<<SERVERNAME>>}
{code}
Even during HMaster failover the entries are not cleaned up, though the 
following log lines can be seen:
{code:java}
2019-11-27 00:26:27,044 WARN  [.activeMasterManager] master.AssignmentManager - 
Couldn't find the region in recovering region=<<DELETED_TABLE_REGION>>, 
state=RS_ZK_REGION_FAILED_OPEN, servername=<<SERVERNAME>>, 
createTime=1565603905404, payload.length=0
{code}
Possible solutions:
 # During master failover, parse the RIT znodes and check whether each 
entry's table still exists. Clean up entries for nonexistent tables.
 # A new hbck mode that cleans up entries for nonexistent regions under the 
znode.
 # Others?
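Option 1 could look something like the sketch below. It only illustrates the filtering decision; the real implementation would list the children of /hbase/region-in-transition via ZooKeeper, deserialize each transition payload to recover the table name, and consult the master's table list. Here those pieces are replaced with plain in-memory stand-ins ({{ritEntryToTable}} and {{existingTables}} are hypothetical inputs, not actual HBase APIs):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of solution 1: given a mapping from RIT znode name to
// the table that region belonged to, and the set of tables that currently
// exist, collect the znodes whose table is gone so they can be deleted
// during master failover. The inputs stand in for the real ZooKeeper
// listing and the master's table descriptor cache.
public class StaleRitCleanup {
  static List<String> findStaleEntries(Map<String, String> ritEntryToTable,
                                       Set<String> existingTables) {
    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, String> e : ritEntryToTable.entrySet()) {
      if (!existingTables.contains(e.getValue())) {
        // Candidate for deletion from /hbase/region-in-transition.
        stale.add(e.getKey());
      }
    }
    return stale;
  }
}
{code}
The same predicate would also serve option 2 (an hbck cleanup mode); only the caller and the point in time at which it runs would differ.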



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
