[
https://issues.apache.org/jira/browse/HBASE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated HBASE-28271:
---------------------------------
Fix Version/s: 2.6.0
(was: 2.4.18)
(was: 2.6.1)
> Infinite waiting on lock acquisition by snapshot can result in unresponsive
> master
> ----------------------------------------------------------------------------------
>
> Key: HBASE-28271
> URL: https://issues.apache.org/jira/browse/HBASE-28271
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0-alpha-4, 2.4.17, 2.5.7
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 2.6.0, 2.5.8, 3.0.0-beta-2
>
> Attachments: image.png
>
>
> When a region is stuck in transition for a significant time, any attempt to
> take a snapshot of the table keeps a master handler thread waiting
> indefinitely. As part of creating a snapshot on an enabled or disabled table,
> a LockProcedure is executed to acquire the table-level lock. If any region
> of the table is in transition, the snapshot handler cannot execute the
> LockProcedure, so it waits until the region transition completes and the
> table-level lock can be acquired.
> In cases where a region stays in RIT (region in transition) for a
> considerable time, enough client attempts to create snapshots on the table
> can easily exhaust all handler threads, potentially leaving the master
> unresponsive. A sample thread dump is attached.
> Proposal: The snapshot handler should not stay stuck forever if it cannot
> take the table-level lock; it should fail fast.
> !image.png!
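
To illustrate the fail-fast proposal above, here is a minimal sketch in plain
Java. It does not use HBase's LockProcedure or snapshot handler code. The class
FailFastLockSketch and its methods are hypothetical, and a
java.util.concurrent.Semaphore with one permit stands in for the table-level
lock. The only point it makes is the difference between waiting without bound
and giving up after a bounded wait; the actual fix in HBase may look quite
different.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: none of these names exist in HBase.
public class FailFastLockSketch {
    // Stand-in for the table-level lock: one permit means one holder at a time.
    private final Semaphore tableLock = new Semaphore(1);

    // Simulates the condition (a region in transition) that keeps the
    // table-level lock unavailable to the snapshot handler.
    public void beginRegionTransition() {
        tableLock.acquireUninterruptibly();
    }

    // Simulates the region transition completing.
    public void endRegionTransition() {
        tableLock.release();
    }

    // Fail-fast variant: waits at most timeoutMs for the lock, then gives up
    // so the calling handler thread is returned instead of blocking forever.
    public boolean trySnapshot(long timeoutMs) {
        boolean acquired;
        try {
            acquired = tableLock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        if (!acquired) {
            return false; // lock unavailable within the bound: fail fast
        }
        try {
            // The snapshot work would run here while the lock is held.
            return true;
        } finally {
            tableLock.release();
        }
    }

    public static void main(String[] args) {
        FailFastLockSketch sketch = new FailFastLockSketch();
        sketch.beginRegionTransition();
        System.out.println("during transition: " + sketch.trySnapshot(100));
        sketch.endRegionTransition();
        System.out.println("after transition: " + sketch.trySnapshot(100));
    }
}
```

With a bounded wait, a request that arrives while the lock is unavailable
returns a failure to the client and frees its handler thread, so repeated
snapshot requests cannot pile up and exhaust the handler pool.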
--
This message was sent by Atlassian Jira
(v8.20.10#820010)