[ https://issues.apache.org/jira/browse/HBASE-27951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell updated HBASE-27951:
----------------------------------------
    Summary: Use ADMIN_QOS in MasterRpcServices for regionServerStartup and regionServerReport (was: Use ADMIN_QOS in MasterRpcServices for regionServerReport)

> Use ADMIN_QOS in MasterRpcServices for regionServerStartup and regionServerReport
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-27951
>                 URL: https://issues.apache.org/jira/browse/HBASE-27951
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.10
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
>
>
> Analysis of a recent production incident is not yet complete, but one item of
> note is an apparent deadlock. Imagine you are gracefully draining a
> regionserver by way of a flurry of moveRegion requests. The handler for
> moveRegion submits a TRSP (TransitRegionStateProcedure) and then waits on its
> future without a timeout. Imagine that there are enough moveRegion requests to
> tie up the normal priority master RPC pool. Now imagine that all of those
> requests are waiting on TRSPs pending on a regionserver that is concurrently
> bounced, or that fails. The TRSPs are blocked in REGION_STATE_TRANSITION_CLOSE
> because the target regionserver terminated before responding to the close
> requests; this blocks the moveRegion requests, which in turn block the RPC
> handlers. The regionserver restarts and tries to check in, but cannot report
> to the master because regionServerReport is a normal priority admin RPC and
> there are no free normal priority handlers to handle it. It does not seem
> correct to have regionServerReport, which is important, contending with
> normal priority requests.
> regionServerReport should be made ADMIN_QOS to avoid this case.
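The handler-starvation scenario described above can be reproduced in miniature with plain java.util.concurrent executors. This is a hypothetical sketch, not HBase's actual RpcExecutor or priority-routing code: two fixed thread pools stand in for the normal priority and ADMIN_QOS handler pools, a latch that is never released while the regionserver is down stands in for the outstanding close requests, and reportHandled (a made-up name) checks whether a "report" submitted to a given pool runs within a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the deadlock; pool sizes and names are illustrative only.
public class PriorityPoolDemo {

    // Returns true if a "regionServerReport" task runs within the timeout.
    static boolean reportHandled(boolean useAdminPool) throws InterruptedException {
        ExecutorService normalPool = Executors.newFixedThreadPool(2); // normal priority handlers
        ExecutorService adminPool = Executors.newFixedThreadPool(1);  // ADMIN_QOS handlers
        // Stands in for close responses that never arrive from the bounced regionserver.
        CountDownLatch closeAcks = new CountDownLatch(1);
        try {
            // Every normal handler is taken by a "moveRegion" call waiting without a timeout.
            for (int i = 0; i < 2; i++) {
                normalPool.submit(() -> {
                    closeAcks.await();
                    return null;
                });
            }
            // The restarted regionserver's report is routed to one of the two pools.
            CountDownLatch reported = new CountDownLatch(1);
            ExecutorService target = useAdminPool ? adminPool : normalPool;
            target.submit(reported::countDown);
            return reported.await(500, TimeUnit.MILLISECONDS);
        } finally {
            closeAcks.countDown(); // release the blocked handlers so the pools can shut down
            normalPool.shutdownNow();
            adminPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("normal pool: " + reportHandled(false));
        System.out.println("admin pool: " + reportHandled(true));
    }
}
```

Running main should print false for the normal pool (the report sits in the queue behind the blocked handlers) and true for the admin pool. The pool sizes and the 500 ms timeout are arbitrary assumptions chosen to keep the demo small; the point is only that a separate higher-priority pool keeps the report from contending with saturated normal priority handlers.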



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
