[ 
https://issues.apache.org/jira/browse/HBASE-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219261#comment-17219261
 ] 

Andrew Kyle Purtell commented on HBASE-25212:
---------------------------------------------

After upgrading to Java 11, I realized I was just missing a change to HTU. 
Never mind.

> Optionally abort requests in progress after deciding a region should close
> --------------------------------------------------------------------------
>
>                 Key: HBASE-25212
>                 URL: https://issues.apache.org/jira/browse/HBASE-25212
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0
>
>
> After deciding a region should be closed, the regionserver will set the 
> internal region state to closing and wait for all pending requests to 
> complete, via a rendezvous on the region lock. In closing state the region 
> will not accept any new requests but requests in progress will be allowed to 
> complete before the close action takes place. In our production environment 
> we see outlier wait times on this lock in excess of several minutes. 
> During close when there are requests in flight the regionserver is subject to 
> any conceivable reason for delay, like full scans over large regions, 
> expensive filtering hierarchies, bugs, or store level performance problems 
> like slow HDFS. On an opt-in basis, the regionserver should interrupt 
> requests in progress to shorten close times.
> Optionally, via configuration parameter -- which would be a system wide 
> default set in hbase-site.xml in common practice but could be overridden in 
> table schema for per table settings -- interrupt requests in progress holding 
> the region lock rather than wait for completion of all operations in flight. 
> Send back NotServingRegionException("region is closing") to the clients of 
> the interrupted operations, like we do after the write lock is acquired. The 
> client will transparently relocate the region data and resubmit the aborted 
> requests per normal retry policy. This can be less disruptive than waiting 
> for very long times for a region to close in extreme outlier cases (e.g. 50 
> minutes). In such extreme cases it is better to abort the regionserver if the 
> close lock cannot be acquired in a reasonable amount of time, because the 
> region cannot be made available again until it has closed.
> After waiting for all requests to complete, we flush the region's 
> memstore and finish the close. The flush portion of the close process is out 
> of scope of this proposal. Under normal conditions the flush portion of the 
> close completes quickly. It is specifically the waits on the close lock that 
> have been an occasional issue in our production, making it difficult to 
> achieve 99.99% availability.
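
The mechanism proposed in the description can be illustrated with a small, self-contained Java sketch. This is not the HBase implementation: the class ClosableRegionSketch, its methods, the parameters abortInFlight, maxWaitMs, and intervalMs, and the exception RegionClosingException (a stand-in for NotServingRegionException) are all hypothetical names chosen for illustration. It assumes requests hold a read lock while running, and that close marks the region as closing, then repeatedly tries to take the write lock with a bounded wait, interrupting in-flight request threads between attempts.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a region whose close can interrupt in-flight requests. Not HBase code. */
class ClosableRegionSketch {
  /** Hypothetical stand-in for HBase's NotServingRegionException. */
  static class RegionClosingException extends Exception {
    RegionClosingException(String msg) { super(msg); }
  }

  /** A request body that may block and therefore may be interrupted. */
  interface Operation { void run() throws InterruptedException; }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Set<Thread> inFlight = new HashSet<>(); // guarded by synchronized (inFlight)
  private volatile boolean closing = false;

  /** Runs one request under the read lock; rejects it once the region is closing. */
  void execute(Operation op) throws RegionClosingException {
    // tryLock fails only when the closer already holds the write lock.
    if (closing || !lock.readLock().tryLock()) {
      throw new RegionClosingException("region is closing");
    }
    synchronized (inFlight) { inFlight.add(Thread.currentThread()); }
    try {
      op.run();
    } catch (InterruptedException e) {
      // The closer interrupted us: report it the same way as a rejected request.
      throw new RegionClosingException("region is closing");
    } finally {
      synchronized (inFlight) {
        inFlight.remove(Thread.currentThread());
        Thread.interrupted(); // drop a late interrupt so it cannot leak into later work
      }
      lock.readLock().unlock();
    }
  }

  /**
   * Marks the region closing and acquires the write lock, which stays held afterwards.
   * With abortInFlight=false this waits indefinitely for requests to drain.
   * With abortInFlight=true it interrupts in-flight requests every intervalMs and
   * returns false if the lock is still unavailable after roughly maxWaitMs, at which
   * point a caller could choose to abort the server.
   */
  boolean close(boolean abortInFlight, long maxWaitMs, long intervalMs)
      throws InterruptedException {
    closing = true;
    if (!abortInFlight) {
      lock.writeLock().lock();
      return true;
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
    while (true) {
      if (lock.writeLock().tryLock(intervalMs, TimeUnit.MILLISECONDS)) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      synchronized (inFlight) {
        for (Thread t : inFlight) {
          t.interrupt();
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ClosableRegionSketch region = new ClosableRegionSketch();
    CountDownLatch started = new CountDownLatch(1);
    Thread slowScan = new Thread(() -> {
      try {
        // Simulates a long-running request that would otherwise delay the close.
        region.execute(() -> { started.countDown(); Thread.sleep(60_000); });
      } catch (RegionClosingException e) {
        System.out.println("request aborted: " + e.getMessage());
      }
    });
    slowScan.start();
    started.await();
    System.out.println("closed: " + region.close(true, 5_000, 50));
    slowScan.join();
  }
}
```

In this sketch an interrupted request surfaces the same exception as a request rejected after the region entered the closing state, so a retrying client would treat both cases alike. Only operations that respond to thread interruption can be aborted this way; a request stuck in non-interruptible work would still hold the lock until the maximum wait expires.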



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
