[jira] [Comment Edited] (HBASE-22460) Reopen a region if store reader references may have leaked

Viraj Jasani (Jira) Fri, 23 Aug 2019 05:00:16 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914058#comment-16914058
 ]


Viraj Jasani edited comment on HBASE-22460 at 8/23/19 11:59 AM:
----------------------------------------------------------------

In my view, there are a couple of approaches to reopening a region with a very
high refCount:
 # A RegionServer background thread can check the refCount of all regions
hosted on that server and, if a value looks abnormal (threshold configurable),
the RegionServer itself closes the region and opens it again immediately. This
way HMaster, AssignmentManager and the reopen region procedure don't get
involved, since it is an immediate reopen of the region: a close followed by
an open. (CloseRegionHandler & OpenRegionHandler)
 # An HMaster thread can check the refCount of all regions through the server
metrics that each RegionServer reports to HMaster (regionServerReport: either
within the scope of this report or maybe in a new one) and let HMaster take
care of reopening any region with an abnormal refCount. In this case, we can
reuse some part of ReopenTableRegionsProcedure, and AssignmentManager gets
involved for the entire state management. This might not be as quick as the
RS doing it, but might be preferred due to state management?
(RS → Metrics → HMaster → ReopenRegion using procedure).
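
To make the first approach concrete, here is a rough sketch of a single pass of the RegionServer-side check. It is only an illustration under stated assumptions: the class name RefCountReopenCheck, the method runOnce, the threshold parameter and the two callbacks are all hypothetical and are not HBase APIs; the callbacks merely stand in for whatever CloseRegionHandler and OpenRegionHandler would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical RegionServer-side check; none of these names are HBase APIs. */
final class RefCountReopenCheck {
    private final int maxRefCount;              // assumed configurable threshold
    private final Consumer<String> closeRegion; // stand-in for the close handler
    private final Consumer<String> openRegion;  // stand-in for the open handler

    RefCountReopenCheck(int maxRefCount,
                        Consumer<String> closeRegion,
                        Consumer<String> openRegion) {
        this.maxRefCount = maxRefCount;
        this.closeRegion = closeRegion;
        this.openRegion = openRegion;
    }

    /**
     * One pass over the regions hosted on this server.
     * Returns the names of the regions that were closed and reopened.
     */
    List<String> runOnce(Map<String, Integer> refCountByRegion) {
        List<String> reopened = new ArrayList<>();
        for (Map.Entry<String, Integer> e : refCountByRegion.entrySet()) {
            if (e.getValue() > maxRefCount) {
                // Log loudly before acting, then close followed by open
                // on the same server.
                System.err.println("WARN: region " + e.getKey()
                    + " has refCount " + e.getValue()
                    + " > " + maxRefCount + ", reopening");
                closeRegion.accept(e.getKey());
                openRegion.accept(e.getKey());
                reopened.add(e.getKey());
            }
        }
        return reopened;
    }
}
```

A background thread could call runOnce periodically, for example through ScheduledExecutorService.scheduleAtFixedRate; how the per-region refCount map is gathered is left open here.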

I believe the 1st approach might be better, since the RegionServer can take
care of the regions it hosts itself, it is fast, and no region movement is
involved, but the 2nd might have the advantage of state management?
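
For comparison, the selection step of the second approach could be sketched as below. Again this is only an assumption-laden illustration: MasterRefCountScan and regionsToReopen are hypothetical names, the nested map is an invented stand-in for whatever the RegionServers actually report, and submitting the result to a reopen procedure is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical master-side scan; none of these names are HBase APIs. */
final class MasterRefCountScan {
    private MasterRefCountScan() {}

    /**
     * Walks the per-server reports (server name -> region name -> refCount)
     * and collects every region whose refCount exceeds the threshold.
     * The caller would hand these regions to a reopen procedure.
     */
    static List<String> regionsToReopen(
            Map<String, Map<String, Integer>> reportByServer, int maxRefCount) {
        List<String> result = new ArrayList<>();
        for (Map<String, Integer> regions : reportByServer.values()) {
            for (Map.Entry<String, Integer> e : regions.entrySet()) {
                if (e.getValue() > maxRefCount) {
                    result.add(e.getKey());
                }
            }
        }
        return result;
    }
}
```

The difference from the RegionServer-side sketch is where the decision is made: here the servers only report numbers, and the master decides and drives the state change.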

Requesting your opinions; please let me know if I am missing something.

[~apurtell] [~busbey] [~Apache9] [~anoop.hbase] [~openinx]  [~stack] 
[~psomogyi] [~reidchan] @Watchers

 



> Reopen a region if store reader references may have leaked
> ----------------------------------------------------------
>
>                 Key: HBASE-22460
>                 URL: https://issues.apache.org/jira/browse/HBASE-22460
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Viraj Jasani
>            Priority: Minor
>
> We can leak store reader references if a coprocessor or core function somehow 
> opens a scanner, or wraps one, and then does not take care to call close on 
> the scanner or the wrapped instance. A reasonable mitigation for a reader 
> reference leak would be a fast reopen of the region on the same server 
> (initiated by the RS). This will release all resources, like the refcount,
> leases, etc. The clients should gracefully ride over this like any other 
> region transition. This reopen would be like what is done during schema 
> change application and ideally would reuse the relevant code. If the refcount 
> is over some ridiculous threshold this mitigation could be triggered along 
> with a fat WARN in the logs. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
