[
https://issues.apache.org/jira/browse/HBASE-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914058#comment-16914058
]
Viraj Jasani edited comment on HBASE-22460 at 8/23/19 11:59 AM:
----------------------------------------------------------------
In my opinion, there are a couple of approaches to reopening a region with a
very high refCount:
# A RegionServer background thread could check the refCount of every region
hosted on that server and, if a value looks abnormal (threshold configurable),
the RegionServer itself closes the region and opens it again immediately. This
way HMaster, AssignmentManager and the reopen-region procedure are not
involved, since the reopen is just a close followed by an open of the region
(CloseRegionHandler & OpenRegionHandler).
# An HMaster thread could check the refCount of every region through the
server metrics that each RegionServer reports to HMaster (regionServerReport:
either within the scope of this report or perhaps in a new one) and let
HMaster reopen any region with an abnormal refCount. In this case we can reuse
part of ReopenTableRegionsProcedure, and AssignmentManager handles all of the
state management. This might not be as quick as the RS doing it, but might be
preferred because of the state management? (RS → Metrics → HMaster →
ReopenRegion using procedure).
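To make the first approach concrete, here is a minimal sketch of one pass of such a RegionServer-side check. It is an illustration only: HostedRegion, reopenRegionsOverThreshold and the threshold value are hypothetical names made up for this example, not HBase classes or configuration keys, and close()/open() merely stand in for whatever CloseRegionHandler and OpenRegionHandler would actually do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustration only: none of these types are HBase classes.
class RefCountReopenSketch {

    /** Hypothetical stand-in for a region hosted on one RegionServer. */
    static final class HostedRegion {
        final String name;
        int storeRefCount;   // reader references currently held on the region's stores
        int reopenCount;     // how many times this sketch has reopened the region

        HostedRegion(String name, int storeRefCount) {
            this.name = name;
            this.storeRefCount = storeRefCount;
        }

        // Assumption: closing releases every reader reference, leaked or not.
        void close() { storeRefCount = 0; }

        // Reopen on the same server; no region movement is involved.
        void open() { reopenCount++; }
    }

    /**
     * One pass of the hypothetical background check: close and immediately
     * reopen every region whose refCount exceeds the configured threshold.
     * Returns the names of the regions that were reopened.
     */
    static List<String> reopenRegionsOverThreshold(Collection<HostedRegion> regions,
                                                   int threshold) {
        List<String> reopened = new ArrayList<>();
        for (HostedRegion region : regions) {
            if (region.storeRefCount > threshold) {
                System.err.println("WARN: region " + region.name + " has refCount "
                    + region.storeRefCount + " > " + threshold + ", reopening");
                region.close();
                region.open();
                reopened.add(region.name);
            }
        }
        return reopened;
    }

    public static void main(String[] args) {
        List<HostedRegion> regions = Arrays.asList(
            new HostedRegion("region-a", 12),
            new HostedRegion("region-b", 5000));
        System.out.println(reopenRegionsOverThreshold(regions, 1000));
    }
}
```

In a real RegionServer this pass would presumably run periodically from a background thread. The second approach would apply the same threshold test on HMaster against the reported metrics, and hand the reopen to a procedure instead of calling close and open directly.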
I believe the 1st approach might be better, since the RegionServer can take
care of the regions it hosts, it is fast, and no region movement is
involved, but the 2nd might have the advantage of state management?
Please share your opinions, and let me know if I am missing something.
[~apurtell] [~busbey] [~Apache9] [~anoop.hbase] [~openinx] [~stack]
[~psomogyi] [~reidchan] @Watchers
> Reopen a region if store reader references may have leaked
> ----------------------------------------------------------
>
> Key: HBASE-22460
> URL: https://issues.apache.org/jira/browse/HBASE-22460
> Project: HBase
> Issue Type: Sub-task
> Reporter: Andrew Purtell
> Assignee: Viraj Jasani
> Priority: Minor
>
> We can leak store reader references if a coprocessor or core function somehow
> opens a scanner, or wraps one, and then does not take care to call close on
> the scanner or the wrapped instance. A reasonable mitigation for a reader
> reference leak would be a fast reopen of the region on the same server
> (initiated by the RS). This will release all resources, like the refcount,
> leases, etc. The clients should gracefully ride over this like any other
> region transition. This reopen would be like what is done during schema
> change application and ideally would reuse the relevant code. If the refcount
> is over some ridiculous threshold this mitigation could be triggered along
> with a fat WARN in the logs.