[ 
https://issues.apache.org/jira/browse/HBASE-22460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914843#comment-16914843
 ] 

Viraj Jasani edited comment on HBASE-22460 at 8/24/19 10:52 AM:
----------------------------------------------------------------

{quote}Doing #1 violates Master being in charge of assign. Doing in Master is 
also the more frugal choice. #1 requires every RS running a monitoring thread. 
Instead we can run one task in Master for whole cluster to look at refcounts 
and it then does the reopen and no chance of it being surprised by 
self-ordained RS reopen.
{quote}
Sure, if HMaster is preferred as the initiator, we are good to go with #2.
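
A rough sketch of the single Master-side pass described in the quote above, assuming the Master already holds a map of per-region store reader refcounts. The class name, method name, region names and threshold are all hypothetical and are not HBase APIs; the actual reopen would go through the AM and is only represented here by the returned list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: one master-side pass over the
// per-region store reader refcounts reported for the whole cluster.
final class RefCountScanSketch {

    // Returns the regions whose refcount is strictly above the threshold,
    // in the iteration order of the input map. A real implementation would
    // hand these regions to the assignment machinery for a reopen.
    static List<String> regionsToReopen(Map<String, Integer> refCountByRegion, int threshold) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : refCountByRegion.entrySet()) {
            if (entry.getValue() > threshold) {
                // Stand-in for the "fat WARN in the logs" from the issue description.
                System.err.println("WARN: region " + entry.getKey() + " has refcount "
                        + entry.getValue() + " > " + threshold + ", marking for reopen");
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```

Because only the Master decides which regions to reopen, no RS needs its own monitoring thread in this sketch.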

 
{quote}The master can still be notified the region has closed, and then 
notified again when it has reopened. There may need to be master side changes 
to accommodate this, true.
{quote}
In this case, it is probably good to go with #2? I think the number of RPC calls 
might remain the same in both cases (#1: each RS->HM and #2: HM->each RS), but since 
#2 involves the AM and state machine procedures (with rollback support etc.), it may be 
good to pursue #2?

 
{quote}I suppose a hack an operator can do is watch ref count metrics and if 
judged to be indicative of a leak, could alter the table schema
{quote}
The only catch is that it will reopen all regions of the table, not just the 
desired one.

 



> Reopen a region if store reader references may have leaked
> ----------------------------------------------------------
>
>                 Key: HBASE-22460
>                 URL: https://issues.apache.org/jira/browse/HBASE-22460
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Viraj Jasani
>            Priority: Minor
>
> We can leak store reader references if a coprocessor or core function somehow 
> opens a scanner, or wraps one, and then does not take care to call close on 
> the scanner or the wrapped instance. A reasonable mitigation for a reader 
> reference leak would be a fast reopen of the region on the same server 
> (initiated by the RS). This will release all resources, like the refcount, 
> leases, etc. The clients should gracefully ride over this like any other 
> region transition. This reopen would be like what is done during schema 
> change application and ideally would reuse the relevant code. If the refcount 
> is over some ridiculous threshold this mitigation could be triggered along 
> with a fat WARN in the logs. 



