[ 
https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990932#comment-16990932
 ] 

Viraj Jasani commented on HBASE-22457:
--------------------------------------

{quote} # Detect a run-away refCount by comparing any reader's refCount with the 
actual number of open scanners (which we track for each HRegion); if the 
refCount is larger, we know we have a problem.
 # (A variation) When we attempt to archive an HFile that still has a refCount, 
check whether there are any open scanners; if not, archive it anyway.

For #1, at least, we could enhance the logging and include the number of 
currently open scanners in the log (where we say that we cannot archive an HFile).
{quote}
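A minimal sketch of the two quoted ideas plus the suggested log enhancement. It assumes the caller already knows the reader's refCount and the number of open scanners (user, flush or compaction) that could reference the file, and that each such scanner holds at most one reference per reader. The class name, method names and log format are hypothetical, not existing HBase code.

```java
// Hypothetical sketch of ideas #1 and #2 above; class and method names are
// illustrative and are not part of the HBase code base.
public final class RunawayRefCountCheck {

    private RunawayRefCountCheck() {}

    /**
     * Idea #1: assuming each open scanner (user, flush or compaction) holds at
     * most one reference on a given reader, a refCount above the number of
     * open scanners can only come from a leaked reference.
     */
    static boolean isRunaway(int readerRefCount, int openScanners) {
        return readerRefCount > openScanners;
    }

    /**
     * Idea #2: a compacted-away file may be archived despite a non-zero
     * refCount when no scanners are open, since nothing can still be reading it.
     */
    static boolean canArchive(int readerRefCount, int openScanners) {
        return readerRefCount == 0 || openScanners == 0;
    }

    /** Enhanced log line for the "cannot archive" case (format is made up). */
    static String archiveBlockedMessage(String file, int readerRefCount, int openScanners) {
        return String.format("Cannot archive %s: refCount=%d, openScanners=%d%s",
                file, readerRefCount, openScanners,
                isRunaway(readerRefCount, openScanners) ? " (suspected leaked reference)" : "");
    }

    public static void main(String[] args) {
        // A reader stuck at refCount 1000 on a region with 3 open scanners.
        System.out.println(archiveBlockedMessage("cf1/hfile-0001", 1000, 3));
        System.out.println(canArchive(1, 0)); // true: no scanner can hold the reference
        System.out.println(canArchive(1, 2)); // false: a live scanner may still read the file
    }
}
```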
For #1, since open scanners are tracked at the HRegion level (RegionScannerImpl) 
and not at the Store level, we might not be able to compare refCount with open 
scanners? Also, #2 might not hold either, because the scanners open at the 
region level may only be reading non-compacted store files rather than the 
compacted file in a given Store? I was wondering whether we could also track 
the number of open scanners at the Store level.
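Per-store tracking of open scanners, as suggested above, could look roughly like the following. This is a hypothetical, simplified sketch: stores are identified by column-family name, and scannerOpened/scannerClosed stand in for wherever store-level scanners are actually created and closed. None of these names exist in HBase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count open scanners per store (column family) instead
// of per region, so a store file's refCount is compared only against scanners
// on its own store.
public class StoreScannerTracker {

    private final Map<String, AtomicInteger> openScannersByStore = new ConcurrentHashMap<>();

    /** Called when a store-level scanner is opened on the given store. */
    void scannerOpened(String store) {
        openScannersByStore.computeIfAbsent(store, s -> new AtomicInteger()).incrementAndGet();
    }

    /** Called exactly once when that scanner is closed. */
    void scannerClosed(String store) {
        AtomicInteger count = openScannersByStore.get(store);
        // An unmatched close is itself a reference-counting bug; fail fast on it.
        if (count == null || count.decrementAndGet() < 0) {
            throw new IllegalStateException("close without matching open on store " + store);
        }
    }

    int openScanners(String store) {
        AtomicInteger count = openScannersByStore.get(store);
        return count == null ? 0 : count.get();
    }

    /** Idea #2 at store level: archive if unreferenced or if this store has no open scanners. */
    boolean canArchive(String store, int fileRefCount) {
        return fileRefCount == 0 || openScanners(store) == 0;
    }

    public static void main(String[] args) {
        StoreScannerTracker tracker = new StoreScannerTracker();
        tracker.scannerOpened("cf1");
        tracker.scannerOpened("cf1");
        tracker.scannerClosed("cf1");
        System.out.println(tracker.openScanners("cf1")); // 1
        // A compacted cf2 file with a stale refCount of 1 while only cf1 has a scanner open.
        System.out.println(tracker.canArchive("cf2", 1)); // true
        System.out.println(tracker.canArchive("cf1", 1)); // false
    }
}
```

With this, scanners open only on other column families no longer block archival of a file in this store. Scanners on the same store that only read newer, non-compacted files would still count, so this narrows the problem rather than eliminating it.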

I was just going through the comments here while looking into HBASE-23349 (a 
refCount of 1 preventing archival of compacted store files).

> Harden the HBase HFile reader reference counting
> ------------------------------------------------
>
>                 Key: HBASE-22457
>                 URL: https://issues.apache.org/jira/browse/HBASE-22457
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>            Priority: Major
>         Attachments: 22457-random-1.5.txt
>
>
> The problem is that any coprocessor hook that replaces a passed scanner without 
> closing it can cause an incorrect reference count.
> This was bad and wrong before, of course, but now it has pretty bad 
> consequences, since an incorrect reference count will prevent HFiles from 
> being archived indefinitely.
> All hooks that are passed a scanner and return a scanner are suspect, since 
> the returned scanner may or may not close the passed scanner:
> * preCompact
> * preCompactScannerOpen
> * preFlush
> * preFlushScannerOpen
> * preScannerOpen
> * preStoreScannerOpen
> * preStoreFileReaderOpen...? (not sure about this one, it could mess with the 
> reader)
> I sampled the Phoenix and Tephra code and found a few instances where this 
> is happening.
> For those I filed issues: TEPHRA-300, PHOENIX-5291
> (We're not using Tephra)
> The Phoenix ones should be rare. In our case we are seeing readers with 
> refCount > 1000.
> Perhaps there are other issues as well, such as a path where not all 
> exceptions are caught and a scanner is left open. (Generally I am not a fan 
> of reference counting in complex systems; it's too easy to miss something. 
> But that's a different discussion. :) )
> Let's brainstorm some way in which we can harden this.
> [~ram_krish], [~anoop.hbase], [~apurtell]
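The coprocessor-hook problem described in the issue can be illustrated with a small, self-contained model. It deliberately does not use the real HBase coprocessor or scanner interfaces: Reader, Scanner, FileScanner, the two wrapper classes and the hook method are hypothetical stand-ins. A replacement scanner that takes over the passed scanner has to close it in its own close(); combined with try-with-resources, that also covers the exception path.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model, not the HBase API: every scanner opened on a reader bumps
// its refCount, and closing that scanner drops it again.
public final class WrappedScannerDemo {

    static final class Reader {
        final AtomicInteger refCount = new AtomicInteger();
    }

    interface Scanner extends AutoCloseable {
        @Override
        void close(); // narrowed to throw no checked exception
    }

    static final class FileScanner implements Scanner {
        private final Reader reader;
        private boolean closed;

        FileScanner(Reader reader) {
            this.reader = reader;
            reader.refCount.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) { // only the first close releases the reference
                closed = true;
                reader.refCount.decrementAndGet();
            }
        }
    }

    /** Replacement scanner that owns, and therefore closes, the scanner it was passed. */
    static final class WrappingScanner implements Scanner {
        private final Scanner delegate;

        WrappingScanner(Scanner delegate) {
            this.delegate = delegate;
        }

        @Override
        public void close() {
            delegate.close();
        }
    }

    /** Buggy replacement: drops the passed scanner without closing it. */
    static final class LeakyWrappingScanner implements Scanner {
        LeakyWrappingScanner(Scanner ignored) {}

        @Override
        public void close() {
            // BUG: the passed scanner is never closed, so its reference leaks.
        }
    }

    /** Stand-in for a hook such as preScannerOpen that returns a replacement scanner. */
    static Scanner hook(Scanner passed, boolean closesPassed) {
        return closesPassed ? new WrappingScanner(passed) : new LeakyWrappingScanner(passed);
    }

    static void scanOnce(Reader reader, boolean closesPassed, boolean failMidScan) {
        // try-with-resources closes the returned scanner even if the scan throws.
        try (Scanner scanner = hook(new FileScanner(reader), closesPassed)) {
            if (failMidScan) {
                throw new IllegalStateException("simulated failure during scan");
            }
        }
    }

    public static void main(String[] args) {
        Reader good = new Reader();
        scanOnce(good, true, false);
        try {
            scanOnce(good, true, true);
        } catch (IllegalStateException expected) {
            // the scanner was already closed before the exception escaped
        }
        System.out.println("well-behaved hook: refCount=" + good.refCount.get()); // 0

        Reader leaky = new Reader();
        for (int i = 0; i < 3; i++) {
            scanOnce(leaky, false, false);
        }
        System.out.println("leaky hook: refCount=" + leaky.refCount.get()); // 3
    }
}
```

Each call through the leaky hook leaves one more reference behind, which is how a reader can accumulate a refCount in the hundreds or thousands on a busy region.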



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
