[ 
https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846161#comment-16846161
 ] 

Lars Hofhansl commented on HBASE-22457:
---------------------------------------

Oh, it's a hack for sure and, as you point out, might hide the actual problem.

I do like your idea. Can we do a fast close without flushing the memstore? 
(Otherwise it might not be "fast" :))

Other hacks/ideas:
# detect a run-away refCount by comparing each reader's refCount with the actual 
number of open scanners (which we track for each HRegion); if the refCount is 
larger, we know we have a problem.
# (a variation) when we attempt to archive an HFile that still has a non-zero 
refCount, check whether there are any open scanners; if not, archive it anyway.

For #1 we could at least enhance the logging and include the number of 
currently open scanners in the log message (where we say that we cannot archive an HFile).
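A minimal Python sketch of ideas #1 and #2 plus the logging tweak. `Reader`, `Region`, `find_runaway_readers`, `can_archive` and `archive_skip_message` are hypothetical stand-ins, not HBase classes or APIs, and the log text is invented for illustration. The sketch assumes every reader reference comes from a scanner the region counts; compaction and flush scanners also take references, so a real check would have to count those too.

```python
from dataclasses import dataclass, field


@dataclass
class Reader:
    """Hypothetical stand-in for an HFile reader; ref_count goes up per open scanner."""
    path: str
    ref_count: int = 0


@dataclass
class Region:
    """Hypothetical stand-in for an HRegion that tracks how many scanners are open."""
    open_scanners: int = 0
    readers: list = field(default_factory=list)


def find_runaway_readers(region):
    # Idea #1: assuming every reference comes from a scanner the region counts,
    # a refCount larger than the number of open scanners indicates a leak.
    return [r for r in region.readers if r.ref_count > region.open_scanners]


def can_archive(reader, region):
    # Idea #2: archive if nothing references the file, or if it still claims
    # references while the region has no open scanners at all.
    return reader.ref_count == 0 or region.open_scanners == 0


def archive_skip_message(reader, region):
    # Hypothetical log line that includes the open-scanner count for diagnosis.
    return (f"Cannot archive {reader.path}: refCount={reader.ref_count}, "
            f"open scanners: {region.open_scanners}")
```

Comparing against the region-wide scanner count is deliberately coarse: it cannot tell which scanner leaked, but it is cheap and flags the refCount > 1000 case immediately.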

What I'm really looking for is a structural fix where a coprocessor cannot mess 
things up. Perhaps that's not possible without severely limiting what coprocessors 
are allowed to do.


> Harden the HBase HFile reader reference counting
> ------------------------------------------------
>
>                 Key: HBASE-22457
>                 URL: https://issues.apache.org/jira/browse/HBASE-22457
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> The problem is that any coprocessor hook that replaces a passed scanner without 
> closing it can cause an incorrect reference count.
> This was bad and wrong before, of course, but now it has pretty bad 
> consequences, since an incorrect reference count will prevent HFiles from 
> being archived indefinitely.
> All hooks that are passed a scanner and return a scanner are suspect, since 
> the returned scanner may or may not close the passed scanner:
> * preCompact
> * preCompactScannerOpen
> * preFlush
> * preFlushScannerOpen
> * preScannerOpen
> * preStoreScannerOpen
> * preStoreFileReaderOpen...? (not sure about this one, it could mess with the 
> reader)
> I sampled the Phoenix and Tephra code, and found a few instances where 
> this is happening.
> For those I filed issues: TEPHRA-300, PHOENIX-5291
> (We're not using Tephra)
> The Phoenix ones should be rare. In our case we are seeing readers with 
> refCount > 1000.
> Perhaps there are other issues, such as a path where not all exceptions are caught 
> and a scanner is left open that way. (Generally I am not a fan of 
> reference counting in complex systems - it's too easy to miss something. But 
> that's a different discussion. :) ).
> Let's brainstorm some way in which we can harden this.
> [~ram_krish], [~anoop.hbase], [~apurtell]
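A minimal Python sketch of the leak described above and the two ways to avoid it. `Reader`, `Scanner`, `ReplacementScanner`, `leaky_hook`, `safe_hook` and `run_scan` are hypothetical stand-ins, not the HBase coprocessor API (which is Java); they only model "open takes a reference, close drops it". A hook that returns a new scanner and forgets the passed one leaks exactly one reference per call. A hook that hands the passed scanner to its replacement, plus a try/finally around the scan, keeps the count at zero even when the scan throws.

```python
class Reader:
    """Hypothetical HFile reader; ref_count tracks scanners that still reference it."""
    def __init__(self):
        self.ref_count = 0


class Scanner:
    """Hypothetical store scanner: takes a reference on open, drops it on close."""
    def __init__(self, reader):
        self.reader = reader
        self.closed = False
        reader.ref_count += 1

    def close(self):
        # Idempotent close, so a double close cannot drive the count negative.
        if not self.closed:
            self.closed = True
            self.reader.ref_count -= 1


class ReplacementScanner:
    """Scanner returned by a well-behaved hook; it owns and closes the one it replaced."""
    def __init__(self, delegate):
        self.delegate = delegate

    def close(self):
        self.delegate.close()


def leaky_hook(reader, passed):
    # Replaces `passed` without closing it: its reference is never released.
    return Scanner(reader)


def safe_hook(reader, passed):
    # Either return `passed` unchanged, or give it to the replacement to close.
    return ReplacementScanner(passed)


def run_scan(reader, hook, fail=False):
    scanner = hook(reader, Scanner(reader))
    try:
        if fail:
            raise RuntimeError("simulated failure mid-scan")
    finally:
        # Close on every path, including exceptions.
        scanner.close()
```

After one `run_scan` with `leaky_hook` the reader is left with `ref_count == 1`; with `safe_hook` it returns to 0, with or without the simulated failure.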



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
