[jira] [Commented] (HBASE-19468) FNFE during scans and flushes

Anoop Sam John (JIRA) Thu, 14 Dec 2017 06:23:20 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290897#comment-16290897
 ]


Anoop Sam John commented on HBASE-19468:
----------------------------------------

I had assumed that opening the StoreFileScanner here would itself seek to the 
correct position. It does not; the seek happens only when the first real 
next/seek call is made. Good.  With pread, I cannot see any extra resource 
consumption beyond creating some scanner objects. Only when the scan uses the 
stream read type does opening the StoreFileScanner also open a new reader in 
stream mode.  But for most scans (stream scans included), next calls will 
actually follow these flushes, so this is not a big concern.  Given that, and 
the simplicity of the patch (just getting the refCount ticking), I am +1 on 
this approach.  Please add a comment explaining why we open the fileScanner 
eagerly, so that no one wonders about it later, and remove the commented-out 
line.  Good one.
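
To illustrate the point about eager opening, here is a minimal sketch, not 
HBase code. FileHandle and LazySeekScanner are hypothetical stand-ins: the 
constructor only bumps a reference count, and the (comparatively expensive) 
seek is deferred until the first next() call, which is the behaviour 
described above for the pread case.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class EagerOpenSketch {
    /** Hypothetical stand-in for a store file; not an HBase class. */
    static final class FileHandle {
        final AtomicInteger refCount = new AtomicInteger();
        int seeks = 0; // counts how often a scanner actually positioned itself
    }

    /** Hypothetical scanner: opening is cheap, seeking is deferred. */
    static final class LazySeekScanner {
        private final FileHandle file;
        private boolean positioned = false;

        LazySeekScanner(FileHandle file) {
            this.file = file;
            // Eager open: only the reference count changes here.
            file.refCount.incrementAndGet();
        }

        String next() {
            if (!positioned) {
                // The seek happens on the first real call, not at open time.
                file.seeks++;
                positioned = true;
            }
            return "row";
        }

        void close() {
            file.refCount.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        FileHandle file = new FileHandle();
        LazySeekScanner scanner = new LazySeekScanner(file);
        System.out.println("after open: refCount=" + file.refCount.get()
            + " seeks=" + file.seeks);
        scanner.next();
        System.out.println("after next: refCount=" + file.refCount.get()
            + " seeks=" + file.seeks);
        scanner.close();
    }
}
```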

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>         Attachments: HBASE-19468-poc.patch, HBASE-19468_1.4.patch
>
>
> We see FNFE exceptions on our 1.3 clusters when scans and flushes happen at 
> the same time. This causes the regionserver to throw an 
> UnknownScannerException and the client to retry.
> This happens during the following sequence:
> 1. A scanner is opened; the client fetches some rows from the regionserver 
> and works on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files 
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as 
> we don't have new scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(), 
> we get an FNFE. The RegionServer throws an UnknownScannerException. The 
> client retries in 1.3. With branch-1.4, the scan fails with a 
> DoNotRetryIOException.
> [~ram_krish], My proposal is to increment the reader count during 
> updateReaders() and decrement it during resetScannerStack(), so discharger 
> doesn't clean it up. Scan lease expiries also have to be taken care of. Am I 
> missing anything? Is there a better approach?
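
The proposal quoted above can be sketched roughly as follows. This is an 
illustration under stated assumptions, not the attached patch: TrackedFile, 
SketchScanner and discharge() are hypothetical names, and only the method 
names updateReaders() and resetScannerStack() come from the issue text. The 
idea shown is that the scanner pins flushed files when it learns about them, 
and the discharger skips any file that still has a reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountSketch {
    /** Hypothetical stand-in for a store file with a reference count. */
    static final class TrackedFile {
        final String name;
        private final AtomicInteger refCount = new AtomicInteger();
        private volatile boolean deleted = false;

        TrackedFile(String name) { this.name = name; }
        void retain() { refCount.incrementAndGet(); }
        void release() { refCount.decrementAndGet(); }
        boolean isReferenced() { return refCount.get() > 0; }
        boolean isDeleted() { return deleted; }
    }

    /** Hypothetical discharger: deletes only unreferenced compacted files. */
    static int discharge(List<TrackedFile> compactedFiles) {
        int removed = 0;
        for (TrackedFile f : compactedFiles) {
            if (!f.isReferenced() && !f.deleted) {
                f.deleted = true;
                removed++;
            }
        }
        return removed;
    }

    /** Hypothetical scanner that pins flushed files until it is reset. */
    static final class SketchScanner {
        private final List<TrackedFile> pending = new ArrayList<>();

        // Called after a flush: pin the new files so they survive cleanup.
        void updateReaders(List<TrackedFile> flushed) {
            for (TrackedFile f : flushed) {
                f.retain();
                pending.add(f);
            }
        }

        // Called on the next scan.next: real scanners take their own
        // reference, then the temporary pin from updateReaders is dropped.
        List<TrackedFile> resetScannerStack() {
            List<TrackedFile> opened = new ArrayList<>();
            for (TrackedFile f : pending) {
                if (f.isDeleted()) {
                    throw new IllegalStateException("file missing: " + f.name);
                }
                f.retain();  // reference held by the real scanner
                f.release(); // drop the pin taken in updateReaders
                opened.add(f);
            }
            pending.clear();
            return opened;
        }

        // Called on scanner close (assumed to cover lease expiry as well).
        void close(List<TrackedFile> opened) {
            for (TrackedFile f : opened) {
                f.release();
            }
        }
    }

    public static void main(String[] args) {
        TrackedFile flushed = new TrackedFile("flushed-1");
        List<TrackedFile> files = Collections.singletonList(flushed);
        SketchScanner scanner = new SketchScanner();

        scanner.updateReaders(files);
        System.out.println("removed while pinned: " + discharge(files));
        List<TrackedFile> opened = scanner.resetScannerStack();
        scanner.close(opened);
        System.out.println("removed after close: " + discharge(files));
    }
}
```

Without the retain() in updateReaders(), the first discharge() call would 
delete the file and resetScannerStack() would fail, which mirrors the FNFE in 
step 5 of the sequence.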



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
