[jira] [Commented] (HBASE-19468) FNFE during scans and flushes

Anoop Sam John (JIRA) Mon, 11 Dec 2017 02:47:59 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285747#comment-16285747
 ]


Anoop Sam John commented on HBASE-19468:
----------------------------------------

Could the issue arise in the scenario below?
We are in the middle of a read, as described in the issue: the client has
received some data and is processing it.
Meanwhile, a flush happens on the server, and the new file is added to
flushedStoreFiles.
A second flush happens, and its file is added too.
A third flush happens, and its file is added as well.
That makes 3 small files. Assume a compaction then starts and completes, so
the 3 new files are compacted away.
Remember that the scanner has not called seek or next yet, so the 3 newly
flushed files have not been processed. That means no scanners were obtained on
them, so the ref count on all 3 files is still 0 (?).
The CompactedHFilesDischarger may then be able to remove these files, since
they are not yet referenced!
If a next() happens now, the 3 files are still in 'flushedStoreFiles', and
when we try to create scanners on them we will get an FNFE.
I am asking after only a quick look at the related code. Please correct me if
I am wrong.
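
To make the sequence above concrete, here is a minimal single-threaded model of
it. This is only a sketch under my own assumptions: FlushRaceSketch and its
methods are hypothetical names, not HBase classes, and the "file system" is just
a map from file name to reference count. Only the ordering of events is taken
from the scenario.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scenario; none of these names exist in HBase.
class FlushRaceSketch {
    // File name -> number of scanners currently open on that file.
    private final Map<String, Integer> filesOnDisk = new HashMap<>();
    // Files already compacted away, waiting for the discharger to delete them.
    private final List<String> compactedAway = new ArrayList<>();
    // What the open scanner remembers from flush notifications.
    private final List<String> flushedStoreFiles = new ArrayList<>();

    // A flush writes a file and notifies the scanner, which opens nothing yet.
    void flush(String file) {
        filesOnDisk.put(file, 0);
        flushedStoreFiles.add(file);
    }

    // A compaction rewrites the inputs into one output; inputs become removable.
    void compact(List<String> inputs, String output) {
        filesOnDisk.put(output, 0);
        compactedAway.addAll(inputs);
    }

    // The discharger deletes compacted files whose reference count is zero.
    int runDischarger() {
        int before = compactedAway.size();
        compactedAway.removeIf(f -> {
            if (filesOnDisk.get(f) == 0) {
                filesOnDisk.remove(f);
                return true;
            }
            return false;
        });
        return before - compactedAway.size();
    }

    // next() only now tries to open scanners on the remembered flushed files.
    void next() throws FileNotFoundException {
        for (String f : flushedStoreFiles) {
            if (!filesOnDisk.containsKey(f)) {
                throw new FileNotFoundException(f);
            }
            filesOnDisk.merge(f, 1, Integer::sum);
        }
        flushedStoreFiles.clear();
    }

    public static void main(String[] args) {
        FlushRaceSketch store = new FlushRaceSketch();
        store.flush("f1");
        store.flush("f2");
        store.flush("f3");
        store.compact(Arrays.asList("f1", "f2", "f3"), "c1");
        System.out.println("discharger removed " + store.runDischarger());
        try {
            store.next();
        } catch (FileNotFoundException e) {
            System.out.println("next() failed with FNFE on " + e.getMessage());
        }
    }
}
```

If next() is called before runDischarger() in this model, the reference counts
become non-zero and the discharger skips the files, which is the window the
scenario depends on.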

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>
> We see FNFEs on our 1.3 clusters when scans and flushes happen at the same 
> time. This causes the regionserver to throw an UnknownScannerException, and 
> the client retries.
> This happens during the following sequence:
> 1. Scanner is open; the client has fetched some rows from the regionserver 
> and is working on them.
> 2. A flush happens and the storeScanner is updated with the flushed files 
> (StoreScanner.updateReaders()).
> 3. The compaction discharger runs and cleans up the newly flushed file, as we 
> don't have new scanners on it yet.
> 4. The client issues scan.next and, during StoreScanner.resetScannerStack(), 
> we get an FNFE. The RegionServer throws an UnknownScannerException. The 
> client retries in 1.3. With branch-1.4, the scan fails with a 
> DoNotRetryIOException.
> [~ram_krish], My proposal is to increment the reader count during 
> updateReaders() and decrement it during resetScannerStack(), so discharger 
> doesn't clean it up. Scan lease expiries also have to be taken care of. Am I 
> missing anything? Is there a better approach?
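
The proposal quoted above (take a reference in updateReaders(), release it in
resetScannerStack(), and also release on lease expiry) could look roughly like
the sketch below. It is not the actual StoreScanner code: PinnedFile and
PinningScannerSketch are hypothetical names, and the real scanner-opening step
is reduced to taking one more reference on the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a store file with a reference count.
class PinnedFile {
    final String name;
    final AtomicInteger refCount = new AtomicInteger();

    PinnedFile(String name) {
        this.name = name;
    }

    // A discharger would skip any file for which this returns true.
    boolean isReferenced() {
        return refCount.get() > 0;
    }
}

// Hypothetical scanner that pins flushed files until it has opened them.
class PinningScannerSketch {
    // Files pinned by updateReaders() but not yet opened.
    private final List<PinnedFile> flushedStoreFiles = new ArrayList<>();
    // Files on which this scanner holds an "open scanner" reference.
    private final List<PinnedFile> openFiles = new ArrayList<>();

    // Flush notification: pin each new file so it cannot be discharged.
    synchronized void updateReaders(List<PinnedFile> newlyFlushed) {
        for (PinnedFile f : newlyFlushed) {
            f.refCount.incrementAndGet();
            flushedStoreFiles.add(f);
        }
    }

    // Called on the next read: open the files first, then drop the pins.
    synchronized void resetScannerStack() {
        for (PinnedFile f : flushedStoreFiles) {
            f.refCount.incrementAndGet(); // stands in for opening a scanner
            openFiles.add(f);
            f.refCount.decrementAndGet(); // release the pin from updateReaders()
        }
        flushedStoreFiles.clear();
    }

    // Normal close or lease expiry: release everything still held.
    synchronized void close() {
        for (PinnedFile f : flushedStoreFiles) {
            f.refCount.decrementAndGet();
        }
        for (PinnedFile f : openFiles) {
            f.refCount.decrementAndGet();
        }
        flushedStoreFiles.clear();
        openFiles.clear();
    }
}
```

The point of the ordering in resetScannerStack() is that the count never drops
to zero between the pin and the open. The close() path covers the lease-expiry
case mentioned in the proposal, where next() is never called again.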



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
