[jira] [Commented] (HBASE-19468) FNFE during scans and flushes

Thiruvel Thirumoolan (JIRA) Mon, 11 Dec 2017 15:54:47 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286553#comment-16286553
 ]


Thiruvel Thirumoolan commented on HBASE-19468:
----------------------------------------------

[~ram_krish] - Thanks for taking the time to look into this. The time zone 
difference has a big impact :)

Compaction does happen in this case. I thought mentioning the compaction 
discharger was enough, but I should have been clearer. I have updated the 
description now. I wanted to post a unit test and a preliminary patch on 
Friday, but ran into another FNFE during region opening + small compaction 
(I will raise another issue once I narrow it down).

We (Y!) have 1.3 on some of our clusters (the less loaded ones) and the FNFE 
happens at least 2-3 times a day on all of them. Some of the issues in the 
umbrella jira HBASE-18397 helped. I will raise new ones for whatever we find.

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>
> We see FNFEs on our 1.3 clusters when scans and flushes happen at the same 
> time. This causes the regionserver to throw an UnknownScannerException and 
> the client to retry.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the regionserver 
> and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files 
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as 
> we don't have new scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(), 
> we get an FNFE. The RegionServer throws an UnknownScannerException. The 
> client retries in 1.3. With branch-1.4, the scan fails with a 
> DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during 
> updateReaders() and decrement it during resetScannerStack(), so the 
> discharger doesn't clean the file up. Scan lease expiries also have to be 
> taken care of. Am I missing anything? Is there a better approach?
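
The reference-counting idea in the proposal can be sketched roughly as follows. This is only an illustration, not HBase code: SketchFile, SketchScanner and all of their methods are hypothetical names, and the sketch assumes that an opened scanner holds its own reference on a file, so the reservation taken at flush time can be dropped once the scanner stack is rebuilt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a store file reader; not an HBase class.
class SketchFile {
    final String name;
    private final AtomicInteger refCount = new AtomicInteger();
    private volatile boolean compactedAway;

    SketchFile(String name) { this.name = name; }

    void retain() { refCount.incrementAndGet(); }
    void release() { refCount.decrementAndGet(); }
    void markCompactedAway() { compactedAway = true; }

    // A discharger may delete only compacted-away files that nobody references.
    boolean isRemovable() { return compactedAway && refCount.get() == 0; }
}

// Hypothetical scanner showing where the proposal takes and drops references.
class SketchScanner {
    private final List<SketchFile> pending = new ArrayList<>();
    private final List<SketchFile> open = new ArrayList<>();

    // Flush notification: reserve the flushed files immediately.
    synchronized void updateReaders(List<SketchFile> flushed) {
        for (SketchFile f : flushed) {
            f.retain();
            pending.add(f);
        }
    }

    // Next scan call: open scanners on the reserved files, then drop the reservation.
    synchronized void resetScannerStack() {
        for (SketchFile f : pending) {
            f.retain();   // assumption: an open scanner holds its own reference
            open.add(f);
            f.release();  // the reservation from updateReaders() is no longer needed
        }
        pending.clear();
    }

    // Close or lease expiry: release reservations and open references alike.
    synchronized void close() {
        for (SketchFile f : pending) { f.release(); }
        for (SketchFile f : open) { f.release(); }
        pending.clear();
        open.clear();
    }
}
```

In this sketch a file that was flushed and then compacted away stays non-removable from updateReaders() until close(), which is also why the lease-expiry path has to release pending reservations: otherwise a scanner that never calls next again would pin the file forever.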



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
