[ https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286553#comment-16286553 ]
Thiruvel Thirumoolan commented on HBASE-19468:
----------------------------------------------

[~ram_krish] - Thanks for taking the time to look into this. The timezone has a big impact :)

Compaction happens in this case. I thought mentioning the compaction discharger was good enough, but I should have been clearer. I have updated the description now.

I wanted to post a unit test and a prelim patch on Fri, but ran into another FNFE during region opening + small compaction (will raise another one once I narrow it down).

We (Y!) have 1.3 on some of our clusters (the less loaded clusters) and FNFE happens at least 2-3 times a day on all of them. Some of the issues in the umbrella jira HBASE-18397 helped. I will raise new ones for whatever we find.

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>   Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>
> We see FNFEs on our 1.3 clusters when scans and flushes happen at
> the same time. This causes the regionserver to throw an
> UnknownScannerException, and the client retries.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the
> regionserver and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files
> (StoreScanner.updateReaders()).
> 3. Compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as we
> don't have new scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(), we
> get a FNFE. The RegionServer throws an UnknownScannerException. The client
> retries in 1.3. With branch-1.4, the scan fails with a DoNotRetryIOException.
> [~ram_krish], My proposal is to increment the reader count during
> updateReaders() and decrement it during resetScannerStack(), so discharger
> doesn't clean it up. Scan lease expiries also have to be taken care of. Am I
> missing anything? Is there a better approach?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
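
To make the proposal in the description concrete, here is a minimal single-threaded sketch of the five-step sequence and of the suggested reference counting. It is not HBase code: SketchStoreFile, SketchStore, SketchScanner and PinDemo are hypothetical stand-ins, and only the method names updateReaders() and resetScannerStack() mirror the ones mentioned in the issue. The assumption is that the discharger only removes compacted files whose reference count is zero.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a store file; not the real HBase class.
class SketchStoreFile {
    final String name;
    final AtomicInteger refCount = new AtomicInteger();
    boolean compactedAway;

    SketchStoreFile(String name) { this.name = name; }
}

// Hypothetical store that also plays the role of the discharger.
class SketchStore {
    private final Map<String, SketchStoreFile> onDisk = new HashMap<>();

    SketchStoreFile flush(String name) {
        SketchStoreFile f = new SketchStoreFile(name);
        onDisk.put(name, f);
        return f;
    }

    void compact(SketchStoreFile f) { f.compactedAway = true; }

    // Deletes compacted files that nobody references; returns how many were removed.
    int runDischarger() {
        int before = onDisk.size();
        onDisk.values().removeIf(f -> f.compactedAway && f.refCount.get() == 0);
        return before - onDisk.size();
    }

    void open(SketchStoreFile f) throws FileNotFoundException {
        if (!onDisk.containsKey(f.name)) throw new FileNotFoundException(f.name);
    }
}

// Hypothetical scanner; pinOnUpdate switches the proposed behaviour on or off.
class SketchScanner {
    private final SketchStore store;
    private final boolean pinOnUpdate;
    private final List<SketchStoreFile> pending = new ArrayList<>();

    SketchScanner(SketchStore store, boolean pinOnUpdate) {
        this.store = store;
        this.pinOnUpdate = pinOnUpdate;
    }

    // Called on flush: remember the new files and, if enabled, pin them.
    void updateReaders(List<SketchStoreFile> flushed) {
        for (SketchStoreFile f : flushed) {
            if (pinOnUpdate) f.refCount.incrementAndGet();
            pending.add(f);
        }
    }

    // Called on the next scan.next: open the pending files, then drop the pins.
    // A real scanner would take its own reference on each file before unpinning.
    void resetScannerStack() throws FileNotFoundException {
        try {
            for (SketchStoreFile f : pending) store.open(f);
        } finally {
            releasePins();
        }
    }

    // Also needed when the scan lease expires, otherwise the pin leaks.
    void close() { releasePins(); }

    private void releasePins() {
        if (pinOnUpdate) for (SketchStoreFile f : pending) f.refCount.decrementAndGet();
        pending.clear();
    }
}

class PinDemo {
    // Replays steps 1-5; returns true if the scan survives, false on FNFE.
    static boolean scanSurvives(boolean pin) {
        SketchStore store = new SketchStore();
        SketchScanner scanner = new SketchScanner(store, pin);   // 1. scanner open
        SketchStoreFile flushed = store.flush("flushed-file");   // 2. flush
        List<SketchStoreFile> files = new ArrayList<>();
        files.add(flushed);
        scanner.updateReaders(files);
        store.compact(flushed);                                  // 3. compaction
        store.runDischarger();                                   // 4. discharger
        try {
            scanner.resetScannerStack();                         // 5. scan.next
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }
}
```

In this sketch PinDemo.scanSurvives(false) hits the FileNotFoundException and PinDemo.scanSurvives(true) does not. The close() method is where a lease-expiry handler would have to release the pins, which is the part the description flags as still needing care.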