[ https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287084#comment-16287084 ]
ramkrishna.s.vasudevan commented on HBASE-19468:
------------------------------------------------

[~thiruvel] Is the POC patch complete? You may have inadvertently uploaded only part of it.
So the scans are so slow that a compaction completes between two next() calls of the scan? I think the case could be that a newly flushed file gets picked up together with the old set of already flushed files, rather than new puts arriving and those puts causing newer flushes.

> FNFE during scans and flushes
> -----------------------------
>
>                 Key: HBASE-19468
>                 URL: https://issues.apache.org/jira/browse/HBASE-19468
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>    Affects Versions: 1.3.1
>            Reporter: Thiruvel Thirumoolan
>            Priority: Critical
>             Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
>         Attachments: HBASE-19468-poc1.patch
>
>
> We see FileNotFoundExceptions (FNFE) on our 1.3 clusters when scans and
> flushes happen at the same time. This causes the regionserver to throw an
> UnknownScannerException, and the client retries.
> This happens during the following sequence:
> 1. A scanner is open; the client has fetched some rows from the regionserver
> and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file, as we
> don't have new scanners on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(),
> we get a FNFE. The RegionServer throws an UnknownScannerException. The client
> retries in 1.3. With branch-1.4, the scan fails with a DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during
> updateReaders() and decrement it during resetScannerStack(), so the discharger
> doesn't clean it up. Scan lease expiries also have to be taken care of. Am I
> missing anything? Is there a better approach?
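
The proposal in the issue description (increment the reader count in updateReaders(), decrement it in resetScannerStack(), and have the discharger skip files that are still referenced) can be illustrated with a small standalone sketch. This is not HBase code and not the attached POC patch: TrackedFile, SketchScanner, discharge(), and the compactedAway and deleted flags are hypothetical names invented for the illustration, and the real StoreScanner and discharger are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {

    /** Hypothetical stand-in for a store file; not an HBase class. */
    static final class TrackedFile {
        final String name;
        final AtomicInteger refCount = new AtomicInteger();
        volatile boolean compactedAway; // a compaction has replaced this file
        volatile boolean deleted;       // the discharger has removed this file

        TrackedFile(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the scanner side of the proposal. */
    static final class SketchScanner {
        private final List<TrackedFile> pending = new ArrayList<>();

        // Flush notification: pin the flushed files until the stack is rebuilt.
        synchronized void updateReaders(List<TrackedFile> flushed) {
            for (TrackedFile f : flushed) {
                f.refCount.incrementAndGet();
                pending.add(f);
            }
        }

        // Next client call: the pinned files must still exist; then unpin them.
        // (A real scanner would presumably open readers on the files here.)
        synchronized void resetScannerStack() {
            for (TrackedFile f : pending) {
                if (f.deleted) {
                    throw new IllegalStateException("file vanished: " + f.name);
                }
                f.refCount.decrementAndGet();
            }
            pending.clear();
        }

        // Close or lease expiry: release pins that were never consumed.
        synchronized void close() {
            for (TrackedFile f : pending) {
                f.refCount.decrementAndGet();
            }
            pending.clear();
        }
    }

    /** Removes compacted files nobody references; returns how many were removed. */
    static int discharge(List<TrackedFile> files) {
        int removed = 0;
        for (Iterator<TrackedFile> it = files.iterator(); it.hasNext();) {
            TrackedFile f = it.next();
            if (f.compactedAway && f.refCount.get() == 0) {
                f.deleted = true;
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        TrackedFile flushed = new TrackedFile("flushed-1");
        List<TrackedFile> storeFiles = new ArrayList<>();
        storeFiles.add(flushed);

        SketchScanner scanner = new SketchScanner();
        scanner.updateReaders(storeFiles); // step 2: flush seen by the open scanner
        flushed.compactedAway = true;      // step 3: compaction replaces the file

        System.out.println("removed while pinned: " + discharge(storeFiles)); // 0
        scanner.resetScannerStack();       // step 5: no missing file
        System.out.println("removed after reset: " + discharge(storeFiles));  // 1
    }
}
```

The close() method is where the "scan lease expiries" concern from the description shows up in this sketch: if a scanner is abandoned between updateReaders() and resetScannerStack(), something still has to release its pins, or the compacted file is never cleaned up.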