Francis Liu commented on HBASE-19468:
Cherry-picked this to 1.3.2.
> FNFE during scans and flushes
> Key: HBASE-19468
> URL: https://issues.apache.org/jira/browse/HBASE-19468
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Affects Versions: 1.3.1
> Reporter: Thiruvel Thirumoolan
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 2.0.0, 1.3.2, 1.4.1, 1.5.0
> Attachments: HBASE-19468-poc.patch, HBASE-19468_1.4.patch,
> We see FileNotFoundExceptions (FNFE) on our 1.3 clusters when scans and
> flushes happen at the same time. This causes the RegionServer to throw an
> UnknownScannerException, and the client retries.
> This happens during the following sequence:
> 1. The scanner is open; the client has fetched some rows from the
> RegionServer and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files.
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file,
> because no scanners have been opened on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(),
> we get an FNFE. In 1.3, the RegionServer throws an UnknownScannerException
> and the client retries. In branch-1.4, the scan fails with a
> DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during
> updateReaders() and decrement it during resetScannerStack(), so that the
> discharger doesn't clean up the file. Scan lease expiries also have to be
> taken care of. Am I missing anything? Is there a better approach?
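A minimal, single-threaded sketch of the proposed reference-counting fix, replaying steps 1-5 above. FakeStoreFile, FakeScanner, discharge() and runScenario() are hypothetical stand-ins, not HBase classes; only the method names updateReaders() and resetScannerStack() are borrowed from the description. The model assumes the discharger skips any compacted-away file whose reader count is non-zero, and it treats close() as the place where a scan lease expiry would release the temporary pin. Real code would also need locking between the flush, compaction and scan threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the HBASE-19468 race; these are NOT HBase classes.
public class RefCountSketch {

  // Stand-in for a store file plus the reader count the discharger checks.
  static final class FakeStoreFile {
    final String name;
    final AtomicInteger refCount = new AtomicInteger();
    boolean compactedAway; // set once a compaction has replaced this file
    boolean deleted;       // set once the discharger has removed it
    FakeStoreFile(String name) { this.name = name; }
  }

  // Stand-in for the compaction discharger: removes compacted-away files
  // that no scanner currently references.
  static void discharge(List<FakeStoreFile> files) {
    for (FakeStoreFile f : files) {
      if (f.compactedAway && f.refCount.get() == 0) {
        f.deleted = true;
      }
    }
  }

  // Stand-in for a StoreScanner that opens readers on flushed files lazily.
  static final class FakeScanner {
    private final boolean pinOnUpdate; // true = the proposed fix
    private final List<FakeStoreFile> pending = new ArrayList<>();
    private final List<FakeStoreFile> open = new ArrayList<>();

    FakeScanner(boolean pinOnUpdate) { this.pinOnUpdate = pinOnUpdate; }

    // Flush notification: record the new files; with the fix, pin them now.
    void updateReaders(List<FakeStoreFile> flushed) {
      for (FakeStoreFile f : flushed) {
        if (pinOnUpdate) f.refCount.incrementAndGet();
        pending.add(f);
      }
    }

    // Next scan.next(): open readers on the pending files.
    void resetScannerStack() {
      for (FakeStoreFile f : pending) {
        if (f.deleted) {
          throw new IllegalStateException("FileNotFoundException: " + f.name);
        }
        f.refCount.incrementAndGet();                  // ref held by the open reader
        if (pinOnUpdate) f.refCount.decrementAndGet(); // drop the updateReaders() pin
        open.add(f);
      }
      pending.clear();
    }

    // Close or scan lease expiry: release pins and reader refs alike.
    void close() {
      if (pinOnUpdate) {
        for (FakeStoreFile f : pending) f.refCount.decrementAndGet();
      }
      for (FakeStoreFile f : open) f.refCount.decrementAndGet();
      pending.clear();
      open.clear();
    }
  }

  // Replays steps 1-5; returns true if the scanner hit the simulated FNFE.
  static boolean runScenario(boolean pinOnUpdate) {
    FakeStoreFile flushed = new FakeStoreFile("flushed-file");
    List<FakeStoreFile> storeFiles = new ArrayList<>(List.of(flushed));
    FakeScanner scanner = new FakeScanner(pinOnUpdate); // 1: scanner is open
    scanner.updateReaders(List.of(flushed));            // 2: flush
    flushed.compactedAway = true;                       // 3: compaction
    discharge(storeFiles);                              // 4: discharger runs
    try {
      scanner.resetScannerStack();                      // 5: scan.next()
      return false;
    } catch (IllegalStateException e) {
      return true;
    } finally {
      scanner.close();
    }
  }

  public static void main(String[] args) {
    System.out.println("without pin, FNFE = " + runScenario(false));
    System.out.println("with pin,    FNFE = " + runScenario(true));
  }
}
```

Running main prints `without pin, FNFE = true` followed by `with pin,    FNFE = false`. Because close() releases the pin even when resetScannerStack() never runs, an expired lease would not leave the flushed file pinned forever, which is the lease-expiry concern raised above.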
This message was sent by Atlassian JIRA