[ https://issues.apache.org/jira/browse/HBASE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-19468:
-------------------------------------------
Attachment: HBASE-19468_1.4.patch
Attaching a tentative patch with a test case. The test case can be used to
reproduce the FNFE. With the patch applied, it no longer occurs.
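The failure that the test case reproduces can be modelled in a few lines. This is a minimal sketch, not the actual test from the patch. RaceStoreFile, RaceStoreScanner and RaceDemo are hypothetical stand-ins, reference counting is reduced to a plain int field, and the discharger is inlined as a single check. The sketch replays the flush, compaction, discharge and next() sequence under the pre-patch behavior, where updateReaders() only remembers the flushed files and does not open anything on them.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the pre-patch behavior; not the real HBase classes.
class RaceStoreFile {
    final String name;
    int refCount = 0;              // number of open scanners on this file
    boolean compactedAway = false;
    boolean deleted = false;

    RaceStoreFile(String name) { this.name = name; }

    void openScanner() {
        if (deleted) throw new UncheckedIOException(new FileNotFoundException(name));
        refCount++;
    }
}

class RaceStoreScanner {
    // Pre-patch: updateReaders() only remembers the files; no scanner is opened yet.
    private final List<RaceStoreFile> flushedFiles = new ArrayList<>();

    void updateReaders(List<RaceStoreFile> files) { flushedFiles.addAll(files); }

    // Runs lazily on the next scan.next(); scanners on the flushed files open only now.
    void resetScannerStack() {
        for (RaceStoreFile f : flushedFiles) f.openScanner();
        flushedFiles.clear();
    }
}

class RaceDemo {
    // Replays the race and returns the exception seen in resetScannerStack(), or null.
    static Exception replay() {
        RaceStoreFile flushed = new RaceStoreFile("flushed-hfile");
        RaceStoreScanner scanner = new RaceStoreScanner();         // scanner already open
        scanner.updateReaders(Collections.singletonList(flushed)); // flush
        flushed.compactedAway = true;                              // compaction
        if (flushed.compactedAway && flushed.refCount == 0) {      // discharger:
            flushed.deleted = true;                                //   nobody reads it, delete
        }
        try {
            scanner.resetScannerStack();                           // client calls next()
            return null;
        } catch (UncheckedIOException e) {
            return e.getCause();
        }
    }
}
```

Because nothing references the flushed file between updateReaders() and resetScannerStack(), the discharger is free to delete it in that window.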
So in updateReaders(), instead of storing the list of flushed files, we now
create scanners on them and store those scanners, so that resetScannerStack()
just uses them directly. We also make sure close() clears this new set of
scanners. In the general case this list will already be empty by then; only
when a scanner lease expires without next() having been called do these
scanners actually get closed there.
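The change described above can be sketched as follows, assuming a heavily simplified model. The Fix* classes are hypothetical stand-ins for the real store file, store file scanner, store scanner and discharger classes, and "the file is still referenced" is modelled as an int counter that each open scanner holds. Because the scanner is opened in updateReaders(), the discharger sees a non-zero count and leaves the flushed file alone until the scanner is closed.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the patch idea; not the real HBase classes.
class FixStoreFile {
    final String name;
    int refCount = 0;              // number of open scanners on this file
    boolean compactedAway = false;
    boolean deleted = false;

    FixStoreFile(String name) { this.name = name; }

    FixFileScanner openScanner() {
        if (deleted) throw new UncheckedIOException(new FileNotFoundException(name));
        refCount++;                // an open scanner keeps the file referenced
        return new FixFileScanner(this);
    }
}

class FixFileScanner {
    private final FixStoreFile file;
    private boolean closed = false;

    FixFileScanner(FixStoreFile file) { this.file = file; }

    void close() {
        if (!closed) { closed = true; file.refCount--; }
    }
}

// Discharger: deletes compacted-away files that no scanner references.
class FixDischarger {
    static void run(List<FixStoreFile> files) {
        for (FixStoreFile f : files) {
            if (f.compactedAway && f.refCount == 0) f.deleted = true;
        }
    }
}

class FixStoreScanner {
    private final List<FixFileScanner> current = new ArrayList<>();
    // Scanners created in updateReaders() and handed over in resetScannerStack().
    private final List<FixFileScanner> flushedScanners = new ArrayList<>();

    void updateReaders(List<FixStoreFile> flushed) {
        for (FixStoreFile f : flushed) flushedScanners.add(f.openScanner());
    }

    void resetScannerStack() {
        current.addAll(flushedScanners);   // use the already-open scanners directly
        flushedScanners.clear();
    }

    void close() {
        for (FixFileScanner s : current) s.close();
        // Non-empty only if the lease expired before next() triggered a reset.
        for (FixFileScanner s : flushedScanners) s.close();
        current.clear();
        flushedScanners.clear();
    }

    int currentSize() { return current.size(); }
}
```

Usage, following the same flush, compaction and next() order: after updateReaders() the discharger skips the file, resetScannerStack() reuses the stored scanner, and once close() runs the file can be discharged.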
[~thiruvel], [~chia7712]
Please have a look at this.
I will check whether there are any other cases.
> FNFE during scans and flushes
> -----------------------------
>
> Key: HBASE-19468
> URL: https://issues.apache.org/jira/browse/HBASE-19468
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver, Scanners
> Affects Versions: 1.3.1
> Reporter: Thiruvel Thirumoolan
> Priority: Critical
> Fix For: 2.0.0, 1.4.1, 1.5.0, 1.3.3
>
> Attachments: HBASE-19468_1.4.patch
>
>
> We see FNFEs on our 1.3 clusters when scans and flushes happen at the
> same time. This causes the regionserver to throw an UnknownScannerException,
> and the client retries.
> This happens during the following sequence:
> 1. The scanner is open; the client has fetched some rows from the
> regionserver and is working on them.
> 2. A flush happens and the StoreScanner is updated with the flushed files
> (StoreScanner.updateReaders()).
> 3. A compaction happens on the region while the scanner is still open.
> 4. The compaction discharger runs and cleans up the newly flushed file,
> since no scanner has been opened on it yet.
> 5. The client issues scan.next and, during StoreScanner.resetScannerStack(),
> we get an FNFE. The RegionServer throws an UnknownScannerException and the
> client retries in 1.3.
> With branch-1.4, the scan fails with a DoNotRetryIOException.
> [~ram_krish], my proposal is to increment the reader count during
> updateReaders() and decrement it during resetScannerStack(), so that the
> discharger doesn't clean the file up. Scan lease expiries also have to be
> taken care of. Am I missing anything? Is there a better approach?
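The reference-counting proposal in the issue description can be sketched in the same simplified style. The Pin* classes and their fields are hypothetical and stand in for the real reader reference count. The point of the sketch is the ordering. updateReaders() pins each flushed file. resetScannerStack() takes the scanner's own reference before dropping the pin, so the count never reaches zero in between. close(), which in this sketch also stands in for scanner lease expiry, releases any pin that was never consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the refcount proposal; not the real HBase classes.
class PinStoreFile {
    final String name;
    int refCount = 0;              // pins plus open scanners
    boolean compactedAway = false;
    boolean deleted = false;

    PinStoreFile(String name) { this.name = name; }
}

// Discharger: deletes compacted-away files only when nothing references them.
class PinDischarger {
    static void run(List<PinStoreFile> files) {
        for (PinStoreFile f : files) {
            if (f.compactedAway && f.refCount == 0) f.deleted = true;
        }
    }
}

class PinStoreScanner {
    private final List<PinStoreFile> pinnedFiles = new ArrayList<>();  // from updateReaders()
    private final List<PinStoreFile> scannedFiles = new ArrayList<>(); // files with an open scanner

    void updateReaders(List<PinStoreFile> flushed) {
        for (PinStoreFile f : flushed) {
            f.refCount++;          // pin so the discharger skips this file
            pinnedFiles.add(f);
        }
    }

    void resetScannerStack() {
        for (PinStoreFile f : pinnedFiles) {
            f.refCount++;          // reference held by the newly opened scanner
            scannedFiles.add(f);
            f.refCount--;          // then drop the pin taken in updateReaders()
        }
        pinnedFiles.clear();
    }

    // Normal close, and also the lease-expiry path (next() may never have run).
    void close() {
        for (PinStoreFile f : pinnedFiles) f.refCount--;
        for (PinStoreFile f : scannedFiles) f.refCount--;
        pinnedFiles.clear();
        scannedFiles.clear();
    }
}
```

Without the release in close(), a scanner whose lease expires before the next next() call would leave its pins behind, and the discharger would never be able to remove those files.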