[ https://issues.apache.org/jira/browse/HBASE-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199403#comment-16199403 ]

Andrew Purtell commented on HBASE-18786:
----------------------------------------

What this patch removed was a silent rescan of storefiles when some kind of
glitch led to an FNFE during scanning. If there was a temporary store file
accounting failure, then after one server aborts and another picks up the
region, the new server will not see another FNFE.
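A minimal Java sketch of the contrast described above. None of these names (ReplicaAwareScanner, StoreScan, refreshStoreFiles) are HBase APIs; they are hypothetical. The sketch assumes, as the issue title implies, that the silent refresh-and-rescan path is kept only for secondary replicas, while a primary replica lets the FileNotFoundException propagate.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch, not HBase code: contrasts a silent refresh-and-rescan
// on FNFE with propagating the FNFE for a primary replica.
class ReplicaAwareScanner {
  /** Hypothetical abstraction over one pass over a store's files. */
  interface StoreScan {
    int scan() throws IOException; // returns the number of cells read
    void refreshStoreFiles();      // reloads the list of store files
  }

  private final boolean primaryReplica;
  private final StoreScan storeScan;

  ReplicaAwareScanner(boolean primaryReplica, StoreScan storeScan) {
    this.primaryReplica = primaryReplica;
    this.storeScan = storeScan;
  }

  int next() throws IOException {
    try {
      return storeScan.scan();
    } catch (FileNotFoundException fnfe) {
      if (primaryReplica) {
        // Primary replica: do not hide the error; the caller sees the FNFE.
        throw fnfe;
      }
      // Secondary replica (assumption): the file list may be stale, so
      // refresh it and retry once. A second failure propagates.
      storeScan.refreshStoreFiles();
      return storeScan.scan();
    }
  }
}
```

The single retry mirrors the "silent rescan" the comment refers to; whether the real code retried once or more is not stated here.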

If there is a permanent condition, such as loss of files or directories in
HDFS, leading to FNFEs and a cascading failure, then I don't see how
rescanning would help, and in any case we should handle it differently.
Previously we would have silently opened the region with missing files (?).
That would be bad. Aborting would be bad too in that case. Rather than
aborting, we should fail the region open. This should be handled in a new JIRA.
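A minimal sketch of "fail the region open" rather than aborting, assuming the opener knows which store files a region expects. RegionOpener, openRegion and RegionOpenFailedException are hypothetical names, and the local filesystem (java.nio.file) stands in for HDFS purely for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not HBase code: a missing store file fails this one
// region open instead of opening with missing data or aborting the server.
class RegionOpener {
  /** Signals that one region could not be opened; the server keeps running. */
  static class RegionOpenFailedException extends IOException {
    RegionOpenFailedException(String msg) { super(msg); }
  }

  static void openRegion(String regionName, List<Path> expectedStoreFiles)
      throws RegionOpenFailedException {
    for (Path file : expectedStoreFiles) {
      if (!Files.exists(file)) {
        // Permanent loss: refuse to open rather than serve incomplete data.
        throw new RegionOpenFailedException(
            "Cannot open " + regionName + ": missing store file " + file);
      }
    }
    // The normal open path would continue here (omitted).
  }
}
```

The design choice is that the failure is scoped to one region, so other regions on the same server keep serving.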

> FileNotFoundException should not be silently handled for primary region 
> replicas
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-18786
>                 URL: https://issues.apache.org/jira/browse/HBASE-18786
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: Ashu Pachauri
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
>         Attachments: HBASE-18786-branch-1.3.patch, 
> HBASE-18786-branch-1.patch, HBASE-18786-branch-1.patch, HBASE-18786.patch, 
> HBASE-18786.patch
>
>
> This is a follow up for HBASE-18186.
> FileNotFoundException while scanning from a primary region replica can be 
> indicative of a more severe problem. Handling them silently can cause many 
> underlying issues to go undetected. We should either
> 1. Hard fail the regionserver if there is a FNFE on a primary region replica, 
> OR
> 2. Report these exceptions as some region / server level metric so that these 
> can be proactively investigated.
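Option 2 above could look roughly like the following minimal sketch: a per-region and per-server FNFE counter. FnfeMetrics and its methods are hypothetical and are not the HBase metrics API; the sketch only assumes that a thread-safe counter keyed by region name is enough to surface these exceptions for investigation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not the HBase metrics API: counts FNFEs per region
// and per server so they can be investigated instead of silently swallowed.
class FnfeMetrics {
  private final ConcurrentHashMap<String, LongAdder> perRegion = new ConcurrentHashMap<>();
  private final LongAdder serverTotal = new LongAdder();

  /** Call wherever an FNFE is caught during a scan. */
  void recordFileNotFound(String regionName) {
    perRegion.computeIfAbsent(regionName, k -> new LongAdder()).increment();
    serverTotal.increment();
  }

  long regionCount(String regionName) {
    LongAdder counter = perRegion.get(regionName);
    return counter == null ? 0 : counter.sum();
  }

  long serverCount() {
    return serverTotal.sum();
  }
}
```

LongAdder is used because increments may come from many scanner threads at once and reads are infrequent.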



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
