[ https://issues.apache.org/jira/browse/HBASE-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199403#comment-16199403 ]
Andrew Purtell edited comment on HBASE-18786 at 10/10/17 9:28 PM:
------------------------------------------------------------------

What this patch removed was a silent rescan of store files if some kind of glitch leads to an FNFE during scanning. If there was a temporary store file accounting failure, then after one server aborts and another picks up the region, the new server will not see another FNFE, so this is not a cascading failure condition. If there is a permanent condition that could lead to a cascade, like hard loss of files or directories in HDFS producing FNFEs and a cascading failure, then I don't see how rescanning would help, and in any case we should handle it differently. Previously we would have silently opened the region with missing files (?). That would be bad. Aborting would be bad too in that case. Rather than aborting, we should fail the region open. This should be handled in a new JIRA.

was (Author: apurtell):
What this patch removed was a silent rescan of store files if some kind of glitch leads to an FNFE during scanning. If there was a temporary store file accounting failure, then after one server aborts and another picks up the region, the new server will not see another FNFE. If there is a permanent condition, like loss of files or directories in HDFS, producing FNFEs and a cascading failure, then I don't see how rescanning would help, and in any case we should handle it differently. Previously we would have silently opened the region with missing files (?). That would be bad. Aborting would be bad too in that case. Rather than aborting, we should fail the region open. This should be handled in a new JIRA.
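The comment proposes failing the region open, rather than aborting the server or silently opening with missing files, when store files are permanently gone. The following is a minimal sketch of that decision only, assuming the region's store file accounting and the filesystem listing can be modeled as plain sets of path strings. `RegionOpenCheck`, `tryOpen`, and `OpenOutcome` are hypothetical names for illustration, not HBase APIs.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: before opening a region, check that every store file
 * its accounting references exists, and fail only this region's open if any
 * are missing. Names are illustrative, not HBase APIs.
 */
final class RegionOpenCheck {

    enum OpenOutcome { OPENED, FAILED_OPEN }

    private RegionOpenCheck() {}

    /**
     * @param referencedStoreFiles store files the region's metadata says it has
     * @param filesOnDisk          paths actually present in the filesystem (assumed model)
     */
    static OpenOutcome tryOpen(List<String> referencedStoreFiles, Set<String> filesOnDisk) {
        for (String path : referencedStoreFiles) {
            if (!filesOnDisk.contains(path)) {
                // Permanent loss: rescanning cannot bring the file back, and
                // aborting would just hand the same problem to the next server.
                // Refuse to open this one region instead.
                return OpenOutcome.FAILED_OPEN;
            }
        }
        return OpenOutcome.OPENED;
    }
}
```

Scoping the failure to the region keeps one damaged region from taking down every server that tries to host it, which is the cascade the comment is worried about.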
> FileNotFoundException should not be silently handled for primary region replicas
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-18786
>                 URL: https://issues.apache.org/jira/browse/HBASE-18786
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: Ashu Pachauri
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.5.0
>
>         Attachments: HBASE-18786-branch-1.3.patch, HBASE-18786-branch-1.patch, HBASE-18786-branch-1.patch, HBASE-18786.patch, HBASE-18786.patch
>
>
> This is a follow-up for HBASE-18186.
> A FileNotFoundException while scanning from a primary region replica can be indicative of a more severe problem. Handling these silently can cause many underlying issues to go undetected. We should either
> 1. Hard fail the regionserver if there is an FNFE on a primary region replica,
> OR
> 2. Report these exceptions as a region- or server-level metric so that they can be proactively investigated.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
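The two options in the description can be contrasted in a small sketch, assuming a pluggable abort hook and an in-memory per-region counter standing in for a real metrics system. `PrimaryReplicaFnfeHandler`, `Aborter`, `onFileNotFound`, and `fnfeCount` are hypothetical names, not HBase APIs. In both modes the exception is rethrown for a primary replica rather than swallowed; how secondary replicas are handled is left outside this sketch.

```java
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of the two proposed options; none of these names are HBase APIs. */
final class PrimaryReplicaFnfeHandler {

    /** Option 1 = HARD_FAIL, option 2 = REPORT_METRIC. */
    enum Mode { HARD_FAIL, REPORT_METRIC }

    /** Stand-in for the regionserver's abort path (assumed). */
    interface Aborter {
        void abort(String why, Throwable cause);
    }

    private final Mode mode;
    private final Aborter aborter;
    private final Map<String, LongAdder> fnfeCountByRegion = new ConcurrentHashMap<>();

    PrimaryReplicaFnfeHandler(Mode mode, Aborter aborter) {
        this.mode = mode;
        this.aborter = aborter;
    }

    /**
     * Called when a scan hits a FileNotFoundException. For a primary replica the
     * exception is always rethrown. A normal return (secondary replica only)
     * means the caller keeps whatever handling it already has.
     */
    void onFileNotFound(String region, boolean primaryReplica, FileNotFoundException e)
            throws FileNotFoundException {
        if (!primaryReplica) {
            return; // secondary replicas are outside the scope of this sketch
        }
        if (mode == Mode.HARD_FAIL) {
            // Option 1: hard fail the server.
            aborter.abort("FileNotFoundException on primary replica of " + region, e);
        } else {
            // Option 2: count it per region so monitoring can surface it.
            fnfeCountByRegion.computeIfAbsent(region, r -> new LongAdder()).increment();
        }
        throw e; // surface the error instead of silently rescanning
    }

    /** Per-region FNFE count that a metrics exporter could publish (option 2). */
    long fnfeCount(String region) {
        LongAdder counter = fnfeCountByRegion.get(region);
        return counter == null ? 0L : counter.sum();
    }
}
```

`LongAdder` is used because many scanner threads may report concurrently; a plain `long` field would race.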