[ https://issues.apache.org/jira/browse/HDFS-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siyao Meng reassigned HDFS-13031:
---------------------------------

    Assignee: Adam Antal  (was: Siyao Meng)

> To detect fsimage corruption on the spot
> ----------------------------------------
>
>                 Key: HDFS-13031
>                 URL: https://issues.apache.org/jira/browse/HDFS-13031
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>         Environment:
>            Reporter: Yongjun Zhang
>            Assignee: Adam Antal
>            Priority: Major
>
> Since we fixed HDFS-9406, new cases of similar fsimage corruption have been
> reported from the field. To reproduce the corruption, we need a good fsimage
> plus the editlogs to replay on top of it. However, by the time the corruption
> is detected (at a later NN restart), the good fsimage has usually already
> been deleted.
> We need a way to detect fsimage corruption on the spot. Currently, what I
> think we could do is:
> # After the SNN creates a new fsimage, it spawns a new, modified NN process
> (an NN with some new command-line args) that just loads the fsimage and does
> nothing else.
> # If that process fails, the currently running SNN either a) backs up the
> fsimage + editlogs, or b) stops checkpointing. In either case, it needs to
> somehow raise a flag to tell the user that the fsimage is corrupt.
> In step 2, if we do a), we need to introduce a new NN->JN API to back up
> editlogs; if we do b), it changes the SNN's behavior, which is somewhat
> incompatible.
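The two-step flow proposed in the issue description can be sketched roughly as below. This is a minimal Python illustration, not Hadoop code: the real SNN is Java, and `TOY_VERIFIER`, `image_loads`, `after_checkpoint` and the `"backup"`/`"stop"` policy names are all hypothetical. The toy verifier only checks for a made-up `b"GOOD"` marker; it stands in for the proposed "load the fsimage and do nothing else" NN process, which would fully load the image.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the proposed load-only NN process: a child Python
# process that exits 0 only if the file starts with a toy b"GOOD" marker.
TOY_VERIFIER = [
    sys.executable,
    "-c",
    "import sys; "
    "data = open(sys.argv[1], 'rb').read(); "
    "sys.exit(0 if data.startswith(b'GOOD') else 1)",
]


def image_loads(verify_cmd, fsimage, timeout_s=600.0):
    """Step 1: try to load the new fsimage in a separate process."""
    try:
        proc = subprocess.run([*verify_cmd, str(fsimage)],
                              capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # treat a hung load as a failed load
    return proc.returncode == 0


def after_checkpoint(verify_cmd, fsimage, edit_logs, backup_dir,
                     policy="backup"):
    """Step 2: if the load fails, apply option a) "backup" or b) "stop"."""
    if policy not in ("backup", "stop"):
        raise ValueError(f"unknown policy: {policy!r}")
    if image_loads(verify_cmd, fsimage):
        return "ok"
    # "Raise a flag to the user": in this sketch, just a message on stderr.
    print(f"fsimage {fsimage} failed to load; possible corruption",
          file=sys.stderr)
    if policy == "backup":
        # Option a): keep the fsimage + editlogs so the corruption can be
        # reproduced later.
        backup_dir.mkdir(parents=True, exist_ok=True)
        for path in [fsimage, *edit_logs]:
            shutil.copy2(path, backup_dir / path.name)
        return "backed_up"
    # Option b): the caller is expected to stop checkpointing.
    return "checkpointing_stopped"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        image = tmp / "fsimage_0000000000000000042"
        image.write_bytes(b"not a valid image")
        edits = tmp / "edits_0000000000000000043-0000000000000000050"
        edits.write_bytes(b"...")
        print(after_checkpoint(TOY_VERIFIER, image, [edits], tmp / "backup"))
```

Running the load in a child process, as the issue suggests, means a crash or hang while loading a corrupt image cannot take down the running SNN; the timeout is an added assumption for the hang case.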