[ 
https://issues.apache.org/jira/browse/HADOOP-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658620#action_12658620
 ] 

Boris Shkolnik commented on HADOOP-4885:
----------------------------------------

Current implementation:
 Each FSImage has a list of StorageDir objects and a list of EditLogs.
 Each edit log has a corresponding StorageDir (they share the same
 directory). When an IO error happens, the affected EditLog and StorageDir
 are removed from their respective lists.
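
To make the current behavior concrete, here is a minimal sketch of the two
paired lists. It is not the actual FSImage code: the class name, the method
names, and the use of plain strings for directories and edit logs are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the current behavior; not the real FSImage class. */
class CurrentStorageSketch {
    // One entry per storage directory, and one edit log per directory.
    final List<String> storageDirs = new ArrayList<>();
    final List<String> editLogs = new ArrayList<>();

    /** Registers a directory together with its edit log. */
    void addDir(String dir) {
        storageDirs.add(dir);
        editLogs.add(dir + "/edits");
    }

    /** On an IO error both entries are dropped and nothing remembers them. */
    void onIoError(String dir) {
        storageDirs.remove(dir);
        editLogs.remove(dir + "/edits");
    }
}
```

Once onIoError has run, the failed directory is not tracked anywhere, so
it cannot be picked up again later even if it recovers.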
 
 
Suggested solution:
 When a StorageDir is removed, instead of discarding it we will put it into
 a separate list (the removedDir list).
 The edit log is still removed and discarded.
 When a secondary node starts a checkpoint, it first "rolls" the edit logs
 (rollEditLogs). This function verifies that there is no edits.new in any of
 the currently active directories and then creates them.
 
 Before it actually creates edits.new, we can go over the list of removed
 dirs and check whether they have become writable again. If so, we can put
 them back into the StorageDir list, so edits.new will be created there too.
 We will also create the corresponding EditLog object. When the checkpoint
 later "puts" the fsimage there (putFSImage), the directory will become
 active.
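
The proposal can be sketched roughly as follows. This is a simplified
illustration, not the real FSImage/FSEditLog code: the class and method
names are hypothetical, directories are plain java.io.File objects, and
edits.new is placed directly in each directory, which is an assumption
about the layout.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the suggested restore-at-checkpoint behavior. */
class StorageRestoreSketch {
    // Directories currently in use.
    final List<File> activeDirs = new ArrayList<>();
    // Directories dropped after an IO error, kept for a later retry.
    final List<File> removedDirs = new ArrayList<>();

    /** On an IO error, remember the directory instead of discarding it. */
    void onIoError(File dir) {
        if (activeDirs.remove(dir)) {
            removedDirs.add(dir);
        }
    }

    /** Moves removed directories that are writable again back to activeDirs. */
    List<File> attemptRestore() {
        List<File> restored = new ArrayList<>();
        for (Iterator<File> it = removedDirs.iterator(); it.hasNext();) {
            File dir = it.next();
            if (dir.isDirectory() && dir.canWrite()) {
                it.remove();
                activeDirs.add(dir);
                restored.add(dir);
            }
        }
        return restored;
    }

    /** Simplified stand-in for rollEditLogs: verify, restore, then create. */
    void rollEditLog() throws IOException {
        // 1. No active directory may already contain edits.new.
        for (File dir : activeDirs) {
            if (new File(dir, "edits.new").exists()) {
                throw new IOException("edits.new already exists in " + dir);
            }
        }
        // 2. Before creating edits.new, re-admit recovered directories.
        attemptRestore();
        // 3. Create edits.new everywhere, including restored directories.
        for (File dir : activeDirs) {
            new File(dir, "edits.new").createNewFile();
        }
    }
}
```

Restoring only at roll time means a recovered directory starts from a fresh
edits.new and then receives the new fsimage at the end of the same
checkpoint, so it is never expected to hold edits it missed while it was
offline.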



> Try to restore failed replicas of Name Node storage (at checkpoint time)
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4885
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Boris Shkolnik
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
