[ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029387#comment-13029387 ]
Todd Lipcon commented on HDFS-1799:
-----------------------------------

{quote}
FSEditLog has a list of JournalManagers and ELOS. On initJournals, the ELOS list is populated from the JournalManagers. If a failure occurs, the ELOS is simply dropped. Then on beginRoll, ELOS is cleared and repopulated from the JournalManagers (this will re-add any failed journal which is once again available).

Since a mapping between JournalManagers and ELOS is unnecessary, why add it?
{quote}

I started off on exactly the route you're describing. I needed to add the coupling in FSEditLog for two reasons: so I could go backwards from a stream to a StorageDirectory, and so that, when I close a stream at roll time, I could finalize the file with the JournalManager. Neither of these things belongs in the output stream itself, in my opinion.

Additionally, if we want to keep the ability to _not_ automatically try to restore edit logs at each roll, we need to be able to mark the JournalManager as bad on failure. The current patch does not do this yet, but it's worth a discussion on another JIRA.

For monitoring, I think it will also be useful to be able to iterate over the JournalManager->Stream pairs. For example, right now the NN displays the list of storage directories and whether they're active or failed, but that code will probably need updating to deal with non-StorageDirectory output streams.
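To make the coupling concrete, here is a rough sketch of the shape I have in mind. It is not the code in the patch: Journal, Stream, FakeJournal, JournalAndStream and all of the method names are hypothetical stand-ins chosen for illustration, not the real JournalManager or EditLogOutputStream APIs. It only shows why keeping the pair together helps: a failed journal loses its stream but stays in the list, finalization at roll time goes through the owning journal, and monitoring can iterate the pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; every type and method name here is hypothetical.
class JournalPairingSketch {

    /** Stand-in for a journal manager (not the real HDFS interface). */
    interface Journal {
        String name();
        boolean isAvailable();
        void finalizeSegment();
    }

    /** Stand-in for an edit log output stream. */
    static class Stream {
        boolean closed = false;
        void close() { closed = true; }
    }

    /** Trivial in-memory journal used to exercise the sketch. */
    static class FakeJournal implements Journal {
        final String name;
        boolean available = true;
        int finalized = 0;
        FakeJournal(String name) { this.name = name; }
        public String name() { return name; }
        public boolean isAvailable() { return available; }
        public void finalizeSegment() { finalized++; }
    }

    /** Couples a journal with its current stream; a null stream means "failed". */
    static class JournalAndStream {
        final Journal journal;
        Stream stream;
        JournalAndStream(Journal journal) { this.journal = journal; }
        boolean isActive() { return stream != null; }
    }

    final List<JournalAndStream> journals = new ArrayList<>();

    void addJournal(Journal j) { journals.add(new JournalAndStream(j)); }

    /** Open a stream on every journal that is currently available. */
    void startSegment() {
        for (JournalAndStream jas : journals) {
            jas.stream = jas.journal.isAvailable() ? new Stream() : null;
        }
    }

    /** On failure, drop the stream but keep the journal in the list. */
    void markFailed(JournalAndStream jas) {
        jas.stream = null;
    }

    /** At roll time, close each active stream and finalize via its own journal. */
    void endSegment() {
        for (JournalAndStream jas : journals) {
            if (jas.isActive()) {
                jas.stream.close();
                jas.journal.finalizeSegment();
                jas.stream = null;
            }
        }
    }

    /** Monitoring view: one "name: active|failed" entry per journal. */
    List<String> status() {
        List<String> out = new ArrayList<>();
        for (JournalAndStream jas : journals) {
            out.add(jas.journal.name() + ": " + (jas.isActive() ? "active" : "failed"));
        }
        return out;
    }
}
```

With two parallel lists, markFailed would have to remove the stream and lose track of which journal it belonged to. Here the failed journal stays visible in status(), and a later startSegment() picks it up again if it has become available. Whether restoration should happen automatically at every roll is a separate question that this sketch does not try to answer.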
> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch,
>                      0002-Standardised-error-pattern.patch,
>                      0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch,
>                      HDFS-1799-all.diff, hdfs-1799-alternate-design.txt, hdfs-1799.txt,
>                      hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt
>
>
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA
> focuses on pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting
>   log streams out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to
> circle back on the BackupNode later on this branch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira