[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanisha Koneru updated HDFS-4025:
---------------------------------
    Attachment: HDFS-4025.008.patch

Thank you [~jingzhao] for reviewing the patch.

{quote}
5. Similarly please see if we still need JNStorage#getTemporaryEditsFile and JNStorage#getFinalizedEditsFile.
{quote}
We still need these two methods, because the corresponding methods in NNStorage require the current storage directory to be passed as an argument.

{quote}
12. The whole "getMissingLogSegments" may need to be redesigned: Each time we download a missing segment successfully, we should update lastSyncedTxId accordingly.
{quote}
Suppose lastSyncedTxId is 10 and the other JournalNode from which we are downloading missing logs has logs starting from edits_20_30. Then we should not update lastSyncedTxId to 30, as we might still get the missing edits 11 to 20 from another JournalNode. Instead, if we update lastSyncedTxId at the end of one sync cycle (after downloading all missing logs from a journal), we can avoid this scenario.

I have addressed the rest of the comments in patch v08.

> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch,
> HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch,
> HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch,
> HDFS-4025.008.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and
> then comes back, it will be re-added as a valid part of the quorum on the
> next log roll. However, it will not have a complete history of log segments
> (i.e., any individual JN may have gaps in its transaction history).
> This mirrors the behavior of the NameNode when there are multiple local
> directories specified.
> However, it would be better if a background thread noticed these gaps and
> "filled them in" by grabbing the segments from other JournalNodes. This
> increases the resilience of the system when JournalNodes get reformatted or
> otherwise lose their local disk.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
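
To make the lastSyncedTxId discussion in the comment above concrete, here is a minimal Java sketch of one conservative reading of it: download every missing segment a remote journal offers, and only at the end of the cycle advance the marker, stopping at the first gap. This is not the logic of the attached patch. The class `SyncCycleSketch`, the `Segment` type, and the `syncCycle` method are hypothetical names made up for illustration, and the "stop at the first gap" rule is an assumption about how the end-of-cycle update could avoid skipping edits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SyncCycleSketch {

    /** Hypothetical stand-in for a finalized segment covering [startTxId, endTxId], inclusive. */
    static final class Segment {
        final long startTxId;
        final long endTxId;

        Segment(long startTxId, long endTxId) {
            this.startTxId = startTxId;
            this.endTxId = endTxId;
        }
    }

    /**
     * Runs one sync cycle against a single remote journal (illustrative only).
     * Segments that start after lastSyncedTxId are appended to 'downloaded';
     * the returned marker moves forward only across contiguous segments.
     */
    static long syncCycle(long lastSyncedTxId, List<Segment> remote, List<Segment> downloaded) {
        List<Segment> sorted = new ArrayList<>(remote);
        sorted.sort(Comparator.comparingLong(s -> s.startTxId));

        // Step 1: fetch everything the remote journal has beyond our marker.
        List<Segment> fetched = new ArrayList<>();
        for (Segment s : sorted) {
            if (s.startTxId > lastSyncedTxId) {
                fetched.add(s);
            }
        }
        downloaded.addAll(fetched);

        // Step 2: only now, at the end of the cycle, advance the marker.
        // Assumption: stop at the first gap so a later cycle can still fill it
        // from another journal.
        long newLast = lastSyncedTxId;
        for (Segment s : fetched) {
            if (s.startTxId != newLast + 1) {
                break;
            }
            newLast = s.endTxId;
        }
        return newLast;
    }

    public static void main(String[] args) {
        // The scenario from the comment: marker at 10, remote only has edits_20_30.
        List<Segment> got = new ArrayList<>();
        long afterGap = syncCycle(10, Arrays.asList(new Segment(20, 30)), got);
        System.out.println("downloaded=" + got.size() + " lastSyncedTxId=" + afterGap);

        // A journal that also holds the segment covering the gap.
        long afterFill = syncCycle(10,
                Arrays.asList(new Segment(20, 30), new Segment(11, 19)),
                new ArrayList<>());
        System.out.println("lastSyncedTxId=" + afterFill);
    }
}
```

In the first call the segment is downloaded but the marker stays at 10, so the missing edits can still be requested from another JournalNode in a later cycle; in the second call the marker moves to 30 because the segments are contiguous.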