[
https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849052#comment-15849052
]
Jing Zhao commented on HDFS-4025:
---------------------------------
Thanks for updating the patch, [~hanishakoneru]. The latest patch looks
pretty good to me. Some minor comments:
# In hdfs-default.xml, "i" --> "if" (and "wil" --> "will")
{code}
+ <name>dfs.journalnode.enable.sync</name>
+ <value>true</value>
+ <description>
+ If true, the journal nodes wil sync with each other. The journal nodes
+ will periodically gossip with other journal nodes to compare edit log
+ manifests and i they detect any missing log segment, they will download
+ it from the other journal nodes.
+ </description>
+</property>
{code}
# In JournalNodeSyncer.java, the following code will throw an
{{UnsupportedOperationException}} since {{thisJournalEditLogs}} is an immutable
list. In fact, this add operation can be skipped.
{code}
if (success) {
  thisJournalEditLogs.add(missingLog);
}
{code}
# Maybe "Transferring" can be changed to "Downloading"?
{code}
LOG.info("Transferring Missing Edit Log from " + url + " to " + jnStorage
    .getRoot());
{code}
# {{finalEditsFile}} should be {{tmpEditsFile}}.
{code}
LOG.info("Downloaded file " + tmpEditsFile.getName() + " size " +
    finalEditsFile.length() + " bytes.");
{code}
# In {{TestJournalNodeSync}}, {{jid}} can be declared as final, and
{{editLogExists}} can be private.
# For {{deleteEditLog}}, we can either change the while loop to an if, or
refresh the {{logFile}} instance within the while loop.
{code}
+ while (logFile.isInProgress()) {
+ dfsCluster.getNameNode(0).getRpcServer().rollEditLog();
{code}
# The following code can be simplified as "Assert.assertTrue("Couldn't delete
edit log file", deleteFile.delete());"
{code}
+ if (!deleteFile.delete()) {
+ assert false: "Couldn't delete edit log file";
+ return null;
+ }
{code}
# In {{generateEditLog}}, let's also check the result of {{doAnEdit}}, i.e.,
we do "Assert.assertTrue(doAnEdit());"
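
Two of the comments above (the add on an immutable list, and the while loop
over an unrefreshed {{logFile}}) can be reproduced in isolation. The following
is only a minimal sketch, not code from the patch: {{ReviewPointsSketch}},
{{LogFileSnapshot}} and {{FakeJournal}} are hypothetical names, and it assumes
that {{thisJournalEditLogs}} behaves like an unmodifiable view and that
{{logFile}} is a snapshot that does not observe a later roll.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReviewPointsSketch {

  /** Returns true if the list rejects add() with UnsupportedOperationException. */
  static boolean addIsRejected(List<String> logs, String missingLog) {
    try {
      logs.add(missingLog);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  /** Hypothetical immutable snapshot of a log file's state. */
  static final class LogFileSnapshot {
    final boolean inProgress;

    LogFileSnapshot(boolean inProgress) {
      this.inProgress = inProgress;
    }
  }

  /** Hypothetical journal: a single roll finalizes the in-progress segment. */
  static final class FakeJournal {
    private boolean inProgress = true;

    void roll() {
      inProgress = false;
    }

    LogFileSnapshot lookup() {
      return new LogFileSnapshot(inProgress);
    }
  }

  /** Counts the rolls issued before the loop exits, capped at maxRolls. */
  static int rollUntilFinalized(FakeJournal journal, boolean refresh, int maxRolls) {
    LogFileSnapshot logFile = journal.lookup();
    int rolls = 0;
    while (logFile.inProgress && rolls < maxRolls) {
      journal.roll();
      rolls++;
      if (refresh) {
        // Re-read the state so the loop condition can actually change.
        logFile = journal.lookup();
      }
    }
    return rolls;
  }

  public static void main(String[] args) {
    List<String> backing = new ArrayList<>();
    backing.add("edits_1-10");
    // Assumption: stands in for thisJournalEditLogs.
    List<String> readOnly = Collections.unmodifiableList(backing);

    System.out.println(addIsRejected(readOnly, "edits_11-20"));                  // true
    System.out.println(addIsRejected(new ArrayList<>(readOnly), "edits_11-20")); // false
    System.out.println(rollUntilFinalized(new FakeJournal(), false, 5));         // 5
    System.out.println(rollUntilFinalized(new FakeJournal(), true, 5));          // 1
  }
}
```

Without the refresh, the stale snapshot keeps the loop going until the cap is
hit; in the test there is no cap, which is why an if (or a refresh) looks
safer.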
> QJM: Synchronize past log segments to JNs that missed them
> ----------------------------------------------------------
>
> Key: HDFS-4025
> URL: https://issues.apache.org/jira/browse/HDFS-4025
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Hanisha Koneru
> Fix For: QuorumJournalManager (HDFS-3077)
>
> Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch,
> HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch,
> HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch,
> HDFS-4025.008.patch, HDFS-4025.009.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and
> then comes back, it will be re-added as a valid part of the quorum on the
> next log roll. However, it will not have a complete history of log segments
> (i.e., any individual JN may have gaps in its transaction history). This
> mirrors the behavior of the NameNode when there are multiple local
> directories specified.
> However, it would be better if a background thread noticed these gaps and
> "filled them in" by grabbing the segments from other JournalNodes. This
> increases the resilience of the system when JournalNodes get reformatted or
> otherwise lose their local disk.
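
The gap detection described above could look roughly like the sketch below. It
is an illustration only and not the algorithm in the attached patches:
{{MissingSegmentSketch}}, {{Segment}} and {{findMissing}} are hypothetical
names, and it assumes each finalized segment is identified by its first and
last transaction ID, and that a peer's manifest is simply a list of such
segments.

```java
import java.util.ArrayList;
import java.util.List;

final class MissingSegmentSketch {

  /** Hypothetical finalized segment covering [firstTxId, lastTxId]. */
  static final class Segment {
    final long firstTxId;
    final long lastTxId;

    Segment(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }
  }

  /** Returns the segments in the remote manifest that have no exact local match. */
  static List<Segment> findMissing(List<Segment> local, List<Segment> remote) {
    List<Segment> missing = new ArrayList<>();
    for (Segment r : remote) {
      boolean present = false;
      for (Segment l : local) {
        if (l.firstTxId == r.firstTxId && l.lastTxId == r.lastTxId) {
          present = true;
          break;
        }
      }
      if (!present) {
        // Candidate for download from the other journal node.
        missing.add(r);
      }
    }
    return missing;
  }
}
```

For example, with local segments 1-10 and 21-30 and a remote manifest of 1-10,
11-20 and 21-30, {{findMissing}} returns only the 11-20 segment.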