[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

Aaron T. Myers (JIRA) Tue, 05 May 2015 18:14:27 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529699#comment-14529699
 ]


Aaron T. Myers commented on HDFS-6440:
--------------------------------------

Hi Jesse and Lars, 

My sincere apologies it took so long for me to post a review. No good excuse 
except being busy, but what else is new.

Anyway, the patch looks pretty good to me. Most everything that's below is 
pretty small stuff.

One small potential correctness issue:
# In {{StandbyCheckpointer#doCheckpoint}}, unless I'm missing something, I 
don't think the variable "{{ie}}" can ever be non-null, and yet we check for 
whether or not it's null later in the method to determine if we should shut 
down.

Two things I'd really like to see some test coverage for:
# The changes to {{BlockTokenSecretManager}} - they look fine to me in general, 
but I'd love to see some extra tests of this functionality with several NNs in 
play. Unless I missed something, I don't think there are any tests that would 
exercise more than 2 {{BlockTokenSecretManager}}s.
# Rolling upgrades/downgrades/rollbacks. I agree with you in general that this 
change should likely not affect anything, but I think it's important that we 
have some test(s) exercising this regardless.

Several little nits:
# In {{MiniZKFCCluster}}, this method now supports more than just two services: 
"+   * Set up two services and their failover controllers."
# Recommend making {{intRange}} and {{nnRangeStart}} final in 
{{BlockTokenSecretManager}}.
# Should document the behavior of both of the newly-introduced config keys 
(dfs.namenode.checkpoint.check.quiet-multiplier and 
dfs.hs.tail-edits.namenode-retries) in hdfs-default.xml.
# I think this error message could be a bit clearer:
{quote}
+                    "Node is currently not in the active state, state:" + 
state +
+                    " does not support reading FSImages from other NameNodes");
{quote}
Recommend something like "NameNode <hostname or IP address> is currently not in 
a state which can accept uploads of new fsimages. State: <state>".
# Would be great for debugging purposes if we could include the hostname or IP 
address of the checkpointer already doing the upload with the higher txid in 
this message:
{quote}
+                    "Another checkpointer is already in the process of 
uploading a" +
+                        " checkpoint made up to transaction ID " + 
larger.last());
{quote}
# Spelled "failure" incorrectly here: "AUTHENTICATION_FAILRE"
# Sorry, I don't quite follow this comment in {{BootstrapStandby}}:
{quote}
+        // get the namespace from any active NN. On a fresh cluster, this is 
the active. On a
+        // running cluster, this works on any node.
{quote}
What's a "fresh cluster" vs. a "running cluster" in this sense?
# In {{HATestUtil#waitForStandbyToCatchUp}}, looks like you changed the method 
comment to indicate that the method takes multiple standbys as an argument, but 
in fact the method functionality is unchanged. There's just some whitespace 
changes in that method.
# In {{TestPipelinesFailover#doWriteOverFailoverTest}}, is changing the value 
of {{FAILOVER_SEED}} going to do anything, given that it's only ever read at 
the static initialization of the {{failoverRandom}}?

Also, not a problem at all, but just want to say that I really like the way 
this patch changes TransferFsImage, and the additional diagnostic info it 
provides when uploads fail. That's a nice little improvement by itself.

I'll be +1 once this stuff is addressed.

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

Reply via email to