[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController

Aaron T. Myers (Commented) (JIRA) Mon, 26 Mar 2012 15:36:56 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238922#comment-13238922
 ]


Aaron T. Myers commented on HDFS-2185:
--------------------------------------

Patch looks pretty good, Todd. A few small comments:

# Is it intentional that you call "super.setConf(...)" in 
DFSZKFailoverController#setConf with the passed-conf as-is, instead of the 
slightly mutated localNNConf? If it is intentional, then please move the 
super.setConf(...) call higher in the method to make that clear.
# Is it necessary for DFSZKFailoverController#getLocalTarget to return a new 
NNHAServiceTarget object each time it is called? If so, please add a comment 
explaining why. If not, then let's just store a single instance of 
NNHAServiceTarget instead of the separate conf/nsId/nnId instance variables.
# I think you might end up with a race condition if you don't add a 
"waitForHAState(1, HAServiceState.STANDBY)" in 
TestDFSZKFailoverController#setup after starting the second ZKFCThread.
# A little too much indentation at the end of TestDFSZKFailoverController#setup.
# After the call to "cluster.restartNameNode(0)", I think we should assert that 
the second NN is still active. As-written, the test could succeed if the system 
failed back to the first NN upon restarting it. This assertion is sort of more 
a test of the common side of this patch, though, so feel free to punt.
# "Test-thread which runs a ZK Failover Controller corresponding to a given 
dummy service" - it's not a dummy service in this case.
                
> HA: HDFS portion of ZK-based FailoverController
> -----------------------------------------------
>
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Eli Collins
>            Assignee: Todd Lipcon
>         Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, 
> zkfc-design.pdf
>
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController

Reply via email to