[jira] [Commented] (HDFS-1623) High Availability Framework for HDFS NN

Ivan Kelly (JIRA) Mon, 21 Mar 2011 07:09:43 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009130#comment-13009130
 ]


Ivan Kelly commented on HDFS-1623:
----------------------------------

{quote}
> How does the heartbeat mechanism deal with a network partition? My 
> understanding is that it sends packets at intervals to the other node, and 
> if they don't get through, it considers the other dead. This could create a 
> situation where both active and standby think that the other is dead and 
> both become active, leading to divergent filesystem states on each machine.
This is discussed in the document under the split brain and fencing 
requirements, right?{quote}
Ah, missed this. The fencing section does cover this.
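To make the failure mode concrete, here is a minimal sketch in Python of why a heartbeat timeout alone cannot prevent split brain, and how one common fencing technique (monotonically increasing epochs checked by the shared storage) stops the stale node from writing. Everything here is a hypothetical model: the names `Node`, `SharedStorage`, `claim_epoch`, and the 5-second timeout are illustrative assumptions, not the mechanism described in the attached document, and it assumes the shared storage itself is a single, reliable arbiter.

```python
from dataclasses import dataclass, field

# Hypothetical model: names, timeout value, and the epoch scheme are
# illustrative assumptions, not the design from the attached document.
HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before a peer is presumed dead (assumed)


@dataclass
class Node:
    name: str
    last_heard_from_peer: float = 0.0

    def receive_heartbeat(self, now: float) -> None:
        self.last_heard_from_peer = now

    def peer_presumed_dead(self, now: float) -> bool:
        # A timeout cannot tell "peer crashed" apart from "network partitioned".
        return now - self.last_heard_from_peer > HEARTBEAT_TIMEOUT


@dataclass
class SharedStorage:
    """Shared edit log that fences writers using monotonically increasing epochs."""
    highest_epoch: int = 0
    edits: list = field(default_factory=list)

    def claim_epoch(self) -> int:
        # A node must claim a new, higher epoch before acting as active.
        self.highest_epoch += 1
        return self.highest_epoch

    def write(self, epoch: int, record: str) -> bool:
        # Writes stamped with an older epoch come from a fenced (stale) active.
        if epoch < self.highest_epoch:
            return False
        self.edits.append(record)
        return True


# Both nodes last heard from each other at t=0; a partition then drops heartbeats.
a, b = Node("nn1"), Node("nn2")
now = 10.0
both_think_peer_dead = a.peer_presumed_dead(now) and b.peer_presumed_dead(now)

storage = SharedStorage()
epoch_a = storage.claim_epoch()  # nn1 promotes itself with epoch 1
epoch_b = storage.claim_epoch()  # nn2 also promotes itself with epoch 2
wrote_b = storage.write(epoch_b, "mkdir /b")  # accepted: newest epoch
wrote_a = storage.write(epoch_a, "mkdir /a")  # rejected: nn1 is fenced
```

Both nodes conclude the peer is dead, so the heartbeat alone would produce two actives; only the check at the shared storage keeps the namespace from diverging. This is why fencing has to be enforced at the resource both nodes share rather than by the nodes' opinions of each other.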

{quote}
> Also, the design indicates that more than 2 NN is out of scope. Why? Surely 
> it's as easy to design for N namenodes as it is for 2 namenodes.
Why do you need more than 2 NNs? Having more than 2 NNs could remove the need 
for an outside quorum service. But the number of NNs could be huge, especially 
in federated clusters.
{quote}
It's mentioned as "Out of scope", but serving read operations from a standby 
could be a use case for this: read throughput could be increased by adding 
more standby nodes. While this is out of scope, it would be good to keep it in 
mind now so that the design doesn't end up tied to just 2 nodes, which may be 
hard to rectify later.

{quote}
> If you want manual failover, from the server perspective you need to do 
> nothing. Operators can have 2 namenode machines, with the namenode running 
> on only one, writing to shared storage. When they want to fail over to the 
> standby, they just have to ensure that the active is down and start the 
> namenode daemon on the standby.
Not sure what you are getting at here. Is this in reference to the attached document?
{quote}
My point was that one of your requirements is "First class support for manual 
failover", and this doesn't need any changes to implement. It's available now, 
provided you are logging to shared storage.

{quote}
> I proposed a design last week for streaming updates from an active to a 
> standby; it may be interesting to you (ZOOKEEPER-1016). It does have some 
> mentions of active/standby detection, which I should remove. It occurs to me 
> now that this functionality should be separated out completely from the 
> WALing and should live at the level of NameNode.java.
I do not understand this point. Will take a look at your proposal. But with 
regard to this jira, BookKeeper could be a component in the solution and not 
the only component.{quote}
Just thought it would be useful for you guys to be aware of it. It tackles a 
different problem, but in a related area.

> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: Namenode HA Framework.pdf
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
