[ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039891#comment-13039891 ]
Eli Collins commented on HDFS-1623:
-----------------------------------
Thanks for incorporating the feedback. New doc looks good. Some comments:
* Section 8.1 - I think the BN approach is to run *multiple* BNs. That way the
3f use case is not a problem as long as at least one BN is alive, and you don't
need shared storage to address 3f. This is similar to GFS's multiple shadow
masters.
* Section 8.3 - Fail-over time doesn't need to be longer if clients are
notified when there's a new primary. One idea: clients could watch an ephemeral
ZK node, though it's an open question whether ZK can support as many observers
as we have clients.
* Section 8.5 - We need to figure out where the FailoverController (FC) runs.
If it lives in the same failure domain as the primary, then you've still got a
single point of failure. If it lives in a different failure domain, then it may
not be able to tell whether the primary has failed, or be able to take the
appropriate action if it has (e.g., due to lack of connectivity). Obviously the
FC needs to be HA itself too (e.g., leader-elected, with a new FC spawned if the
primary FC fails).
* Section 9.9.1 - Todd and I have investigated fencing with NFS a bit. With
NFSv3, locking (NLM) doesn't work because dead clients continue to hold the
lock. We'll need a pluggable shell command (e.g., some vendors provide a Perl
module that can ssh in and fence a particular IP) if we don't want to require
IPMI, iLO, etc. for STONITH.
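The watch-based client notification idea (Section 8.3) and the leader-elected FC (Section 8.5) both rest on the same coordination primitive. Below is a minimal in-memory sketch, assuming only two ZooKeeper-like behaviours: an ephemeral node disappears when its owner's session ends, and a watch fires once on the next change to that node. `ToyCoordinator`, `FailoverAwareClient`, `FailoverControllerCandidate` and the paths `/hdfs/active-nn` and `/hdfs/fc-leader` are hypothetical names for illustration; this is not the ZooKeeper client API, and it says nothing about the fan-out question or about which failure domain the FC should live in.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str], None]


class ToyCoordinator:
    """Hypothetical in-memory stand-in for a ZooKeeper-like service (not the real API).

    Models only: ephemeral nodes owned by a session, deleted when that session
    expires, and one-shot watches that fire on the next create/delete of a path.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._owner: Dict[str, str] = {}
        self._watches: Dict[str, List[Watcher]] = {}

    def create_ephemeral(self, path: str, value: str, session: str) -> bool:
        if path in self._data:
            return False  # already held by another session
        self._data[path] = value
        self._owner[path] = session
        self._fire(path)
        return True

    def get(self, path: str, watch: Optional[Watcher] = None) -> Optional[str]:
        # Reading and arming the watch happen in one call, so no change is missed.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def expire_session(self, session: str) -> None:
        # Simulates the owner crashing: all of its ephemeral nodes vanish.
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._data[path]
            del self._owner[path]
            self._fire(path)

    def _fire(self, path: str) -> None:
        # One-shot: detach the watchers first so callbacks can re-register.
        for callback in self._watches.pop(path, []):
            callback(path)


class FailoverAwareClient:
    """Client that learns the new primary from a watch instead of timing out."""

    def __init__(self, coord: ToyCoordinator, path: str = "/hdfs/active-nn") -> None:
        self.coord, self.path = coord, path
        self.primary: Optional[str] = None
        self._refresh(path)

    def _refresh(self, _path: str) -> None:
        # Re-read the node and re-arm the watch, since each watch fires once.
        self.primary = self.coord.get(self.path, watch=self._refresh)


class FailoverControllerCandidate:
    """One of several FCs; whoever holds the ephemeral node is the leader."""

    def __init__(self, coord: ToyCoordinator, name: str, path: str = "/hdfs/fc-leader") -> None:
        self.coord, self.name, self.path = coord, name, path
        self.is_leader = False
        self._try_acquire(path)

    def _try_acquire(self, _path: str) -> None:
        self.is_leader = self.coord.create_ephemeral(self.path, self.name, session=self.name)
        if not self.is_leader:
            # Lost the race: watch the leader's node and retry when it changes.
            self.coord.get(self.path, watch=self._try_acquire)
```

Because the watches here fire only once (loosely mirroring classic ZooKeeper watches), both the client and the standby FC re-arm their watch in the same `get` call that re-reads the node, so they act on the latest value rather than on the event itself. When the primary FC's session expires, a standby FC that was watching the node takes over leadership.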
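For the pluggable fencing command in Section 9.9.1, here is a minimal sketch assuming a hypothetical convention: each configured fencing method is an argv list, and the target host is appended as the last argument. Only a zero exit status within the timeout counts as a successful fence. A non-zero status, a timeout, or a missing binary all count as failure, so the caller must not promote the new primary. `run_fence_command`, `fence`, and the example command paths are illustrative, not an existing Hadoop API.

```python
import subprocess
from typing import Sequence


def run_fence_command(argv: Sequence[str], target_host: str, timeout_s: float = 30.0) -> bool:
    """Run one site-configured fencing command against target_host.

    Hypothetical convention: the target host is appended as the final argument.
    Returns True only on exit status 0 within timeout_s.
    """
    try:
        result = subprocess.run(
            [*argv, target_host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable fencer gives no guarantee the old primary is dead.
        return False
    return result.returncode == 0


def fence(methods: Sequence[Sequence[str]], target_host: str) -> bool:
    """Try each configured method in order and stop at the first success."""
    return any(run_fence_command(argv, target_host) for argv in methods)


# Hypothetical site configuration: a vendor script that ssh-es in and fences
# the node's IP, falling back to a power-management tool.
EXAMPLE_METHODS = [
    ["/usr/local/sbin/vendor-fence-ip"],
    ["/usr/local/sbin/ipmi-power-off"],
]
```

Keeping the fencer an opaque command leaves the choice between ssh-based vendor scripts, IPMI, iLO, or other STONITH mechanisms to each site rather than requiring any one of them.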
> High Availability Framework for HDFS NN
> ---------------------------------------
>
> Key: HDFS-1623
> URL: https://issues.apache.org/jira/browse/HDFS-1623
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf,
> NameNode HA_v2_1.pdf, Namenode HA Framework.pdf
>
>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira