[ 
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-1623:
------------------------------

    Attachment: HDFS-High-Availability.pdf

Hey Sanjay, Suresh,

Read through the doc, here are some comments/questions I've been meaning to 
post. Also, I've attached a doc that Todd, atm, and I wrote so we can compare 
notes.

Overall, looks great, I agree with the scope.

Agree with the failures supported; you mean simultaneous failures, right? We 
should list these failures out. E.g. the NN today can cope with multiple disk 
failures but not a single failed DIMM.

How does NN fail-over impact federation? E.g. does viewfs have any special 
requirements as a client?

Agree that short GC pauses should not be considered failures, and the corollary 
that long GC pauses should. This only works well if GC on the standby is not 
correlated with GC on the active NN (otherwise you fail over to the standby 
just to have it GC). How can we ensure this is the case? Perhaps force regular 
GCs on the standby?
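
To make the short-vs-long pause distinction concrete, here's a rough sketch of 
the kind of check I have in mind. It's purely illustrative: the 10 second 
threshold and the function names are made up, not taken from either doc, and a 
real detector would sit on top of whatever heartbeat mechanism we pick.

```python
# Hypothetical threshold: silence longer than this counts as a failure.
DEFAULT_THRESHOLD_SECS = 10.0


def is_failed(silence_secs, threshold_secs=DEFAULT_THRESHOLD_SECS):
    """A short GC pause stays under the threshold; a long one exceeds it."""
    return silence_secs > threshold_secs


def should_fail_over(active_silence_secs, standby_silence_secs,
                     threshold_secs=DEFAULT_THRESHOLD_SECS):
    """Fail over only if the active looks dead and the standby looks alive.

    Checking the standby too guards against correlated GC pauses, where we
    would otherwise fail over to a node that is itself paused.
    """
    return (is_failed(active_silence_secs, threshold_secs)
            and not is_failed(standby_silence_secs, threshold_secs))
```

With these made-up numbers, an active NN that has been silent for 30s with a 
responsive standby triggers fail-over, a 2s pause does not, and neither does 
the case where both nodes have been silent for 30s.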

Wrt requirement #2, I assume fail-over to a new version of HDFS (i.e. having 
different versions of HDFS interoperate) is out of scope for this framework. 
However, fail-over to a minimally patched NN (i.e. no protocol or wire-level 
format changes) should work.

Wrt requirement #4, this means all state changes need to be made persistently 
off-host before success is reported to clients.
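
In other words, the ordering I'd expect is roughly the following. This is a 
toy sketch, not a proposal for the actual interfaces: InMemoryRemoteJournal, 
NameNodeSketch and append_and_sync are hypothetical names standing in for 
whatever off-host edit log we end up with.

```python
class JournalUnavailable(Exception):
    """Raised when the off-host journal cannot durably record an edit."""


class InMemoryRemoteJournal:
    """Stand-in for an off-host edit log (hypothetical, for illustration)."""

    def __init__(self, available=True):
        self.available = available
        self.entries = []

    def append_and_sync(self, entry):
        # A real journal would return only once the edit is durable off-host.
        if not self.available:
            raise JournalUnavailable("remote journal unreachable")
        self.entries.append(entry)


class NameNodeSketch:
    """Toy NN that persists each edit off-host before acknowledging it."""

    def __init__(self, journal):
        self.journal = journal
        self.namespace = set()

    def mkdir(self, path):
        # 1. Persist off-host first; on failure this raises, so the client
        #    never sees success for an edit a standby could not recover.
        self.journal.append_and_sync(("mkdir", path))
        # 2. Only then apply the change to in-memory state.
        self.namespace.add(path)
        # 3. Report success to the client.
        return True
```

The point of the ordering is that if the journal write fails, the in-memory 
namespace is left untouched and the client gets an error rather than an ack.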

> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: HDFS-High-Availability.pdf, Namenode HA Framework.pdf
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
