[ https://issues.apache.org/jira/browse/HDFS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648505#comment-13648505 ]

Kihwal Lee commented on HDFS-3962:
----------------------------------

Here is one more thing I thought about. I think the NN exits when a required
journal is unavailable. This is fine at start-up, but at run-time something
else can be tried. If the failed journal is the shared edit dir, some users may
want an option to continue serving if the FC determines that the SBN is dead or
is experiencing the same problem. Fail-over could be disabled at that point.
This would increase service availability. The downside is that we can no longer
assume the shared edit dir is always the source of truth, but I think this
would be a special case where manual recovery and manual re-enabling of
fail-over are needed.
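
To make the idea concrete, here is a rough sketch of the decision I have in
mind. It is only an illustration under my own assumptions: the class, the enum
and all parameter names (including the opt-in flag) are hypothetical and do not
correspond to existing NN code or configuration keys.

```java
// Hypothetical sketch only; none of these names exist in HDFS.
final class JournalFailurePolicy {

    /** What the NN would do after a required journal fails. */
    enum Action { ABORT, CONTINUE_WITH_FAILOVER_DISABLED }

    /**
     * @param atStartup           failure detected during NN start-up
     * @param failedIsSharedEdits the failed required journal is the shared edit dir
     * @param allowDegraded       hypothetical opt-in option set by the operator
     * @param standbyUnavailable  the FC has determined that the SBN is dead or
     *                            is experiencing the same problem
     */
    static Action decide(boolean atStartup,
                         boolean failedIsSharedEdits,
                         boolean allowDegraded,
                         boolean standbyUnavailable) {
        // At start-up, exiting on a missing required journal is fine.
        if (atStartup) {
            return Action.ABORT;
        }
        // At run-time, keep serving only in the special case described above.
        if (failedIsSharedEdits && allowDegraded && standbyUnavailable) {
            return Action.CONTINUE_WITH_FAILOVER_DISABLED;
        }
        // Any other required-journal failure keeps the abort behavior.
        return Action.ABORT;
    }
}
```

In the CONTINUE_WITH_FAILOVER_DISABLED case, the shared edit dir can no longer
be treated as the source of truth, so manual recovery and manual re-enabling of
fail-over would be required before returning to normal HA operation.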
                
> NN should periodically check writability of 'required' journals
> ---------------------------------------------------------------
>
>                 Key: HDFS-3962
>                 URL: https://issues.apache.org/jira/browse/HDFS-3962
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, namenode
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3962.txt
>
>
> Currently, our HA design ensures "write fencing" by having the failover 
> controller call a fencing script before transitioning a new node to active. 
> However, if the fencing script is based on storage fencing (and not stonith), 
> there is no _read_ fencing. That is to say, the old active may continue to 
> believe itself active for an unbounded amount of time, assuming that it does 
> not try to write to its edit log.
> This isn't super problematic, but it would be beneficial for monitoring, etc, 
> to have the old NN periodically check the writability of any "required" 
> journals, and abort if they become unwritable, even if there are no writes 
> coming into the system.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
