[
https://issues.apache.org/jira/browse/HDFS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648496#comment-13648496
]
Kihwal Lee commented on HDFS-3962:
----------------------------------
In general, storage error detection and handling is not ideal.
* NNStorage: The checks here seem to be mainly for detecting a read-only file
system (ROFS). They are performed when rolling the edit log or writing an
fsimage.
* FSEditLog: I/O errors are handled when logSync() fails. The storage state is
kept locally and not shared with NNStorage.
* Resource monitor: Enters safe mode if the resource requirement is not met,
e.g. not enough space.
Since NNStorage is unaware of the errors detected in FSEditLog, the failed
storages will get retried on the next edit roll. The restoreFailedStorage
setting and admin command have no effect, since they apply only to NNStorage.
It would be better if these were tied together in some way, especially when
journals are also written to the image directory.
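As a rough sketch of what tying them together could look like (hypothetical
only; the listener interface and method names below do not exist in Hadoop):
{code:java}
// Hypothetical sketch: FSEditLog reports per-directory failures to a shared
// listener (e.g. implemented by NNStorage) instead of keeping the state to
// itself, so restoreFailedStorage and the admin command see the same list.
import java.io.File;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

interface StorageErrorListener {
  // Called by the edit log when a sync/flush to this directory fails.
  void reportFailedJournalDir(File dir, Exception cause);
}

class SharedStorageState implements StorageErrorListener {
  private final Set<File> failedDirs = ConcurrentHashMap.newKeySet();

  @Override
  public void reportFailedJournalDir(File dir, Exception cause) {
    if (failedDirs.add(dir)) {
      System.err.println("Journal directory failed: " + dir + " (" + cause + ")");
    }
  }

  // restoreFailedStorage and the admin command would consult this list, so an
  // edit roll retries only directories that were explicitly restored.
  Set<File> failedDirectories() {
    return failedDirs;
  }
}
{code}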
Another important piece that is missing is a configurable storage timeout. I
assume the HA journal managers are fine and it is just FileJournalManager. In
most cases, logSync() will be the first place in the namenode to see storage
errors. (Regular logging will also see it if the same file system is used.)
For local disks, the I/O timeout depends on the driver and is usually much
longer than what an HA namenode wants. If a thread stuck in flush() could be
interrupted on timeout, local storage errors could be detected more reliably
and service availability would improve; a sketch of the idea is below.
Unfortunately, most test cases only simulate ROFS or layout changes, because
this condition is hard to simulate or create, so this most common failure mode
is often missed in testing. You might have more experience here, as you have
done a lot of testing for HA.
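To illustrate the flush() timeout idea, a minimal sketch (not FileJournalManager
code; BoundedFlush, JournalSink and flushTimeoutMs are made-up names, and the
timeout value would come from a new config key):
{code:java}
// Illustrative only: bound how long a flush may block by running it on a
// worker thread and interrupting it on timeout, so a hung local disk surfaces
// as an error instead of blocking logSync() indefinitely.
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedFlush {
  interface JournalSink {
    void flushAndSync() throws IOException;
  }

  private final ExecutorService flusher = Executors.newSingleThreadExecutor();

  void flushWithTimeout(JournalSink sink, long flushTimeoutMs)
      throws IOException, InterruptedException {
    Future<?> f = flusher.submit(() -> {
      sink.flushAndSync();
      return null;
    });
    try {
      f.get(flushTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException te) {
      // Interrupt the stuck thread and declare the directory failed instead
      // of waiting for the driver's (much longer) I/O timeout.
      f.cancel(true);
      throw new IOException("flush timed out after " + flushTimeoutMs + " ms", te);
    } catch (ExecutionException ee) {
      // A real I/O error from flushAndSync(); surface it to the caller.
      throw new IOException("flush failed", ee.getCause());
    }
  }
}
{code}
One caveat with interruption is that a thread blocked in a FileChannel write or
force() has its channel closed on interrupt, so the stream would need to be
reopened later; that seems acceptable since the directory is being declared
failed anyway.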
For NFS mounts, one could adjust the mount options to make I/O time out early
rather than hang; an illustrative example is below. But I still think
FileJournalManager should have the ability to time out on its own. Even if the
NN does not depend on local storage for data consistency and correctness,
service availability suffers if the NN has no control over how long it will
block.
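For example, something along these lines (illustrative values only; the server
name, export path and timeouts are placeholders):
{noformat}
# 'soft' makes the client return an I/O error after 'retrans' retries of
# 'timeo' (in tenths of a second) each, instead of retrying forever as a
# default 'hard' mount does.
filer:/export/nn-edits  /mnt/nn-edits  nfs  soft,timeo=100,retrans=3  0  0
{noformat}
Soft mounts trade durability guarantees for bounded blocking, which is the
point here: the error surfaces to the edit log stream and the directory can be
marked failed instead of hanging the namenode.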
I don't mean to hijack this jira, but I wanted to hear your view on storage
error detection and handling, and perhaps file jiras once we identify what
should be done.
> NN should periodically check writability of 'required' journals
> ---------------------------------------------------------------
>
> Key: HDFS-3962
> URL: https://issues.apache.org/jira/browse/HDFS-3962
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-3962.txt
>
>
> Currently, our HA design ensures "write fencing" by having the failover
> controller call a fencing script before transitioning a new node to active.
> However, if the fencing script is based on storage fencing (and not stonith),
> there is no _read_ fencing. That is to say, the old active may continue to
> believe itself active for an unbounded amount of time, assuming that it does
> not try to write to its edit log.
> This isn't super problematic, but it would be beneficial for monitoring, etc.,
> to have the old NN periodically check the writability of any "required"
> journals, and abort if they become unwritable, even if there are no writes
> coming into the system.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira