[
https://issues.apache.org/jira/browse/HDFS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648496#comment-13648496
]
Kihwal Lee commented on HDFS-3962:
----------------------------------
In general, storage error detection and handling is not ideal.
* NNStorage: The checks here seem to be mainly for detecting a read-only file
system (ROFS). They are performed when rolling the edit log or writing an
fsimage.
* FSEditLog: I/O errors are handled when logSync() fails. The storage state is
kept locally and not shared with NNStorage.
* Resource monitor: Enters safe mode if the resource requirement is not met,
e.g. not enough space.
Since NNStorage is unaware of the errors detected in FSEditLog, the failed
storages will get retried on the next edit roll. The restoreFailedStorage
setting and admin command have no effect, since they apply only to NNStorage.
It would be better if these were tied together in some way, especially when
journals are also written to the image directory.
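As a rough sketch of what tying them together could look like (hypothetical
only; the listener interface and method names below do not exist in Hadoop):
{code:java}
// Hypothetical sketch: FSEditLog reports per-directory failures to a shared
// listener (e.g. implemented by NNStorage) instead of keeping the state to
// itself, so restoreFailedStorage and the admin command see the same list.
import java.io.File;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

interface StorageErrorListener {
  // Called by the edit log when a sync/flush to this directory fails.
  void reportFailedJournalDir(File dir, Exception cause);
}

class SharedStorageState implements StorageErrorListener {
  private final Set<File> failedDirs = ConcurrentHashMap.newKeySet();

  @Override
  public void reportFailedJournalDir(File dir, Exception cause) {
    if (failedDirs.add(dir)) {
      System.err.println("Journal directory failed: " + dir + " (" + cause + ")");
    }
  }

  // restoreFailedStorage and the admin command would consult this list, so an
  // edit roll retries only directories that were explicitly restored.
  Set<File> failedDirectories() {
    return failedDirs;
  }
}
{code}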
Another important piece that is missing is a configurable storage timeout. I
assume the HA journal managers are fine and it is just FileJournalManager. In
most cases, logSync() will be the first place in the namenode to see storage
errors. (Regular logging will also see it if the same file system is used.)
For local disks, the I/O timeout depends on the driver and is usually much
longer than what an HA namenode wants. If a thread stuck in flush() could be
interrupted on timeout, local storage errors could be detected more reliably
and service availability would improve; a sketch of the idea is below.
Unfortunately, most test cases only simulate ROFS or layout changes, because
this condition is hard to simulate or create, so this most common failure mode
is often missed in testing. You might have more experience here, as you have
done a lot of testing for HA.
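To illustrate the flush() timeout idea, a minimal sketch (not FileJournalManager
code; BoundedFlush, JournalSink and flushTimeoutMs are made-up names, and the
timeout value would come from a new config key):
{code:java}
// Illustrative only: bound how long a flush may block by running it on a
// worker thread and interrupting it on timeout, so a hung local disk surfaces
// as an error instead of blocking logSync() indefinitely.
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedFlush {
  interface JournalSink {
    void flushAndSync() throws IOException;
  }

  private final ExecutorService flusher = Executors.newSingleThreadExecutor();

  void flushWithTimeout(JournalSink sink, long flushTimeoutMs)
      throws IOException, InterruptedException {
    Future<?> f = flusher.submit(() -> {
      sink.flushAndSync();
      return null;
    });
    try {
      f.get(flushTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException te) {
      // Interrupt the stuck thread and declare the directory failed instead
      // of waiting for the driver's (much longer) I/O timeout.
      f.cancel(true);
      throw new IOException("flush timed out after " + flushTimeoutMs + " ms", te);
    } catch (ExecutionException ee) {
      // A real I/O error from flushAndSync(); surface it to the caller.
      throw new IOException("flush failed", ee.getCause());
    }
  }
}
{code}
One caveat with interruption is that a thread blocked in a FileChannel write or
force() has its channel closed on interrupt, so the stream would need to be
reopened later; that seems acceptable since the directory is being declared
failed anyway.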
For NFS mounts, one could adjust the mount options to make I/O time out early
rather than hang; an illustrative example is below. But I still think
FileJournalManager should have the ability to time out on its own. Even if the
NN does not depend on local storage for data consistency and correctness,
service availability suffers if the NN has no control over how long it will
block.
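For example, something along these lines (illustrative values only; the server
name, export path and timeouts are placeholders):
{noformat}
# 'soft' makes the client return an I/O error after 'retrans' retries of
# 'timeo' (in tenths of a second) each, instead of retrying forever as a
# default 'hard' mount does.
filer:/export/nn-edits  /mnt/nn-edits  nfs  soft,timeo=100,retrans=3  0  0
{noformat}
Soft mounts trade durability guarantees for bounded blocking, which is the
point here: the error surfaces to the edit log stream and the directory can be
marked failed instead of hanging the namenode.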
I don't mean to hijack this jira, but I wanted to hear your view on storage
error detection and handling, and perhaps file jiras once we identify what
should be done.
> NN should periodically check writability of 'required' journals
> ---------------------------------------------------------------
>
> Key: HDFS-3962
> URL: https://issues.apache.org/jira/browse/HDFS-3962
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-3962.txt
>
>
> Currently, our HA design ensures "write fencing" by having the failover
> controller call a fencing script before transitioning a new node to active.
> However, if the fencing script is based on storage fencing (and not stonith),
> there is no _read_ fencing. That is to say, the old active may continue to
> believe itself active for an unbounded amount of time, assuming that it does
> not try to write to its edit log.
> This isn't super problematic, but it would be beneficial for monitoring, etc.,
> to have the old NN periodically check the writability of any "required"
> journals, and abort if they become unwritable, even if there are no writes
> coming into the system.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira