[ https://issues.apache.org/jira/browse/HDFS-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505972#comment-13505972 ]
Kihwal Lee commented on HDFS-4233:
----------------------------------
bq. I am unsure of what fd exhaustion means (is it hitting nofile limits?),....
Yes. In a very big cluster, we've seen NN running out of 64K file descriptors.
I was told that it can be raised further (e.g. 1M) without much negative impact
on performance, at least on Linux. So there are ways to avoid it or minimize
the possibility, but NN still needs to be able to deal with the situation.
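As a rough illustration of watching descriptor headroom, here is a minimal Python sketch that compares a process's open descriptor count against its nofile soft limit. It assumes Linux (`/proc/self/fd`); the helper names `fd_usage` and `near_limit` and the 90% threshold are hypothetical choices for this example, not anything the namenode does.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is unavailable (non-Linux systems).
    The count includes the descriptor briefly opened to list the directory.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return open_fds, soft, hard


def near_limit(open_fds, soft, threshold=0.9):
    """True when usage has reached the given fraction of the soft limit.

    The 0.9 default is an arbitrary assumption chosen for illustration.
    """
    if open_fds is None or soft == resource.RLIM_INFINITY:
        return False
    return open_fds >= threshold * soft


if __name__ == "__main__":
    used, soft, hard = fd_usage()
    print("open=%s soft=%s hard=%s near_limit=%s"
          % (used, soft, hard, near_limit(used, soft)))
```

A check like this only reports the condition; as noted above, the server still has to behave sensibly once the limit is actually hit.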
Monitoring and limiting the number of connections can be tricky. Ideally we want
the average number to be reasonable, but also want the NN to absorb a short burst
of requests instead of rejecting them. The client-side retry mechanism will
require some changes if IPC starts actively rejecting requests. Things get
very nasty if IPC connections get "reset" or fall into the SYN backlog and stay
there for long. Massive lease renewal failures will likely occur, and that will
cause block recoveries and so on. In short, protecting the namenode might be
simple, but doing so can sometimes hurt cluster availability.
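The trade-off between a reasonable average and tolerance for short bursts can be sketched with a token bucket. This is only an illustration of the idea, not how Hadoop IPC works; the class name `BurstTolerantLimiter` and the `rate` and `burst` parameters are hypothetical.

```python
import time


class BurstTolerantLimiter:
    """Token bucket: admits `rate` connections per second on average,
    while allowing up to `burst` back-to-back admissions."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # bucket capacity
        self.tokens = float(burst)   # start full so an early burst is absorbed
        self.clock = clock           # injectable clock, useful for testing
        self.last = clock()

    def try_accept(self):
        """Return True if a new connection may be admitted right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        refill = (now - self.last) * self.rate
        self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = BurstTolerantLimiter(rate=2, burst=5)
    print([limiter.try_accept() for _ in range(6)])
```

Rejections from a limiter like this are exactly where the client-side retry concern applies: a rejected client needs to back off and retry rather than treat the refusal as a hard failure, otherwise lease renewals start failing.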
> NN keeps serving even after no journals started while rolling edit
> ------------------------------------------------------------------
>
> Key: HDFS-4233
> URL: https://issues.apache.org/jira/browse/HDFS-4233
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.5
> Reporter: Kihwal Lee
> Priority: Critical
>
> We've seen the namenode keep serving even after a rollEditLog() failure. Instead
> of taking corrective action or treating this condition as FATAL, it keeps on
> serving and modifying its file system state. No logs are written from this
> point on, so if the namenode is restarted, there will be data loss.