[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834468#comment-16834468 ]
He Xiaoqiao commented on HDFS-14201:
------------------------------------
Hi [~surmountian] [~elgoiri], it is true that the exception stack trace is
logged very frequently: by default, once per second until the NameNode leaves
safemode.
On the other hand, this stack trace is printed by the RPC framework when it
encounters an exception. It seems that we have to throw
HealthCheckFailedException (HCFE) from the health check while the NameNode is
still in safemode, since HCFE is the only signal ZKFC uses to decide that a
NameNode is unhealthy.
In [^HDFS-14201.005.patch], I register {{HealthCheckFailedException}} as one of
the terse-logging exceptions when initializing RPC#Server, which reduces the
exception log output.
Please help to review. Thanks.
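To illustrate what "terse logging" changes, here is a minimal, self-contained model in plain Java. It is a sketch under assumptions, not Hadoop code: {{RpcServerModel}}, {{formatLogEntry}} and the local {{HealthCheckFailedException}} stand-in are hypothetical names. The method name {{addTerseExceptions}} echoes the registration method on Hadoop's RPC {{Server}}, but the behavior below is only an approximation: exceptions whose class is registered as terse are logged as one line (class name and message), everything else is logged with its full stack trace.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of terse exception logging in an RPC server (not Hadoop code). */
public class TerseLoggingSketch {

  /** Local stand-in for Hadoop's HealthCheckFailedException (assumption). */
  static class HealthCheckFailedException extends Exception {
    HealthCheckFailedException(String msg) { super(msg); }
  }

  static class RpcServerModel {
    // Exact exception classes whose stack traces should be suppressed in logs.
    private final Set<Class<?>> terseExceptions = new HashSet<>();

    /** Register exception classes to be logged without a stack trace. */
    void addTerseExceptions(Class<?>... classes) {
      Collections.addAll(terseExceptions, classes);
    }

    /** Build the log entry for an exception thrown while handling an RPC call. */
    String formatLogEntry(Throwable t) {
      if (terseExceptions.contains(t.getClass())) {
        // Terse: a single line with the class name and message only.
        return t.getClass().getSimpleName() + ": " + t.getMessage();
      }
      // Default: the full stack trace, one frame per line.
      StringWriter sw = new StringWriter();
      t.printStackTrace(new PrintWriter(sw));
      return sw.toString();
    }
  }

  public static void main(String[] args) {
    RpcServerModel server = new RpcServerModel();
    Throwable t = new HealthCheckFailedException("The NameNode is in safemode");

    // Before registration, every failed health check dumps a multi-line trace.
    int linesBefore = server.formatLogEntry(t).split("\\R").length;
    server.addTerseExceptions(HealthCheckFailedException.class);
    String after = server.formatLogEntry(t);

    System.out.println("lines before registration: " + linesBefore);
    System.out.println("entry after registration: " + after);
  }
}
```

With a health check roughly once per second, collapsing each entry from a multi-line trace to a single line is what cuts the log volume; other exception types keep their full traces.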
> Ability to disallow safemode NN to become active
> ------------------------------------------------
>
> Key: HDFS-14201
> URL: https://issues.apache.org/jira/browse/HDFS-14201
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: auto-failover
> Affects Versions: 3.1.1, 2.9.2
> Reporter: Xiao Liang
> Assignee: Xiao Liang
> Priority: Major
> Attachments: HDFS-14201.001.patch, HDFS-14201.002.patch,
> HDFS-14201.003.patch, HDFS-14201.004.patch, HDFS-14201.005.patch
>
>
> Currently, with HA, a NameNode in safemode can be selected as active. For
> availability of both reads and writes, however, NameNodes not in safemode are
> better choices to become active.
> It can take tens of minutes for a cold-started NameNode to leave safemode,
> especially when HDFS holds a large number of files and blocks. This means
> that if a NameNode in safemode becomes active, the cluster will not be fully
> functional for quite a while, even though it could be if a NameNode not in
> safemode were chosen instead.
> The proposal is to add an option that lets a NameNode report itself as
> UNHEALTHY to ZKFC while it is in safemode, so that only a fully functioning
> NameNode can become active, improving the overall availability of the cluster.
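The proposed behavior can be sketched as a small, self-contained Java model. Everything here is hypothetical: {{NameNodeModel}}, the {{notActiveInSafemode}} flag, {{check}} and {{eligibleForActive}} are illustrative names, not Hadoop APIs or configuration keys. The state names loosely mirror the ZKFC health monitor's states, and the model assumes, as described in the comment above, that only a HealthCheckFailedException maps to "unhealthy" while other failures map to "not responding". In this sketch only a healthy NameNode is eligible to become active.

```java
/** Hypothetical model of "report UNHEALTHY while in safemode" (not Hadoop code). */
public class SafemodeHealthSketch {

  /** Local stand-in for Hadoop's HealthCheckFailedException (assumption). */
  static class HealthCheckFailedException extends Exception {
    HealthCheckFailedException(String msg) { super(msg); }
  }

  /** Simplified health states, loosely modeled on ZKFC's health monitor. */
  enum State { SERVICE_HEALTHY, SERVICE_UNHEALTHY, SERVICE_NOT_RESPONDING }

  static class NameNodeModel {
    private final boolean notActiveInSafemode; // hypothetical opt-in option
    private boolean inSafemode;

    NameNodeModel(boolean notActiveInSafemode, boolean inSafemode) {
      this.notActiveInSafemode = notActiveInSafemode;
      this.inSafemode = inSafemode;
    }

    void leaveSafemode() { inSafemode = false; }

    /** Health check: fails only when the option is on and we are in safemode. */
    void monitorHealth() throws HealthCheckFailedException {
      if (notActiveInSafemode && inSafemode) {
        throw new HealthCheckFailedException("The NameNode is still in safemode");
      }
    }
  }

  /** Monitor side: classify the outcome of one health check. */
  static State check(NameNodeModel nn) {
    try {
      nn.monitorHealth();
      return State.SERVICE_HEALTHY;
    } catch (HealthCheckFailedException e) {
      // The dedicated exception type is read as "alive but unhealthy".
      return State.SERVICE_UNHEALTHY;
    } catch (RuntimeException e) {
      // Any other failure is read as "not responding".
      return State.SERVICE_NOT_RESPONDING;
    }
  }

  /** In this model, only a healthy NameNode may compete to become active. */
  static boolean eligibleForActive(NameNodeModel nn) {
    return check(nn) == State.SERVICE_HEALTHY;
  }

  public static void main(String[] args) {
    NameNodeModel coldStarted = new NameNodeModel(true, true);
    System.out.println("in safemode: " + check(coldStarted));
    coldStarted.leaveSafemode();
    System.out.println("after leaving safemode: " + check(coldStarted));
  }
}
```

Keeping the behavior behind an option means that, with the option off, a NameNode in safemode stays eligible exactly as before.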
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)