[ https://issues.apache.org/jira/browse/HDFS-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162384#comment-16162384 ]
Konstantin Shvachko commented on HDFS-12323:
--------------------------------------------
The change looks good: you are increasing the timeout by the actual GC pause.
My only concern is that you removed the private no-argument constructor
{{QuorumCall()}}, which could cause problems, for example if it is used
somewhere via reflection or if {{QuorumCall}} is subclassed in the future. It
may be better to add it back with a comment saying explicitly not to use it
anywhere.
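
For illustration only, a minimal sketch of what adding the constructor back
with an explicit warning could look like. This is not the actual
{{QuorumCall}} source: the class name QuorumCallSketch, its field, and the
create() and size() methods are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for QuorumCall; not the real HDFS class.
class QuorumCallSketch<K, V> {
  private final Map<K, V> calls;

  /**
   * Not intended to be used anywhere. Kept private and documented so that
   * instances are only ever built through create().
   */
  private QuorumCallSketch() {
    this(new HashMap<K, V>());
  }

  private QuorumCallSketch(Map<K, V> calls) {
    this.calls = calls;
  }

  // The only supported way to obtain an instance in this sketch.
  static <K, V> QuorumCallSketch<K, V> create(Map<K, V> calls) {
    return new QuorumCallSketch<>(new HashMap<>(calls));
  }

  int size() {
    return calls.size();
  }
}
```

Because every constructor in the sketch is private, code in other top-level
classes can neither instantiate nor subclass it directly.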
> NameNode terminates after full GC thinking QJM unresponsive if full GC is
> much longer than timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-12323
> URL: https://issues.apache.org/jira/browse/HDFS-12323
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, qjm
> Affects Versions: 2.7.4
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Attachments: HDFS-12323.000.patch, HDFS-12323.001.patch
>
>
> HDFS-10733 attempted to fix the issue where the NameNode process would
> terminate itself if it had a GC pause which lasted longer than the QJM
> timeout, since it would think that the QJM had taken too long to respond.
> However, it only bumps up the timeout expiration by one timeout length, so if
> the GC pause was e.g. 2x the length of the timeout, a TimeoutException will
> be thrown and the NN will still terminate itself.
> Thanks to [~yangjiandan] for noting this issue as a comment on HDFS-10733; we
> have also seen this issue on a real cluster even after HDFS-10733 is applied.
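
The difference described above can be sketched with simple deadline
arithmetic. This is a simplified model, not the actual QJM code: the class
TimeoutExtensionSketch, its method names, and the millisecond values used
with it are all assumptions made for illustration.

```java
// Hypothetical illustration; names are not taken from the HDFS source.
final class TimeoutExtensionSketch {

  // Behavior as described for HDFS-10733: when a pause is detected,
  // the deadline is pushed out by one timeout length, regardless of
  // how long the pause actually was.
  static long extendByOneTimeout(long deadlineMs, long timeoutMs, long pauseMs) {
    return pauseMs > 0 ? deadlineMs + timeoutMs : deadlineMs;
  }

  // Behavior described in the review comment: the deadline is pushed
  // out by the measured length of the pause.
  static long extendByPause(long deadlineMs, long pauseMs) {
    return deadlineMs + pauseMs;
  }

  // A call is considered timed out once the clock passes the deadline.
  static boolean timedOut(long nowMs, long deadlineMs) {
    return nowMs > deadlineMs;
  }
}
```

Taking an assumed timeout of 20000 ms, a call that has been waiting 1000 ms,
and a GC pause of 40000 ms (2x the timeout): the clock reads 41000 ms after
the pause. Extending by one timeout length gives a deadline of 40000 ms, so
the call is treated as timed out; extending by the pause gives 60000 ms, so
it is not.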