[
https://issues.apache.org/jira/browse/HDFS-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erik Krogen updated HDFS-12323:
-------------------------------
Attachment: HDFS-12323.001.patch
Attaching the v001 patch, which now includes the unit test.
> NameNode terminates after full GC thinking QJM unresponsive if full GC is
> much longer than timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-12323
> URL: https://issues.apache.org/jira/browse/HDFS-12323
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, qjm
> Affects Versions: 2.7.4
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Attachments: HDFS-12323.000.patch, HDFS-12323.001.patch
>
>
> HDFS-10733 attempted to fix the issue where the NameNode process would
> terminate itself if it had a GC pause that lasted longer than the QJM
> timeout, since it would think that the QJM had taken too long to respond.
> However, it only extends the timeout expiration by one timeout length, so if
> the GC pause was, e.g., 2x the length of the timeout, a TimeoutException will
> still be thrown and the NN will terminate itself.
> Thanks to [~yangjiandan] for noting this issue as a comment on HDFS-10733; we
> have also seen this issue on a real cluster even after HDFS-10733 is applied.
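
To make the arithmetic in the description concrete, here is a minimal standalone sketch. It is not the HDFS code and not the attached patch: the class name, method names, and numbers are hypothetical, and the "extend by observed pause" variant is only one possible way to avoid the problem, shown for contrast.

```java
// Hypothetical illustration only; none of these names come from Hadoop.
public class GcPauseTimeoutSketch {

    /** Deadline after a single bump of one timeout length (the behavior described above). */
    public static long deadlineWithOneBump(long startMs, long timeoutMs) {
        return startMs + timeoutMs + timeoutMs;
    }

    /** Assumed alternative: push the deadline out by the length of the observed pause. */
    public static long deadlineWithPauseExtension(long startMs, long timeoutMs, long pauseMs) {
        return startMs + timeoutMs + pauseMs;
    }

    /** A call is considered timed out once the clock has passed its deadline. */
    public static boolean timedOut(long nowMs, long deadlineMs) {
        return nowMs > deadlineMs;
    }

    public static void main(String[] args) {
        long startMs = 0L;
        long timeoutMs = 20_000L;       // example timeout value, chosen for illustration
        long waitedMs = 1_000L;         // time spent waiting before the pause began
        long pauseMs = 2 * timeoutMs;   // GC pause twice as long as the timeout
        long nowMs = startMs + waitedMs + pauseMs;

        boolean oneBump = timedOut(nowMs, deadlineWithOneBump(startMs, timeoutMs));
        boolean pauseAware = timedOut(nowMs,
            deadlineWithPauseExtension(startMs, timeoutMs, pauseMs));

        System.out.println("one-bump deadline timed out: " + oneBump);
        System.out.println("pause-extended deadline timed out: " + pauseAware);
    }
}
```

With a pause of twice the timeout, the one-bump deadline has already passed by the time the process resumes, while a deadline extended by the pause length has not.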