[jira] [Commented] (HDFS-12323) NameNode terminates after full GC thinking QJM unresponsive if full GC is much longer than timeout

Konstantin Shvachko (JIRA) Mon, 11 Sep 2017 19:25:01 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162384#comment-16162384
 ]


Konstantin Shvachko commented on HDFS-12323:
--------------------------------------------

The change looks good: you are increasing the timeout by the actual GC pause.
My only concern is that you removed the private no-argument constructor 
{{QuorumCall()}}, which could cause problems, for example if it is used 
somewhere via reflection or if {{QuorumCall}} is subclassed in the future. It 
may be better to add it back with an explicit comment saying it should not be 
used anywhere.
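
To illustrate the constructor suggestion, here is a minimal sketch. It is not 
the actual {{QuorumCall}} code: the class name {{QuorumCallSketch}}, the clock 
parameter, and the helper {{declaresNoArgConstructor}} are all hypothetical, and 
the second constructor only stands in for whatever constructor the patch adds.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for QuorumCall; not the actual Hadoop class. */
class QuorumCallSketch {
  private final LongSupplier clockMs;

  /**
   * Not intended to be used anywhere. Kept only so that a reflective lookup
   * of a no-argument constructor keeps resolving.
   */
  private QuorumCallSketch() {
    this(System::currentTimeMillis);
  }

  /** Assumed shape of a constructor taking an injectable clock (hypothetical). */
  QuorumCallSketch(LongSupplier clockMs) {
    this.clockMs = clockMs;
  }

  long nowMs() {
    return clockMs.getAsLong();
  }

  /** Returns true if the given class declares a no-argument constructor. */
  static boolean declaresNoArgConstructor(Class<?> type) {
    try {
      type.getDeclaredConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }
}
```

With the private constructor in place, a reflective lookup such as 
{{getDeclaredConstructor()}} still finds it; without it, the lookup would throw 
{{NoSuchMethodException}}.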

> NameNode terminates after full GC thinking QJM unresponsive if full GC is 
> much longer than timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12323
>                 URL: https://issues.apache.org/jira/browse/HDFS-12323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.4
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: HDFS-12323.000.patch, HDFS-12323.001.patch
>
>
> HDFS-10733 attempted to fix the issue where the Namenode process would 
> terminate itself if it had a GC pause which lasted longer than the QJM 
> timeout, since it would think that the QJM had taken too long to respond. 
> However, it only bumps up the timeout expiration by one timeout length, so if 
> the GC pause was e.g. 2x the length of the timeout, a TimeoutException will 
> be thrown and the NN will still terminate itself.
> Thanks to [~yangjiandan] for noting this issue as a comment on HDFS-10733; we 
> have also seen this issue on a real cluster even after HDFS-10733 is applied.
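
The arithmetic in the description above can be sketched as follows. This is 
only an illustration under stated assumptions, not the code from either patch: 
the class {{DeadlineSketch}} and its methods are hypothetical, and all times are 
plain millisecond values rather than readings from a real timer.

```java
/** Hypothetical sketch of the two deadline policies; not the actual QJM code. */
final class DeadlineSketch {
  private DeadlineSketch() {}

  /** Behaviour as described for HDFS-10733: push the deadline out by one timeout length. */
  static long extendByOneTimeout(long deadlineMs, long timeoutMs, long pauseMs) {
    return pauseMs > 0 ? deadlineMs + timeoutMs : deadlineMs;
  }

  /** Behaviour discussed in this issue: push the deadline out by the measured pause. */
  static long extendByPause(long deadlineMs, long pauseMs) {
    return deadlineMs + Math.max(0L, pauseMs);
  }

  /** A call has timed out once the current time is past the deadline. */
  static boolean timedOut(long nowMs, long deadlineMs) {
    return nowMs > deadlineMs;
  }
}
```

For example, take a hypothetical 20-second timeout with the wait starting at 
time 0, and a 40-second pause that begins 1 second in. The process resumes at 
41,000 ms. Extending by one timeout length gives a deadline of 40,000 ms, so 
the call is treated as timed out; extending by the pause gives 60,000 ms, so 
it is not.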



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

