[ 
https://issues.apache.org/jira/browse/HDFS-12323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-12323:
-------------------------------
    Attachment: HDFS-12323.000.patch

Attaching the v000 patch, which solves the issue by estimating how long the 
pause lasted and extending the end (timeout) time by that amount, rather than 
by the initial wait time.
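
For illustration only, here is a minimal sketch of that idea. It is not the 
code in HDFS-12323.000.patch: the class name, method names, and all numeric 
values are hypothetical, and the pause is estimated simply as the time a wait 
took beyond what it was asked to take.

```java
/**
 * Hypothetical sketch of pause-aware deadline extension.
 * Not the code from HDFS-12323.000.patch; names and values are illustrative.
 */
public final class PauseAwareTimeout {
    private PauseAwareTimeout() {}

    /** Estimated pause: time that elapsed beyond what the wait was asked to take. */
    public static long estimatePauseMs(long intendedWaitMs, long actualElapsedMs) {
        return Math.max(0L, actualElapsedMs - intendedWaitMs);
    }

    /** Pushes the deadline out by the estimated pause, however long it was. */
    public static long extendDeadlineMs(long deadlineMs, long intendedWaitMs,
                                        long actualElapsedMs) {
        return deadlineMs + estimatePauseMs(intendedWaitMs, actualElapsedMs);
    }

    public static void main(String[] args) {
        long timeoutMs = 20_000L;        // assumed timeout, for illustration
        long startMs = 0L;
        long deadlineMs = startMs + timeoutMs;
        long intendedWaitMs = 1_000L;    // how long the wait was asked to block
        long actualElapsedMs = 45_000L;  // the wait was stretched by a long pause
        long nowMs = startMs + actualElapsedMs;

        // Bumping by one timeout length is not enough when the pause is
        // more than twice the timeout.
        long fixedBumpDeadlineMs = deadlineMs + timeoutMs;
        // Extending by the estimated pause keeps the remaining budget intact.
        long pauseAwareDeadlineMs =
            extendDeadlineMs(deadlineMs, intendedWaitMs, actualElapsedMs);

        System.out.println("fixed bump expired:  " + (nowMs >= fixedBumpDeadlineMs));
        System.out.println("pause-aware expired: " + (nowMs >= pauseAwareDeadlineMs));
    }
}
```

With these assumed numbers, the fixed one-timeout bump has already expired by 
the time the pause ends, while the pause-aware deadline still leaves the part 
of the timeout that had not been used before the pause.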

I have a test prepared as well, but it relies on {{StopWatch}} being 
controllable during a test; I filed HADOOP-14827 for this and have already 
submitted a patch. Once that goes through I will attach the patch with the 
test; I am holding off for now to avoid upsetting Jenkins.

> NameNode terminates after full GC thinking QJM unresponsive if full GC is 
> much longer than timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12323
>                 URL: https://issues.apache.org/jira/browse/HDFS-12323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.4
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: HDFS-12323.000.patch
>
>
> HDFS-10733 attempted to fix the issue where the NameNode process would 
> terminate itself if it had a GC pause which lasted longer than the QJM 
> timeout, since it would think that the QJM had taken too long to respond. 
> However, it only bumps up the timeout expiration by one timeout length, so if 
> the GC pause is e.g. 2x the length of the timeout, a TimeoutException is 
> still thrown and the NN still terminates itself.
> Thanks to [~yangjiandan] for noting this issue as a comment on HDFS-10733; we 
> have also seen this issue on a real cluster even after HDFS-10733 is applied.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

