[ https://issues.apache.org/jira/browse/HADOOP-13657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee resolved HADOOP-13657.
---------------------------------
Resolution: Duplicate
> IPC Reader thread could silently die and leave NameNode unresponsive
> --------------------------------------------------------------------
>
> Key: HADOOP-13657
> URL: https://issues.apache.org/jira/browse/HADOOP-13657
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Reporter: Zhe Zhang
> Priority: Critical
>
> For each listening port, IPC {{Server#Listener#Reader}} is a single thread in
> charge of moving {{Connection}} items from {{pendingConnections}} (capacity
> 100) to the {{callQueue}}.
> We experienced an incident in which the {{Reader}} thread for the HDFS
> NameNode died from a runtime exception. The {{pendingConnections}} queue then
> filled up and the NameNode port became inaccessible.
> In our particular case, what killed the {{Reader}} was an NPE caused by
> https://bugs.openjdk.java.net/browse/JDK-8024883. But in general, other types
> of runtime exceptions could cause this issue as well.
> We should add logic to either make the {{Reader}} more robust against
> runtime exceptions, or at least treat such an exception as FATAL so that the
> NameNode can fail over to the standby and admins are alerted to the real issue.
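
The two remedies proposed in the description can be sketched roughly as follows. This is a minimal illustration, not Hadoop's actual {{Server#Listener#Reader}} code: the class {{ReaderSketch}}, its nested {{Reader}}, the {{onFatal}} callback, and the use of {{Runnable}} items in place of {{Connection}} objects are all hypothetical names chosen for the example. The inner catch keeps the thread alive when one item throws a runtime exception; the outer catch hands any other failure to a handler, where a real server might log at FATAL and terminate so that failover can happen.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ReaderSketch {

    /** Hypothetical stand-in for the IPC Reader; not Hadoop's actual class. */
    static final class Reader extends Thread {
        private final BlockingQueue<Runnable> pending;
        private final Consumer<Throwable> onFatal;

        Reader(BlockingQueue<Runnable> pending, Consumer<Throwable> onFatal) {
            super("reader-sketch");
            this.pending = pending;
            this.onFatal = onFatal;
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Runnable item = pending.take();
                    try {
                        item.run();
                    } catch (RuntimeException e) {
                        // Option 1: drop the bad item and keep the thread alive.
                        System.err.println("Reader: dropping item after " + e);
                    }
                }
            } catch (InterruptedException e) {
                // Normal shutdown path: restore the flag and exit quietly.
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Option 2: escalate anything else instead of dying silently.
                onFatal.accept(t);
            }
        }
    }

    /** Returns true if the reader still processes items after one throws an NPE. */
    static boolean survivesRuntimeException() {
        // Capacity 100 mirrors the pendingConnections capacity in the description.
        BlockingQueue<Runnable> pending = new ArrayBlockingQueue<>(100);
        CountDownLatch processed = new CountDownLatch(1);
        Reader reader = new Reader(pending, t -> { });
        reader.start();
        try {
            pending.put(() -> { throw new NullPointerException("simulated"); });
            pending.put(processed::countDown);
            return processed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            reader.interrupt();
        }
    }

    /** Returns true if a failure that is not a RuntimeException reaches the fatal handler. */
    static boolean escalatesOtherFailures() {
        BlockingQueue<Runnable> pending = new ArrayBlockingQueue<>(100);
        CountDownLatch fatal = new CountDownLatch(1);
        Reader reader = new Reader(pending, t -> fatal.countDown());
        reader.start();
        try {
            pending.put(() -> { throw new AssertionError("simulated"); });
            return fatal.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            reader.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("survives runtime exception: " + survivesRuntimeException());
        System.out.println("escalates other failures: " + escalatesOtherFailures());
    }
}
```

Whether to swallow or escalate a given exception is a policy choice. The sketch only shows that either path avoids the silent thread death described above, where the bounded queue fills and the port stops accepting work.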