Hi,

I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the
Namenode was not responding (no reads or writes to HDFS were happening). It was
stuck for a few hours, so I shut down the Namenode and found the following
error in the Namenode log.

2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
xx.xx.xx.xx:13568: output error

This error was logged for every data node, and the data nodes were not able to
communicate with the Namenode.

After I restarted the Namenode, the log showed:

2011-07-21 16:31:54,110 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2011-07-21 16:31:54,223 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
xx.xx.xx.xx:9000
2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-07-21 16:31:54,226 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=false
2011-07-21 16:31:54,287 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2011-07-21 16:31:54,289 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 15817482
2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 82
2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2042701824 loaded in 166 seconds.
2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
in 1406 seconds.

And then it stalls for a long time. After about an hour it starts working again.
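
As a rough cross-check of where the startup time went, the gaps between the
log timestamps above can be recomputed with a short script. This is only a
sketch: it uses nothing Hadoop-specific, the stage labels are my own naming,
and the timestamps and counts are copied by hand from the log.

```python
from datetime import datetime

# Timestamps copied from the Namenode startup log above.
# The stage labels are my own descriptions, not Hadoop terminology.
STAGES = [
    ("2011-07-21 16:31:54,880", "image load started (Number of files logged)"),
    ("2011-07-21 16:34:41,177", "image file loaded"),
    ("2011-07-21 16:58:07,624", "edits file loaded"),
]

# Log timestamp layout: date, time, comma, milliseconds.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse(ts):
    """Parse one log timestamp into a datetime."""
    return datetime.strptime(ts, FMT)


def stage_durations(stages):
    """Return (label, seconds since the previous stage) for each later stage."""
    result = []
    for (prev_ts, _), (ts, label) in zip(stages, stages[1:]):
        result.append((label, (parse(ts) - parse(prev_ts)).total_seconds()))
    return result


durations = stage_durations(STAGES)
for label, secs in durations:
    print(f"{label}: {secs:.0f} s")

# Figures taken from the "Edits file ... loaded" log line.
edits_count = 138217
edits_seconds = 1406
print(f"edits replay rate: {edits_count / edits_seconds:.1f} edits/s")
```

The two gaps agree with the 166 and 1406 seconds that the log itself reports,
so almost all of the startup time shown here went into replaying the edits
file rather than loading the image.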

My question is: when does the "IPC Server Responder" error occur, and is there
a way to deal with it?
Also, if my Namenode is busy doing something, how can I find out what it is
doing?

Regards,
Rahul
