Do you have an instance of the SecondaryNamenode in your cluster?

-Joey
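
As a rough aid for reading the startup log quoted below, the timestamps can be turned into per-phase durations. This is only a sketch in Python: the timestamps and counts are copied from the quoted log, while `FMT`, `seconds_between`, and the variable names are made up for illustration. Treating the gap between two "loaded" messages as one phase is an assumption, so the image figure comes out about a second longer than the 166 seconds the log itself reports.

```python
from datetime import datetime

# log4j-style timestamp layout used in the quoted NameNode log.
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Values copied from the quoted log.
startup = "2011-07-21 16:31:54,110"       # STARTUP_MSG
image_loaded = "2011-07-21 16:34:41,177"  # "Image file ... loaded"
edits_loaded = "2011-07-21 16:58:07,624"  # "Edits file ... loaded"
image_bytes = 2042701824
edit_count = 138217

# Assumption: each phase ends at its "loaded" message.
image_secs = seconds_between(startup, image_loaded)
edits_secs = seconds_between(image_loaded, edits_loaded)

print(f"fsimage: {image_secs:.0f} s, {image_bytes / image_secs / 1e6:.1f} MB/s")
print(f"edits:   {edits_secs:.0f} s, {edit_count / edits_secs:.1f} edits/s")
```

On these numbers the edits replay runs at under 100 edits per second and appears to account for most of the restart time, so that is probably the phase to look at first.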
On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
> Hi,
>
> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the
> Namenode was not responding (no reads or writes to HDFS were happening). It
> was stuck for a few hours, so I shut down the Namenode and found the
> following error in the Namenode log:
>
> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from xx.xx.xx.xx:13568: output error
>
> This error appeared for every data node, and the data nodes were not able to
> communicate with the Namenode.
>
> After I restarted the Namenode:
>
> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: xx.xx.xx.xx:9000
> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2011-07-21 16:31:54,226 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
> 2011-07-21 16:31:54,287 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,289 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 82
> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2042701824 loaded in 166 seconds.
> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406 seconds.
>
> Then it goes into a long halt. After about an hour it starts working again.
>
> My question is: when does the "IPC Server Responder" error occur, and is
> there a way to deal with it?
> Also, if my Namenode is busy doing something, how can I find out what it is
> doing?
>
> Regards,
> Rahul

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434