Do you have a SecondaryNameNode running in your cluster?
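
For context, the load times in the startup log quoted below can be turned into rough rates. This is only a back-of-the-envelope sketch in Python; the numbers are copied from that log, and the variable names are mine, not anything from Hadoop:

```python
# Figures copied from the quoted NameNode startup log.
image_bytes = 2042701824   # "Image file of size 2042701824"
image_secs = 166           # "loaded in 166 seconds"
edits_bytes = 12751835     # "edits of size 12751835"
edits_count = 138217       # "edits # 138217"
edits_secs = 1406          # "loaded in 1406 seconds"


def rate(amount, seconds):
    """Average amount processed per second."""
    return amount / seconds


# Throughput while loading the image, in MiB per second.
image_mib_per_sec = rate(image_bytes, image_secs) / (1024 * 1024)
# Throughput while replaying the edit log.
edits_per_sec = rate(edits_count, edits_secs)
edits_kib_per_sec = rate(edits_bytes, edits_secs) / 1024

print(f"image load:   {image_mib_per_sec:.1f} MiB/s")
print(f"edits replay: {edits_per_sec:.1f} edits/s, {edits_kib_per_sec:.1f} KiB/s")
```

The edits file is far smaller than the image but took much longer to load. That is why I am asking: the SecondaryNameNode is the process that periodically merges the edits file into the image, so whether one is running affects how much has to be replayed at startup.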

-Joey

On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:

> Hi,
>
> I am running a Hadoop cluster with 20 DataNodes. Yesterday I found that the
> NameNode was not responding (no reads or writes to HDFS were happening). It
> was stuck for a few hours, so I shut down the NameNode and found the
> following error in the NameNode log.
>
> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call
> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
> xx.xx.xx.xx:13568: output error
>
> This error was logged for every DataNode, and the DataNodes were not able to
> communicate with the NameNode.
>
> After I restarted the NameNode, the log showed:
>
> 2011-07-21 16:31:54,110 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2011-07-21 16:31:54,223 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> xx.xx.xx.xx:9000
> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2011-07-21 16:31:54,226 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
> NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-07-21 16:31:54,280 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=false
> 2011-07-21 16:31:54,287 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,289 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 15817482
> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 82
> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 2042701824 loaded in 166 seconds.
> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
> in 1406 seconds.
>
> Then it halted for a long time. After about an hour it started working again.
>
> My question is: when does the "IPC Server Responder" error occur, and is
> there a way to deal with it?
> Also, if my NameNode is busy doing something, how can I find out what it is
> doing?
>
> Regards,
> Rahul




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
