Re: Hadoop Namenode problem

Joey Echeverria Thu, 28 Jul 2011 05:47:27 -0700

Nothing from around 16:30?

-Joey




On Jul 28, 2011, at 5:06, Rahul Das <rahul.h...@gmail.com> wrote:

> Hi Joey,
> 
> The log is too big to attach to an email. What I found is that there are no
> errors during this time.
> Only a few warnings appear, such as:
> 
> 2011-07-21 14:13:47,814 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
> ...
> ...
> 2011-07-21 14:30:49,511 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for 
> block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010 
> current size is 1950720 reported size is 2448907
> 
> I think the edits file was too large; that is why it took a long time.
> 
> Regards,
> Rahul
> 
> On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <j...@cloudera.com> wrote:
> The long startup time after the restart looks like it was caused by the
> SecondaryNameNode not having been able to roll the edits log for some time. Can
> you post your Namenode log from around the same time as this
> SecondaryNameNode log (2011-07-21 16:00-16:30)?
> 
> -Joey
> 
> 
> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.h...@gmail.com> wrote:
> Yes, I have a secondary Namenode running. Here is the log for the
> SecondaryNamenode:
> 
> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835 
> edits # 138217 loaded in 1581 seconds.
> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Image file of size 2045516451 saved in 29 seconds.
> 2011-07-21 16:03:24,974 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 
> 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 
> 0 Number of syncs: 0 SyncTimes(ms): 0 
> 2011-07-21 16:03:25,545 INFO 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL 
> xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
> 2011-07-21 16:29:24,356 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint: 
> 2011-07-21 16:29:24,358 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: 
> java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception: 
> java.io.IOException: Connection reset by peer
> 
> Regards,
> Rahul
> 
> 
> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <j...@cloudera.com> wrote:
> Do you have an instance of the SecondaryNamenode in your cluster?
> 
> -Joey
> 
> 
> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
> Hi,
> 
> I am running a Hadoop cluster with 20 DataNodes. Yesterday I found that the
> Namenode was not responding (no reads or writes to HDFS were happening). It was
> stuck for a few hours, then I shut down the Namenode and found the following
> error in the Namenode log.
> 
> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server 
> Responder, call 
> getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from 
> xx.xx.xx.xx:13568: output error
> 
> This error appeared for every data node, and the data nodes were not able to
> communicate with the Namenode.
> 
> After I restarted the Namenode:
> 
> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> STARTUP_MSG:
> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> Namenode up at: xx.xx.xx.xx:9000
> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2011-07-21 16:31:54,226 INFO 
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing 
> NameNodeMeterics using context 
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,280 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2011-07-21 16:31:54,280 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-07-21 16:31:54,280 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
> 2011-07-21 16:31:54,287 INFO 
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: 
> Initializing FSNamesystemMetrics using context 
> object:org.apache.hadoop.metrics.spi.NullContext
> 2011-07-21 16:31:54,289 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered 
> FSNamesystemStatusMBean
> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Number of files = 15817482
> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Number of files under construction = 82
> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Image file of size 2042701824 loaded in 166 seconds.
> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: 
> Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded 
> in 1406 seconds.
> 
> Then it halted for a long time. After about an hour it started working again.
> 
> My questions are: when does the "IPC Server Responder" error occur, and is there
> a way to deal with it?
> Also, if my Namenode is busy doing something, how can I find out
> what it is doing?
> 
> Regards,
> Rahul
> 
> 
> 
> -- 
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
> 
> 
> 
> 
> 
> -- 
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
> 
>
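
For anyone reviewing logs the way this thread does, here is a minimal Python sketch that pulls entries from a time window (such as the 16:00-16:30 range asked for above) and works out how fast an edits file was replayed. It is not part of Hadoop. The function names iter_entries, entries_between and edits_load_stats are made up for illustration, and the regular expressions assume the timestamp and "Edits file ... loaded in N seconds" layout seen in the excerpts quoted in this thread. Other Hadoop versions may format these lines differently.

```python
import re
from datetime import datetime

# Log entries in this thread start with a timestamp like "2011-07-21 16:34:41,177".
ENTRY_START = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")

# "Edits file <path> of size <bytes> edits # <count> loaded in <n> seconds"
EDITS_LOADED = re.compile(
    r"Edits file (\S+) of size (\d+) edits # (\d+) loaded in (\d+) seconds"
)

# Two entries copied from the Namenode log quoted in this thread,
# including the line wrapping introduced by the mail client.
SAMPLE = """\
2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2042701824 loaded in 166 seconds.
2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
in 1406 seconds.
"""


def iter_entries(lines):
    """Yield (timestamp, message) pairs, joining wrapped continuation lines."""
    stamp, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_START.match(line)
        if match:
            # A new timestamp closes the previous entry, if there was one.
            if stamp is not None:
                yield stamp, " ".join(parts)
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            parts = [line[match.end():]]
        elif stamp is not None:
            # No timestamp: treat the line as a continuation of the entry.
            parts.append(line)
    if stamp is not None:
        yield stamp, " ".join(parts)


def entries_between(lines, start, end):
    """Return the entries whose timestamp falls in [start, end]."""
    return [(t, m) for t, m in iter_entries(lines) if start <= t <= end]


def edits_load_stats(lines):
    """Return one dict per 'Edits file ... loaded' entry with its replay rate."""
    stats = []
    for stamp, message in iter_entries(lines):
        found = EDITS_LOADED.search(message)
        if not found:
            continue
        edits, seconds = int(found.group(3)), int(found.group(4))
        stats.append({
            "time": stamp,
            "path": found.group(1),
            "bytes": int(found.group(2)),
            "edits": edits,
            "seconds": seconds,
            # Guard against a zero-second load to avoid dividing by zero.
            "edits_per_second": edits / seconds if seconds else None,
        })
    return stats


if __name__ == "__main__":
    for row in edits_load_stats(SAMPLE.splitlines()):
        print(row["time"], row["path"], round(row["edits_per_second"], 1))
```

On the Namenode entry quoted above (138217 edits in 1406 seconds), this works out to roughly 98 edits per second. For a real log, pass an open file object in place of SAMPLE.splitlines().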
