Hi Joey,

The log is too big to attach to the mail. What I found is that there were no errors during this time; only a few warnings like these:
2011-07-21 14:13:47,814 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
...
2011-07-21 14:30:49,511 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010 current size is 1950720 reported size is 2448907

I think the edits file had grown too large; that's why it took such a long time.

Regards,
Rahul

On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <j...@cloudera.com> wrote:

> The long startup time after the restart looks like it was caused by the
> SecondaryNameNode not having been able to roll the edits log for some time.
> Can you post your NameNode log from around the same time as this
> SecondaryNameNode log (2011-07-21 16:00-16:30)?
>
> -Joey
>
> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>
>> Yes, I have a SecondaryNameNode running. Here are the logs for the
>> SecondaryNameNode:
>>
>> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835 edits # 138217 loaded in 1581 seconds.
>> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2045516451 saved in 29 seconds.
>> 2011-07-21 16:03:24,974 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
>> 2011-07-21 16:03:25,545 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
>> 2011-07-21 16:29:24,356 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
>> 2011-07-21 16:29:24,358 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Connection reset by peer
>>
>> Regards,
>> Rahul
>>
>> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>
>>> Do you have an instance of the SecondaryNameNode in your cluster?
>>>
>>> -Joey
>>>
>>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the NameNode was not responding (no reads from or writes to HDFS were happening). It stayed stuck for a few hours, so I shut down the NameNode and found the following error in the NameNode log:
>>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from xx.xx.xx.xx:13568: output error
>>>>
>>>> This error appeared for every data node, and the data nodes were not able to communicate with the NameNode.
>>>>
>>>> After I restarted the NameNode:
>>>>
>>>> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
>>>> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: xx.xx.xx.xx:9000
>>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>> 2011-07-21 16:31:54,226 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>>> 2011-07-21 16:31:54,287 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>> 2011-07-21 16:31:54,289 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>>> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>>> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 82
>>>> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2042701824 loaded in 166 seconds.
>>>> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406 seconds.
>>>>
>>>> And then it halted for a long time. After about an hour it started working again.
>>>>
>>>> My questions are: when does the "IPC Server Responder" error occur, and is there a way to deal with it? Also, if my NameNode is busy doing something, how can I find out what it is doing?
>>>>
>>>> Regards,
>>>> Rahul
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
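For readers landing on this thread later: the checkpoint behavior that keeps the edits file from growing this large is configurable. A minimal sketch of the relevant settings, assuming 0.20-era Hadoop property names; the values below are illustrative examples, not tuned recommendations:

```xml
<!-- core-site.xml fragment (Hadoop 0.20.x property names; values are examples) -->
<property>
  <name>fs.checkpoint.period</name>
  <!-- seconds between SecondaryNameNode checkpoints -->
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <!-- also trigger a checkpoint once the edits file reaches this size (bytes) -->
  <value>67108864</value>
</property>
```

Note that these settings only help if checkpoints actually succeed; the doCheckpoint IOException in the SecondaryNameNode log above means the edits file kept growing regardless of them, so that failure needs to be fixed first. For the second question, `jstack <namenode-pid>` (shipped with the JDK) dumps the NameNode's thread stacks and usually shows what a busy NameNode is doing.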