No, there was no error; only the following entries appear:

2011-07-21 14:14:30,039 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/xx.xx.xx.xx cmd=create src=/user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f dst=null perm=hadoop:supergroup:rw-r--r--
2011-07-21 14:14:30,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f. blk_-3217626427379030207_15834365
2011-07-21 14:14:30,120 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f is closed by DFSClient_1277823200
Is there any way I can find out from the log when safe mode ends?

Regards,
Rahul

On Thu, Jul 28, 2011 at 6:16 PM, Joey Echeverria <j...@cloudera.com> wrote:
> Nothing from around 1630?
>
> -Joey
>
> On Jul 28, 2011, at 5:06, Rahul Das <rahul.h...@gmail.com> wrote:
>
> Hi Joey,
>
> The log is too big to attach to the mail. What I found is that there is no error during this time.
> Only a few warnings appear, like:
>
> 2011-07-21 14:13:47,814 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
> ...
> 2011-07-21 14:30:49,511 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010 current size is 1950720 reported size is 2448907
>
> I think the edits file was too large; that's why loading took so long.
>
> Regards,
> Rahul
>
> On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <j...@cloudera.com> wrote:
>
>> The long startup time after the restart looks like it was caused by the SecondaryNameNode not having been able to roll the edits log for some time.
>> Can you post your NameNode log from around the same time as this SecondaryNameNode log (2011-07-21 16:00-16:30)?
>>
>> -Joey
>>
>> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>
>>> Yes, I have a SecondaryNameNode running. Here is the log from the SecondaryNameNode:
>>>
>>> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835 edits # 138217 loaded in 1581 seconds.
>>> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2045516451 saved in 29 seconds.
>>> 2011-07-21 16:03:24,974 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
>>> 2011-07-21 16:03:25,545 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
>>> 2011-07-21 16:29:24,356 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
>>> 2011-07-21 16:29:24,358 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Connection reset by peer
>>>
>>> Regards,
>>> Rahul
>>>
>>> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>
>>>> Do you have an instance of the SecondaryNameNode in your cluster?
>>>>
>>>> -Joey
>>>>
>>>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the NameNode was not responding (no reads or writes to HDFS were happening). It was stuck for a few hours, so I shut down the NameNode and found the following error in the NameNode log:
>>>>>
>>>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from xx.xx.xx.xx:13568: output error
>>>>>
>>>>> This error appeared for every data node, and the data nodes were not able to communicate with the NameNode.
>>>>>
>>>>> After I restarted the NameNode:
>>>>>
>>>>> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
>>>>> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: xx.xx.xx.xx:9000
>>>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>>> 2011-07-21 16:31:54,226 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>>>> 2011-07-21 16:31:54,287 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>> 2011-07-21 16:31:54,289 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>>>> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>>>> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 82
>>>>> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2042701824 loaded in 166 seconds.
>>>>> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406 seconds.
>>>>>
>>>>> Then it halted for a long time; after about an hour it started working again.
>>>>>
>>>>> My questions: when does the "IPC Server Responder ... output error" occur, and is there a way to deal with it?
>>>>> Also, if my NameNode is busy doing something, what is the way to find out what it is doing?
>>>>>
>>>>> Regards,
>>>>> Rahul
>>>>
>>>> --
>>>> Joseph Echeverria
>>>> Cloudera, Inc.
>>>> 443.305.9434
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
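[Editor's note] Joey's diagnosis, that the SecondaryNameNode had not rolled the edits log for a long time so the restart had to replay 138217 edits (1406 seconds), points at checkpoint cadence. In 0.20-era Hadoop this is controlled by `fs.checkpoint.period` (seconds between checkpoints, default 3600) and `fs.checkpoint.size` (edits-file size that forces a checkpoint regardless of the period, default 64 MB); later versions renamed these to `dfs.namenode.checkpoint.*`. A sketch with illustrative values, not a recommendation from the thread:

```
<!-- Checkpoint cadence for the SecondaryNameNode (0.20-era property
     names; values here are illustrative). -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>      <!-- checkpoint at least hourly -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>33554432</value>  <!-- ...or whenever edits exceed 32 MB -->
</property>
```

Note that the SecondaryNameNode log above shows `doCheckpoint` failing with "Connection reset by peer", so the checkpoint that would have trimmed the edits file never completed; fixing that failure matters more than tuning the interval.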
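[Editor's note] Rahul's open question, finding out from the log when safe mode ends, can be answered by scanning the NameNode log for the StateChange messages that mark the transition. A minimal sketch; the exact message wording differs between Hadoop versions, and the sample log lines below are illustrative, not taken from this thread:

```python
import re

# 0.20-era NameNodes log the safe-mode transition via
# org.apache.hadoop.hdfs.StateChange with lines such as
# "STATE* Leaving safe mode after N secs." and "STATE* Safe mode is OFF."
# (exact wording is version-dependent -- adjust the pattern to your log).
SAFEMODE_OFF = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*"
    r"(?:Leaving safe mode|Safe mode is OFF)"
)

def safemode_exit_times(lines):
    """Return the timestamps of safe-mode exit messages, in log order."""
    return [m.group(1) for line in lines if (m := SAFEMODE_OFF.match(line))]

# Illustrative sample; only the first line is quoted from the thread above.
sample_log = [
    "2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: "
    "Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 "
    "loaded in 1406 seconds.",
    "2011-07-21 17:30:12,001 INFO org.apache.hadoop.hdfs.StateChange: "
    "STATE* Leaving safe mode after 3524 secs.",
    "2011-07-21 17:30:12,002 INFO org.apache.hadoop.hdfs.StateChange: "
    "STATE* Safe mode is OFF.",
]

print(safemode_exit_times(sample_log))
# -> ['2011-07-21 17:30:12', '2011-07-21 17:30:12']
```

On a live cluster, `hadoop dfsadmin -safemode get` reports the current state and `hadoop dfsadmin -safemode wait` blocks until safe mode is off. For Rahul's other question, what a busy NameNode is doing, a thread dump (`jstack <namenode-pid>`, or sending the process SIGQUIT) shows what each RPC handler thread is working on.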