[
https://issues.apache.org/jira/browse/CHUKWA-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865918#action_12865918
]
Bill Graham commented on CHUKWA-487:
------------------------------------
Here's what I saw in the logs when I had to restart my NN. It took a little
while to exit safe mode. I had to restore from the secondary name node, so there
might have been some data loss upon restore.
2010-05-06 17:32:19,515 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=318716 dataRate=10622
2010-05-06 17:32:49,518 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=196741 dataRate=6557
2010-05-06 17:33:06,367 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:129,numberchunks:217
2010-05-06 17:33:19,521 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
2010-05-06 17:33:49,523 INFO Timer-3 SeqFileWriter - stat:datacollection.writer.hdfs dataSize=0 dataRate=0
2010-05-06 17:34:01,142 WARN org.apache.hadoop.dfs.dfsclient$leasechec...@36b60b93 DFSClient - Problem renewing lease for DFSClient_-1088933168: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot renew lease for DFSClient_-1088933168.
Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
    at org.apache.hadoop.dfs.FSNamesystem.renewLease(FSNamesystem.java:1823)
    at org.apache.hadoop.dfs.NameNode.renewLease(NameNode.java:458)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.renewLease(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.renewLease(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:781)
    at java.lang.Thread.run(Thread.java:619)
2010-05-06 17:34:01,608 WARN Timer-2094 SeqFileWriter - Got an exception in rotate
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.SafeModeException: Cannot complete file /chukwa/logs/201006172737418_xxxxxxxxxcom_71ea99261284ab9f0566faa.chukwa. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
    at org.apache.hadoop.dfs.FSNamesystem.completeFileInternal(FSNamesystem.java:1209)
    at org.apache.hadoop.dfs.FSNamesystem.completeFile(FSNamesystem.java:1200)
    at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:351)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.complete(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.complete(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2736)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2657)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
    at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.rotate(SeqFileWriter.java:194)
    at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter$1.run(SeqFileWriter.java:235)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
2010-05-06 17:34:01,647 FATAL Timer-2094 SeqFileWriter - IO Exception in rotate. Exiting!
2010-05-06 17:34:01,661 FATAL btpool0-6248 SeqFileWriter - IOException when trying to write a chunk, Collector is going to exit!
java.io.IOException: Stream closed.
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
    at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
    at org.apache.hadoop.chukwa.datacollection.writer.SeqFileWriter.add(SeqFileWriter.java:281)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:152)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:190)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:843)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:647)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:450)
2010-05-06 17:34:06,370 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:28,numberchunks:0
2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
...
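
One way to avoid this half-dead state would be the option mentioned in the issue description: have the collector mirror the agents and keep retrying HDFS operations while the NN is in safe mode, then shut down completely only after a bounded number of attempts. The helper below is only a hypothetical sketch of that pattern; `RetryingWriter`, `withRetries`, and the backoff parameters are illustrative and not part of the Chukwa API.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical bounded-retry helper, sketching the "resemble the agents
 * and keep trying" option. Not actual Chukwa code.
 */
public class RetryingWriter {

    /**
     * Runs op, retrying on IOException (e.g. a SafeModeException surfacing
     * as a RemoteException) with linear backoff. After maxAttempts failures
     * the last IOException is rethrown, at which point the caller should
     * shut the collector down completely rather than limp along.
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                Thread.sleep(backoffMs * attempt); // linear backoff between attempts
            }
        }
        throw last; // give up cleanly instead of leaving a half-dead collector
    }
}
```

A caller would wrap each rotate/append in `withRetries(...)` and treat the rethrown exception as a signal to exit the JVM outright, so the pid file and the process can never disagree about whether the collector is alive.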
> Collector left in a bad state after temporary NN outage
> -------------------------------------------------------
>
> Key: CHUKWA-487
> URL: https://issues.apache.org/jira/browse/CHUKWA-487
> Project: Hadoop Chukwa
> Issue Type: Bug
> Components: data collection
> Affects Versions: 0.4.0
> Reporter: Bill Graham
>
> When the name node returns errors to the collector, at some point the
> collector dies halfway. This behavior should be changed to either resemble
> the agents and keep trying, or to shut down completely. Instead, what I'm
> seeing is that the collector logs that it's shutting down and the
> var/pidDir/Collector.pid file gets removed, but the collector continues to
> run, albeit without handling new data. Meanwhile, this log entry is repeated
> ad infinitum:
> 2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.