Saisai Shao created CHUKWA-663:
----------------------------------
Summary: SystemMetrics parse error
Key: CHUKWA-663
URL: https://issues.apache.org/jira/browse/CHUKWA-663
Project: Chukwa
Issue Type: Bug
Components: Data Collection
Affects Versions: 0.6.0
Environment: RHEL6.1 Hadoop 1.0.3 HBase 0.94.0
Reporter: Saisai Shao
In some OS environments, the Chukwa Agent cannot fetch all of the system data,
but in the Chukwa Collector the expected format is hard-coded. This leads to a
NullPointerException, which in turn triggers further exceptions.
For example:
In my environment, swap information is not fetched, so this parser class
(org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SystemMetrics.java)
throws an exception:
java.lang.NullPointerException
    at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SystemMetrics.parse(SystemMetrics.java:156)
    at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:82)
    at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:194)
    at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
    at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:40)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:159)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:208)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)
On this line: JSONObject swap = (JSONObject) json.get("swap");
the swap object is null, which causes the exception on the next line. The
AbstractProcessor class catches this error and calls saveChunkInError(e), so
the error is also packaged in KeyValue format and inserted into HBase. But the
HBase table schema has no column family named "SystemMetricsInError", so
another exception is raised:
WARN btpool0-54 HBaseWriter -
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1
action: DoNotRetryIOException: 1 time, servers with issues: sr116:60020,
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1591)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:784)
    at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:402)
    at org.apache.hadoop.chukwa.datacollection.writer.hbase.HBaseWriter.add(HBaseWriter.java:195)
    at org.apache.hadoop.chukwa.datacollection.writer.SocketTeeWriter.add(SocketTeeWriter.java:252)
    at org.apache.hadoop.chukwa.datacollection.writer.PipelineStageWriter.add(PipelineStageWriter.java:40)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.accept(ServletCollector.java:159)
    at org.apache.hadoop.chukwa.datacollection.collector.servlet.ServletCollector.doPost(ServletCollector.java:208)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:451)
So there are two problems:
1. By the time the parse error occurs, part of the metric data has already
been collected. But once the exception is thrown, the remaining data is not
parsed, so incomplete data for that timestamp is saved in HBase. If the
incomplete data does not include "system:ctags", the cluster name cannot be
retrieved, which is the real problem.
2. The org.apache.hadoop.chukwa.extraction.demux.processor.mapper.SystemMetrics
class is not well designed: some JSON keys are hard-coded in the file, but
there is no guarantee that these keys exist, and there is no code to guard
against missing keys.
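
A minimal sketch of the kind of guard code that both points ask for, not
the actual Chukwa fix. It uses plain java.util.Map in place of JSONObject so
that it runs on its own. The class name SafeSystemMetricsSketch, the methods
copySection and parse, the "memory" section, the "ctags" input key and the
dotted output key names are all assumptions made for illustration; only the
"swap" key comes from the code quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the parsing step; not the real SystemMetrics class.
final class SafeSystemMetricsSketch {

    // Copies one optional nested section into 'out', prefixing each key with
    // the section name. Returns false when the section is absent or is not an
    // object, so the caller can skip it instead of hitting a null dereference.
    static boolean copySection(Map<String, Object> json, String section,
                               Map<String, Object> out) {
        Object value = json.get(section);
        if (!(value instanceof Map)) {
            return false;
        }
        for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
            out.put(section + "." + e.getKey(), e.getValue());
        }
        return true;
    }

    static Map<String, Object> parse(Map<String, Object> json) {
        Map<String, Object> out = new LinkedHashMap<>();

        // Record the cluster tag first so it is kept even if a later section
        // is missing (input key name is an assumption).
        Object ctags = json.get("ctags");
        if (ctags != null) {
            out.put("system.ctags", ctags);
        }

        // Every section is treated as optional; a missing one is logged and
        // skipped, and the remaining sections are still parsed.
        for (String section : new String[] {"memory", "swap"}) {
            if (!copySection(json, section, out)) {
                System.err.println("section '" + section + "' missing, skipped");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> memory = new LinkedHashMap<>();
        memory.put("used", 42L);

        // Input without a "swap" section, as in the environment described above.
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("memory", memory);
        json.put("ctags", "cluster=\"demo\"");

        System.out.println(parse(json));
    }
}
```

With a guard like this, a host that reports no swap information still yields
the memory metrics and the cluster tag for that timestamp, and no error chunk
is generated for the missing section.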