[
https://issues.apache.org/jira/browse/CHUKWA-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865920#action_12865920
]
Ari Rabkin commented on CHUKWA-487:
-----------------------------------
So there are basically two fixes. We can call System.exit() on that error and
hope the daemon respawns, or try to handle it in Chukwa. I prefer the former
approach.
Comments or objections?
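
To make the two options concrete, here is a rough sketch. It is not taken from
the Chukwa source: the class, the SinkWrite interface, and both method names
are hypothetical, and the exit call is passed in as a parameter only so the
behavior can be exercised without killing the JVM. In a real collector the
hook would be System::exit.

```java
import java.io.IOException;
import java.util.function.IntConsumer;

/** Hypothetical sketch; none of these names come from the Chukwa code base. */
public class CollectorFailureSketch {

    /** Stand-in for whatever operation writes collected chunks to HDFS. */
    public interface SinkWrite {
        void run() throws IOException;
    }

    /**
     * Option 1: fail fast. On a write error, invoke the exit hook so the
     * whole process goes away instead of lingering half-stopped.
     */
    public static void writeOrExit(SinkWrite write, IntConsumer exit) {
        try {
            write.run();
        } catch (IOException e) {
            System.err.println("Sink write failed, exiting: " + e.getMessage());
            exit.accept(1); // non-zero status so a supervisor can restart us
        }
    }

    /**
     * Option 2: handle the error in-process. Retry with a fixed delay and
     * return the number of attempts used; rethrow the last error if every
     * attempt fails.
     */
    public static int writeWithRetry(SinkWrite write, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // wait before the next try
                }
            }
        }
        throw last;
    }
}
```

The first option is much less code and cannot leave a half-alive process
behind, but it only helps if something external actually restarts the
collector. The second keeps the collector up through a short outage, at the
cost of deciding what to do with incoming data while the retries run.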
> Collector left in a bad state after temporary NN outage
> -------------------------------------------------------
>
> Key: CHUKWA-487
> URL: https://issues.apache.org/jira/browse/CHUKWA-487
> Project: Hadoop Chukwa
> Issue Type: Bug
> Components: data collection
> Affects Versions: 0.4.0
> Reporter: Bill Graham
>
> When the name node returns errors to the collector, at some point the
> collector dies halfway. This behavior should be changed either to resemble
> the agents and keep trying, or to shut down completely. Instead, what I'm
> seeing is that the collector logs that it's shutting down, and the
> var/pidDir/Collector.pid file gets removed, but the collector continues to
> run, albeit not handling new data. Meanwhile, this log entry is repeated ad
> infinitum:
> 2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0