[jira] Updated: (CHUKWA-487) Collector left in a bad state after temporary NN outage

Ari Rabkin (JIRA) Mon, 10 May 2010 14:06:55 -0700

     [ https://issues.apache.org/jira/browse/CHUKWA-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ari Rabkin updated CHUKWA-487:
------------------------------

    Priority: Blocker  (was: Major)

Also, this is a big enough problem for reliability that it should block any 
0.5 release.

> Collector left in a bad state after temporary NN outage
> -------------------------------------------------------
>
>                 Key: CHUKWA-487
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-487
>             Project: Hadoop Chukwa
>          Issue Type: Bug
>          Components: data collection
>    Affects Versions: 0.4.0
>            Reporter: Bill Graham
>            Priority: Blocker
>
> When the name node returns errors to the collector, at some point the 
> collector dies halfway. This behavior should be changed so that the collector 
> either keeps retrying, as the agents do, or shuts down completely. Instead, 
> what I'm seeing is that the collector logs that it's shutting down and the 
> var/pidDir/Collector.pid file gets removed, but the collector continues to 
> run without handling new data, and this log entry is repeated ad infinitum:
> 2010-05-06 17:35:06,375 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:36:06,379 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
> 2010-05-06 17:37:06,384 INFO Timer-1 root - stats:ServletCollector,numberHTTPConnection:0,numberchunks:0
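
The two acceptable behaviors named in the report (keep retrying, or shut down completely) can be sketched roughly as below. This is only an illustration of the policy, not Chukwa's collector code: the names write_with_retry, run_once, SinkUnavailable and the stop_all hook are hypothetical, and OSError stands in for whatever error the name node returns.

```python
import time


class SinkUnavailable(Exception):
    """Raised when the storage backend stays unreachable after all retries."""


def write_with_retry(write, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call write() until it succeeds or max_attempts is exhausted.

    Returns the number of attempts used. Raises SinkUnavailable when every
    attempt failed, so the caller can stop the whole process instead of
    leaving it half-alive.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except OSError as err:  # assumption: stand-in for a name node error
            last_error = err
            if attempt < max_attempts:
                sleep(delay_seconds)
    raise SinkUnavailable(f"gave up after {max_attempts} attempts") from last_error


def run_once(write, stop_all, **retry_options):
    """Try one write; on permanent failure stop every component, then report."""
    try:
        write_with_retry(write, **retry_options)
        return True
    except SinkUnavailable:
        # Hypothetical hook: it should stop the HTTP listener and the stats
        # timer together, so nothing keeps running after the pid file is gone.
        stop_all()
        return False
```

The point of routing permanent failure through a single stop_all hook is that every component stops together, which is the "completely shut down" option; raising max_attempts moves the behavior toward the "keep trying" option instead.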

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
