[ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerome Boulon updated HADOOP-5118:
----------------------------------

    Description:
The watchdog checks on ChukwaAgent only once every 5 minutes, so there is no point in retrying more often than once every 5 minutes.
In practice, if the watchdog is not able to restart the agent automatically, it will take more than 20 minutes to get Ops to restart it.
Ops also want us to limit the amount of communication between Hadoop and Chukwa, which is why the retry interval is 30 minutes.

  was:
If the agent is down, chances are that either it will not be up again for at least 1 minute (watchdog) or it will take longer.
So it is better to retry after 1 minute the first time, then every 30 minutes for the next 24 hours.

> ChukwaAgent controller should retry to register for a longer period but not as frequent as now
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5118
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5118
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/chukwa
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5118-2.patch, HADOOP-5118.patch
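
The retry behaviour discussed in the description can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class `RegistrationRetryPolicy`, its `Sleeper` interface, and the `register` method are hypothetical names. The 30-minute interval is taken from the current description, while the 24-hour window appears only in the earlier description and is assumed here to still apply.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a fixed-interval registration retry loop.
// None of these names come from Chukwa or from the HADOOP-5118 patches.
final class RegistrationRetryPolicy {
    private final Duration interval; // wait between attempts (assumed: 30 minutes)
    private final Duration window;   // how long to keep retrying (assumed: 24 hours)

    RegistrationRetryPolicy(Duration interval, Duration window) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.window = window;
    }

    // Number of retries that fit into the window at the given interval.
    long maxRetries() {
        return window.toMillis() / interval.toMillis();
    }

    // Abstraction over sleeping so the loop can be exercised without waiting.
    interface Sleeper {
        void sleep(Duration d) throws InterruptedException;
    }

    // Tries once immediately, then retries at the fixed interval.
    // Returns the number of attempts made on success, or -1 if every
    // attempt within the window failed.
    long register(BooleanSupplier tryRegister, Sleeper sleeper)
            throws InterruptedException {
        long retries = maxRetries();
        long attempts = 0;
        while (true) {
            attempts++;
            if (tryRegister.getAsBoolean()) {
                return attempts;
            }
            if (attempts > retries) {
                return -1; // initial attempt plus all retries have failed
            }
            sleeper.sleep(interval);
        }
    }
}
```

With `Duration.ofMinutes(30)` and `Duration.ofHours(24)`, `maxRetries()` is 48, so under these assumptions a controller would contact the agent at most 49 times in a day. A real caller could pass `d -> Thread.sleep(d.toMillis())` as the sleeper.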