[ https://issues.apache.org/jira/browse/HADOOP-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerome Boulon updated HADOOP-5118:
----------------------------------

    Description: 
The watchdog checks on ChukwaAgent only once every 5 minutes, so there is no
point in retrying more than once every 5 minutes.

In practice, if the watchdog is not able to restart the agent automatically, it
will take more than 20 minutes to get Ops to restart it.
Ops also want us to limit the number of communications between Hadoop and
Chukwa, which is why the retry interval is 30 minutes.

  was:
If the agent is down, it will most likely either be back up after about 1
minute (restarted by the watchdog) or stay down much longer.
So it is better to retry after 1 minute the first time, then every 30
minutes for the next 24 hours.



> ChukwaAgent controller should retry to register for a longer period but not 
> as frequent as now 
> -----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5118
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5118
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/chukwa
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: HADOOP-5118-2.patch, HADOOP-5118.patch
>
>
> The watchdog checks on ChukwaAgent only once every 5 minutes, so there is no 
> point in retrying more than once every 5 minutes.
> In practice, if the watchdog is not able to restart the agent automatically, 
> it will take more than 20 minutes to get Ops to restart it.
> Ops also want us to limit the number of communications between Hadoop and 
> Chukwa, which is why the retry interval is 30 minutes.
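To make the retry schedule concrete, here is a minimal Java sketch, assuming a policy of one initial retry followed by fixed-interval retries inside a bounded window. The 1-minute / 30-minute / 24-hour values come from the earlier version of the description; the class `RegistrationRetrySchedule` and its `delaysMillis` method are hypothetical and are not the actual Chukwa controller code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical retry schedule for re-registering with a ChukwaAgent:
 * one retry after firstDelayMs, then retries every intervalMs, stopping
 * once the total elapsed time would exceed windowMs.
 */
public final class RegistrationRetrySchedule {
    private RegistrationRetrySchedule() {}

    /** Returns the delay (in ms) to wait before each successive attempt. */
    public static List<Long> delaysMillis(long firstDelayMs, long intervalMs, long windowMs) {
        if (firstDelayMs <= 0 || intervalMs <= 0 || windowMs <= 0) {
            throw new IllegalArgumentException("delays and window must be positive");
        }
        List<Long> delays = new ArrayList<>();
        long elapsed = 0;
        long next = firstDelayMs;
        // Keep scheduling attempts while they still fall inside the window.
        while (elapsed + next <= windowMs) {
            delays.add(next);
            elapsed += next;
            next = intervalMs; // every retry after the first uses the fixed interval
        }
        return delays;
    }

    public static void main(String[] args) {
        // Earlier proposal: retry after 1 minute, then every 30 minutes for 24 hours.
        List<Long> delays = delaysMillis(
                TimeUnit.MINUTES.toMillis(1),
                TimeUnit.MINUTES.toMillis(30),
                TimeUnit.HOURS.toMillis(24));
        System.out.println(delays.size() + " attempts"); // prints "48 attempts"
    }
}
```

Per the current description, the interval should never be shorter than the 5-minute watchdog period; with a 30-minute interval the controller contacts the agent at most twice an hour, which keeps Hadoop-to-Chukwa traffic low.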

-- 
This message is automatically generated by JIRA.