[
https://issues.apache.org/jira/browse/HADOOP-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Chansler updated HADOOP-4805:
------------------------------------
Release Note: (was: Removed black list collector feature from Chukwa
Agent HTTP Sender.)
No release note for "just a bug."
> Remove black list feature from Chukwa Agent to Chukwa Collector communication
> -----------------------------------------------------------------------------
>
> Key: HADOOP-4805
> URL: https://issues.apache.org/jira/browse/HADOOP-4805
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/chukwa
> Environment: Redhat EL 5, Java 6
> Reporter: Eric Yang
> Assignee: Eric Yang
> Fix For: 0.20.0
>
> Attachments: HADOOP-4805.patch
>
>
> Recently, a new load-balancing algorithm was added to improve Chukwa agent
> to Chukwa collector communication. The design was to send one HTTP POST per
> collector and rotate through the list of collectors to spread the load
> across them. When a collector fails to respond, it is blacklisted for
> 5 minutes. If no collector responds, the agent sleeps for a random 1-5
> minutes. Unfortunately, this algorithm causes problems on slower machines:
> they end up blacklisting all collectors and sleeping indefinitely.
> This ticket restores the algorithm to the original design. The agent
> shuffles the collector list, then makes a best effort to keep sending
> HTTP POSTs to the same collector until an error occurs, at which point it
> iterates through the shuffled list of collectors.
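
The restored behaviour described above can be sketched roughly as follows.
This is only an illustration, not the code in HADOOP-4805.patch: the class
name StickyCollectorSelector and its methods are hypothetical, and wrapping
around to the start of the list after the last collector fails is an
assumption the description does not spell out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the actual Chukwa agent class. */
class StickyCollectorSelector {
    private final List<String> collectors;
    private int current = 0;

    StickyCollectorSelector(List<String> configured, Random rng) {
        if (configured.isEmpty()) {
            throw new IllegalArgumentException("no collectors configured");
        }
        // Copy, then shuffle once so each agent gets its own random order.
        this.collectors = new ArrayList<>(configured);
        Collections.shuffle(this.collectors, rng);
    }

    /** Collector to POST to; stays the same until reportFailure() is called. */
    String current() {
        return collectors.get(current);
    }

    /**
     * Called when an HTTP POST fails: move to the next collector in the
     * shuffled list. Nothing is blacklisted; wrap-around is an assumption.
     */
    String reportFailure() {
        current = (current + 1) % collectors.size();
        return collectors.get(current);
    }
}
```

An agent loop would call current() before each POST and reportFailure()
only when the POST throws, so a slow machine keeps retrying collectors
rather than blacklisting all of them and sleeping.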