Siddharth Wagle created AMBARI-16828:
----------------------------------------
             Summary: Support round-robin scheduling with failover for Sinks with distributed collector
                 Key: AMBARI-16828
                 URL: https://issues.apache.org/jira/browse/AMBARI-16828
             Project: Ambari
          Issue Type: Task
          Components: amvari-me
    Affects Versions: 2.4.1
            Reporter: Siddharth Wagle
            Assignee: Siddharth Wagle
             Fix For: 2.4.1

- The initial set of collectors is configured in the configuration files.
- Thereafter, find available collectors by connecting to ZooKeeper.
- Remember the available collectors. Refresh this information only when a collector cannot be reached, or at a very low frequency (for example, at a random interval of 10-12 minutes) to check whether a new collector is available. Set a low client-side ZooKeeper timeout.
- Round-robin the writes across the collectors, choosing the first one at random.
- If a write times out, choose the next available collector and remember the failed attempts against the first one.
- Set a configurable attempt count for a failed collector (default = 3), after which the failed collector is removed from the list of available collectors.
- The next retry against a removed collector will be triggered only after a refresh with ZooKeeper succeeds.
- If all collectors have failed and none is available, the ZooKeeper refresh interval should be chosen randomly between 1 and 2 minutes.
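The selection rules above can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not the Ambari sink implementation: the class name `CollectorSelector`, its method names, and the use of seconds for intervals are all assumptions, and the ZooKeeper lookup itself is left out (the caller passes the refreshed host list to `refresh`). The issue does not say whether a successful write resets the failure count, so this sketch does not reset it.

```python
import random


class CollectorSelector:
    """Hypothetical round-robin selector with failover (illustrative only)."""

    def __init__(self, collectors, max_attempts=3, rng=None):
        self._rng = rng if rng is not None else random.Random()
        self._max_attempts = max_attempts  # configurable, default 3 per the issue
        self._available = []
        self._failures = {}
        self._index = 0
        self.refresh(collectors)

    def refresh(self, live_collectors):
        """Replace the available list, e.g. after a successful ZooKeeper lookup."""
        self._available = list(live_collectors)
        self._failures = {}
        # Start the rotation at a random collector.
        size = len(self._available)
        self._index = self._rng.randrange(size) if size else 0

    def next_collector(self):
        """Return the next collector in rotation, or None if none is available."""
        if not self._available:
            return None
        host = self._available[self._index]
        self._index = (self._index + 1) % len(self._available)
        return host

    def report_failure(self, host):
        """Record a timed-out write; evict the host after max_attempts failures."""
        if host not in self._available:
            return
        self._failures[host] = self._failures.get(host, 0) + 1
        if self._failures[host] >= self._max_attempts:
            pos = self._available.index(host)
            del self._available[pos]
            # Keep the rotation pointing at the same upcoming collector.
            if pos < self._index:
                self._index -= 1
            self._index = self._index % len(self._available) if self._available else 0

    def refresh_interval_seconds(self):
        """10-12 minutes normally, 1-2 minutes when no collector is available."""
        low, high = (600, 720) if self._available else (60, 120)
        return self._rng.uniform(low, high)

    def available(self):
        return list(self._available)
```

A sink would call `next_collector()` before each write, call `report_failure(host)` when the write times out and then try the next collector, and schedule its next ZooKeeper lookup `refresh_interval_seconds()` from now.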