Siddharth Wagle created AMBARI-16828:
----------------------------------------

             Summary: Support round-robin scheduling with failover for Sinks 
with distributed collector
                 Key: AMBARI-16828
                 URL: https://issues.apache.org/jira/browse/AMBARI-16828
             Project: Ambari
          Issue Type: Task
          Components: ambari-me
    Affects Versions: 2.4.1
            Reporter: Siddharth Wagle
            Assignee: Siddharth Wagle
             Fix For: 2.4.1


- Initial set of collectors is configured in the configuration files
- Thereafter, find available collectors by connecting to ZooKeeper
- Remember the available collectors and refresh this information only when a 
collector cannot be reached. Otherwise, check whether a new collector is 
available at a very low frequency, for example at a random interval of 10-12 
minutes. Set a low client-side zk timeout.
- Round-robin writes across the collectors, choosing the first one at random
- If a write times out, choose the next available collector and remember the 
number of failed attempts with the first one
- Set a configurable attempt count for a failed collector (default = 3), after 
which the failed collector is removed from the available collectors list.
- The next retry is triggered after a successful refresh from ZooKeeper
- If no collectors are available, the zk refresh interval should be chosen 
randomly between 1-2 minutes.
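
The selection and failover rules above could be sketched roughly as follows. 
This is an illustration only, not the Ambari sink code: the names 
CollectorSelector, write_with_failover and send are hypothetical, the 
ZooKeeper lookup is replaced by a plain list passed to refresh(), and 
clearing the failure counts on refresh is an assumption. The sketch also 
assumes the 1-2 minute interval applies when no collector is left in the 
available list.

```python
import random


class CollectorSelector:
    """Round-robin collector selection with failover (illustrative only)."""

    def __init__(self, collectors, max_attempts=3, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.max_attempts = max_attempts      # configurable, default 3
        self.available = list(collectors)     # initial set from config files
        self.failures = {}                    # host -> failed write attempts
        self.index = self._random_start()

    def _random_start(self):
        # The first collector of the rotation is chosen at random.
        return self.rng.randrange(len(self.available)) if self.available else 0

    def next_collector(self):
        """Return the next collector in rotation, or None if none are left."""
        if not self.available:
            return None
        host = self.available[self.index]
        self.index = (self.index + 1) % len(self.available)
        return host

    def record_failure(self, host):
        """Count a failed write; drop the host after max_attempts failures."""
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.max_attempts and host in self.available:
            pos = self.available.index(host)
            self.available.remove(host)
            if pos < self.index:
                self.index -= 1   # keep pointing at the same next host
            self.index = self.index % len(self.available) if self.available else 0

    def refresh(self, live_hosts):
        """Replace the list with the hosts reported live (e.g. by ZooKeeper)."""
        self.available = list(live_hosts)
        self.failures.clear()     # assumption: counts restart after a refresh
        self.index = self._random_start()

    def refresh_interval_seconds(self):
        """10-12 minutes normally, 1-2 minutes when no collector is available."""
        low, high = (600, 720) if self.available else (60, 120)
        return self.rng.uniform(low, high)


def write_with_failover(selector, send, payload):
    """Try each available collector at most once; return the host that accepted."""
    for _ in range(len(selector.available)):
        host = selector.next_collector()
        if host is None:
            break
        try:
            send(host, payload)   # hypothetical transport; raises TimeoutError
        except TimeoutError:
            selector.record_failure(host)
            continue
        return host
    return None
```

A caller would invoke write_with_failover for each batch of metrics and 
schedule refresh() after refresh_interval_seconds(), so the shorter interval 
only applies once every collector has been dropped from the list.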



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
