Would appreciate feedback/comments on this proposal.

Thanks,
Anindya

> On Feb 12, 2017, at 9:03 PM, Anindya Sinha <anindya_si...@apple.com> wrote:
>
> Reference: https://issues.apache.org/jira/browse/MESOS-7087
>
> Currently, we have at least 3 types of backoff:
> 1) Exponential backoff with randomness, as in framework/agent registration.
> 2) Exponential backoff with no randomness, as in status updates.
> 3) Linear backoff with randomness, as in executor registration.
>
> In framework registration, for example, the timeout for the nth retry
> attempt falls in the range [0 .. b*2^(n-1)], as long as each interval is
> less than 1 min.
>
> For clusters with a large number of frameworks and/or agents, the randomness
> may not be enough, since the timeout can end up being very small for a
> substantial number of clients (agents and/or frameworks) because the
> allowed range is [0 .. <n>] for all retry attempts.
>
> The following doc looks at an enhancement to the existing proposal to ensure
> that the timeout values are not extremely small, and that every subsequent
> retry has a timeout value at least as large as that of the previous
> iteration.
>
> https://docs.google.com/document/d/1nUxvh6BbB8jv5G-MvckGj9XzFYLBrUM0O5Go_Zmdftk/edit?usp=sharing
>
> Feedback welcome.
>
> Thanks
> Anindya
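
To make the difference concrete, here is a rough Python sketch of the two behaviours described in the quoted mail. It is only an illustration, not the Mesos code and not necessarily what the linked doc specifies. Assumptions: `b` is a base backoff interval in seconds, the cap is the 1 min limit mentioned above, and the function names and the "lower bound is half the upper bound" rule are hypothetical choices made for this example.

```python
import random

MAX_BACKOFF_SECS = 60.0  # the "1 min" limit from the mail, in seconds


def current_backoff(b, n, rng=random):
    """Timeout for the nth retry (n >= 1), drawn from [0, min(b*2^(n-1), cap)].

    The lower bound is always 0, so any retry can get a very small timeout.
    """
    upper = min(b * 2 ** (n - 1), MAX_BACKOFF_SECS)
    return rng.uniform(0.0, upper)


def bounded_backoff(b, n, previous, rng=random):
    """Hypothetical variant: timeout is never tiny and never shrinks.

    Assumed rule (not taken from the doc): the lower bound is the larger of
    the previous timeout and half of the current upper bound.
    """
    upper = min(b * 2 ** (n - 1), MAX_BACKOFF_SECS)
    lower = min(max(previous, upper / 2.0), upper)
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so a run is repeatable
    previous = 0.0
    for n in range(1, 9):
        old = current_backoff(2.0, n, rng)
        previous = bounded_backoff(2.0, n, previous, rng)
        print(f"retry {n}: current={old:6.2f}s  bounded={previous:6.2f}s")
```

With the first function, a large fraction of clients can still pick a timeout close to 0 on any attempt; with the second, the nth timeout stays within [upper/2, upper] until the cap is reached and never drops below the previous one.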