I couldn't add comments to the document, so I am posting my comments to the
mailing list.

One more approach could be to keep the current measurement as it is, but
instead of leaving the quitting decision to the stmgrclient, have
stmgrclientmgr make the decision. Every time a stmgr client detects
connection issues, it informs stmgrclientmgr, which keeps a map of
peerstmgrid to error count. stmgrclientmgr can then tell whether it is
seeing connection errors from all stmgrs or only from a few of them, and
so make a better decision.
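
To make the idea concrete, here is a rough sketch of the bookkeeping in
Python rather than in the stream manager's own code. The class name, the
method names, the threshold value and the three outcome labels are all
made up for illustration; none of them come from Heron or from the doc.

```python
from collections import defaultdict


class StMgrClientMgrSketch:
    """Hypothetical stand-in for stmgrclientmgr; not Heron's actual class."""

    def __init__(self, peer_ids, error_threshold=3):
        self.peer_ids = set(peer_ids)
        # Assumed: a peer counts as failing after this many reported errors.
        self.error_threshold = error_threshold
        # Map of peer stmgr id -> number of connection errors reported.
        self.error_counts = defaultdict(int)

    def on_connection_error(self, peer_id):
        # Called by a stmgr client instead of deciding to quit by itself.
        self.error_counts[peer_id] += 1

    def on_connection_ok(self, peer_id):
        # A successful (re)connection clears the count for that peer.
        self.error_counts.pop(peer_id, None)

    def failing_peers(self):
        return {
            p for p in self.peer_ids
            if self.error_counts.get(p, 0) >= self.error_threshold
        }

    def decide(self):
        # Distinguish "every peer is failing" from "only some are".
        failing = self.failing_peers()
        if not failing:
            return "healthy"
        if failing == self.peer_ids:
            return "all_peers_failing"
        return "some_peers_failing"
```

What to do for each outcome (quit, keep retrying, report) is left open
here; the point is only that the manager sees errors across all peers
before anything is decided.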

On Sat, Feb 3, 2018 at 8:11 PM, Ning Wang <wangnin...@gmail.com> wrote:

> Hi, heron devs~
>
> I think the current stream manager's quitting logic on connection failures
> is problematic. We saw a few internal cases at Twitter where this logic
> could cause extra issues.
>
> Here is a doc with more details:
>
> https://docs.google.com/document/d/1WHNc2NEp2gVL9ge2QVKp9t4Hpd4U9sAbzBqCu4-iDUM/edit#
>
> Comments and feedback are welcome!
>
> Thanks.
> --ning
>
