Yeah. That is an option too. In fact it was my first try:
https://github.com/twitter/heron/pull/2693 (just an initial attempt, not
completed; a count map should be used instead of a single total count)

In most cases, I think both solutions should have the same result. A few
reasons I changed to a tmaster check:
- with tmaster, there is only one source of truth, and tmaster is more
critical anyway. If the tmaster link is not healthy, stmgrs won't work
correctly: the topology may have created replacement nodes while the
disconnected nodes keep running by themselves.
- it is more straightforward. The logic is the same as the current one. On
the other hand, if we use an array for all remote stmgrs, we could have
smarter logic (which is good), but it could make stmgrs more complicated and
less straightforward (bad). I left the stmgr counters there, so if in the
future we decide to add this feature, it should be easy to add. There is a
gap between "errors from all" and "errors from a few", and deciding what to
do in between is not a simple/quick question.
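
To make the per-peer idea concrete, here is a minimal sketch of what a count
map could look like. It is in Python only for brevity, not actual stmgr code,
and every name in it (PeerErrorTracker, record_error, classify, the threshold
value) is made up for illustration. It assumes a peer counts as failing once
it has accumulated a fixed number of errors since its last success:

```python
class PeerErrorTracker:
    """Hypothetical helper: tracks connection errors per remote stmgr."""

    def __init__(self, peer_ids, threshold=3):
        self.peer_ids = set(peer_ids)   # all remote stmgrs we connect to
        self.threshold = threshold      # assumed cutoff for "failing"
        self.errors = {}                # peer stmgr id -> error count

    def record_error(self, peer_id):
        # Called whenever a client reports a connection problem.
        self.errors[peer_id] = self.errors.get(peer_id, 0) + 1

    def record_success(self, peer_id):
        # A successful (re)connection clears that peer's count.
        self.errors.pop(peer_id, None)

    def failing_peers(self):
        return {p for p in self.peer_ids
                if self.errors.get(p, 0) >= self.threshold}

    def classify(self):
        # Distinguish "errors from all" from "errors from a few".
        failing = self.failing_peers()
        if not failing:
            return "healthy"
        if failing == self.peer_ids:
            return "all_peers_failing"
        return "some_peers_failing"
```

The sketch only classifies the situation. What to do in the
"some_peers_failing" case is exactly the open question above.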




On Sun, Feb 4, 2018 at 6:48 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
wrote:

> I couldn't add comments to the document, so I am posting my comments to the
> mailing list.
> One more approach could be to keep the current measurement as it is, but
> instead of leaving the quitting decision to the stmgrclient, have
> stmgrclientmgr make the decision. Every time a stmgr client detects
> connection issues, it informs stmgrclientmgr, which keeps a map of
> peerstmgrid to error count. It can then tell whether it is seeing
> connection errors from all stmgrs or whether only a few of them are
> having issues, and make better decisions.
>
> On Sat, Feb 3, 2018 at 8:11 PM, Ning Wang <wangnin...@gmail.com> wrote:
>
> > Hi, heron devs~
> >
> > I think the current stream manager's quitting logic on connection
> > failures is problematic. We saw a few internal cases at Twitter where
> > this logic could cause extra issues.
> >
> > Here is a doc with more details:
> >
> > https://docs.google.com/document/d/1WHNc2NEp2gVL9ge2QVKp9t4Hpd4U9sAbzBqCu4-iDUM/edit#
> >
> > Comments and feedback are welcome!
> >
> > Thanks.
> > --ning
> >
>
