On Tue, 28 Jan 2020 at 11:19, Robert Raszuk <rob...@raszuk.net> wrote:
> So at t0+N I record how many packets entered my system. (We are already
> at a loss here, as the RE can generate packets, unless you add the RE's
> outbound packets to this.) Then at t0+N+uS (uS being the delta of
> switching via the fabric) you record the number of packets which left
> the box.
>
> What is your N and uS?

You're not gonna get it 1:1. You will monitor the delta rate you see and
react when the delta rate increases. Of course you can keep tuning this
by adding more and more drop counters to reduce the known delta rate, but
you always have to accept that you can't explain it perfectly. But not
every small issue is an important issue. Certainly, your fabric losing
30% would have been blatantly obvious even in the most naive such system.

> > You don't need ML/AI to find problems in your network, using the
> > algorithm 'this counter which increments at rate X stopped
> > incrementing or started to increment 100 times slower'
>
> Well, the way I read Adam's note was that learning this rate X is what
> he (IMHO correctly) calls ML :)

What I mean is: current_rate = X; if now_rate > X*100 or
now_rate < X/100, flag it. No ML, just a stupid static comparison
detecting a dramatic rate change. And even this is advanced by today's
standards. Even just detecting a counter rate going from non-zero to 0,
or from 0 to non-zero, exposes a lot of real issues, albeit issues which
happen so rarely that customers are not complaining about them.

A particular example: all of us have some IP checksum errors in the
network. When they show up on an edge router, on an edge interface in the
ingress direction, you can ignore them as 'someone else's problem'. But
we also see them on other interfaces/directions, where it means we
flipped bits somewhere and then calculated a correct FCS over the broken
data, i.e. we have broken memory somewhere. It probably isn't broken
enough to matter, though; it probably mangles packets rather rarely.

-- 
++ytti
_______________________________________________
juniper-nsp mailing list
juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
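For illustration, the static rate-change check described in the post (current_rate = X, flag when now_rate > X*100 or now_rate < X/100, plus the 0 to non-zero and non-zero to 0 transitions) can be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not anyone's production tooling: `RateCheck`, `classify` and `rate` are hypothetical names, `baseline_rate` stands in for X, and counters are assumed to be sampled at a fixed interval with no wrap or reset between samples.

```python
from dataclasses import dataclass
from typing import Optional


def rate(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Per-second rate from two samples of a monotonically increasing counter.

    Assumes no counter wrap or reset happened between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr_count - prev_count) / interval_s


@dataclass
class RateCheck:
    """Static rate-change check for a single counter (hypothetical helper).

    baseline_rate plays the role of 'current_rate = X'; factor is the
    100x threshold. Nothing is learned: the thresholds are fixed.
    """
    baseline_rate: float
    factor: float = 100.0

    def classify(self, now_rate: float) -> Optional[str]:
        x = self.baseline_rate
        # Zero transitions: a counter that stopped, or one that started.
        if x > 0 and now_rate == 0:
            return "stopped"
        if x == 0 and now_rate > 0:
            return "started"
        # Dramatic change in either direction.
        if x > 0 and now_rate > x * self.factor:
            return "increased"
        if x > 0 and now_rate < x / self.factor:
            return "decreased"
        return None  # nothing dramatic happened


if __name__ == "__main__":
    check = RateCheck(baseline_rate=1000.0)
    print(check.classify(rate(0, 50, 10.0)))           # 5 pps -> decreased
    print(check.classify(0.0))                         # stopped
    print(check.classify(500.0))                       # None
    print(RateCheck(baseline_rate=0.0).classify(3.0))  # started
```

The zero-transition checks run first so that a counter which stopped entirely is reported as "stopped" rather than merely "decreased". A fixed factor only catches changes of roughly two orders of magnitude, which keeps noise low at the cost of missing subtler drifts. That is the trade-off the post accepts in exchange for needing no learned baseline.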