Ken, I should have made clear I wasn't replying to you. I was replying to Brielle's comment:
> Is it bad that the first thing that came to mind is "Oh FFS, another troll"?

-mel beckman

> On Jul 7, 2016, at 2:35 PM, Ken Chase <m...@sizone.org> wrote:
>
> On Thu, Jul 07, 2016 at 08:32:19PM +0000, Mel Beckman said:
>> Yes. It indicates that there was never a time when you did not know
>> everything :)
>>
>> -mel beckman
>
> The issue isn't knowing everything, it's making accusations of issues while
> you still don't know how much you don't know. (~D. Rumsfeld) -- My customers
> in a nutshell (they pay to be able to yell about random stuff I guess, and I
> provide that service!).
>
> The OP didn't make any accusations however, and just asked what was going on
> (sorry if I sounded harsh in reply). Once, Google having an 8.8.8.8 failure
> locally on its (anycast?) dns servers resulted in dozens of calls to us "your
> server hosting our site must be down!! Our website isn't working! People are
> calling us!".
>
> Most of my work with these situations is spent proving it's not our fault.
> Mtr makes it very hard because it's a very subtle tool, and only gives partial
> information. (I still think mtr is a killer app though!)
>
> consider this (fake, example) trace:
>
>  6. 100ge13-1.core1.chi1.he.net        0.0%  10
>  7. 100ge14-1.core2.chi1.he.net        0.0%  10
>  8. 100ge3-1.core1.sjc2.he.net        30.0%  10
>  9. ???
> 10. UNKNOWN-216-115-101-X.yahoo.com   10.0%  10
> 11. routerer-ext.ysv.freebsd.org      20.0%  10
> 12. wfe0.ysv.freebsd.org              30.0%  10
>
> First off, the OP may have asked "whose fault is hop 9, yahoo or HE?" and seen
> it as an issue. Ignoring that for now, the rest of the packetloss is an issue
> -- where is the problem though?
>
> This is very tricky - it looks like hop 8 is at fault of course - or is it
> just dropping ICMP as it's allowed to? How did hop 10 get only 10% loss then
> if 8 has 30? Is 8 then dropping ~20% (not statistically correct..) of ICMP
> just cuz it can, and then having a 'real' 10% loss on top of that?
>
> Or it's hop 11? But hop 12 has more PL, perhaps hop 12 is the issue all along
> and 8, 10 and 11 are just dropping ICMP? Or it's 8, 11 and 12 doing ~10% each?
> (not statistically correct.)
>
> Can't say for sure - it's a probabilities game - and being completely correct
> about it, hop 6 isn't blameless either (just very unlikely to be at fault
> statistically, though not impossible with only 10 pings per hop - a
> statistician can calculate it for us).
>
> This is why more pings are required to be sure of the situation - I like to do
> -i 0.1 -c 100 so it's completed quickly before conditions change. Then you
> can make a statistically valid pronouncement of where the problem MIGHT BE
> within a useful confidence interval - however, without the return route we're
> still largely in the dark as to the actual location of the issue. You can't be
> '100% sure' with this stuff - technically speaking, it's all 'luck of the
> draw'.
>
> (Beware: this one time, at band camp, some etherchannel or equiv at HE was
> showing PL only for specific IPs in any target subnet -- because they were
> xor'ing the source & target IP to load balance and one channel was wonky. Fun
> times debugging that one: "WFM from here, what's your issue?")
>
> /kc
> --
> Ken Chase - k...@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto
> Canada
> Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151
> Front St. W.
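
For what it's worth, the "a statistician can calculate it for us" part of the
quoted message can be roughed out in a few lines of Python. This is only a
sketch under simplifying assumptions that are mine, not the quoted author's:
each probe is treated as an independent coin flip with a fixed loss rate, and
the Wilson score interval is just one common choice of confidence interval.
The function names prob_zero_loss and wilson_interval are made up for this
illustration.

```python
import math
from typing import Tuple


def prob_zero_loss(true_loss: float, pings: int) -> float:
    """Chance that a hop with the given true loss rate shows 0 lost probes.

    Assumes independent probes (a simplification; real loss is often bursty).
    """
    return (1.0 - true_loss) ** pings


def wilson_interval(lost: int, pings: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the true loss rate.

    lost  -- number of probes that got no reply
    pings -- number of probes sent
    z     -- normal quantile (1.96 is roughly 95% confidence)
    """
    if pings <= 0:
        raise ValueError("pings must be positive")
    p_hat = lost / pings
    denom = 1.0 + z * z / pings
    centre = (p_hat + z * z / (2 * pings)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / pings + z * z / (4 * pings * pings)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # A hop that really drops 10% of probes can still look clean in 10 pings.
    print("P(0 lost of 10 | 10% true loss):", round(prob_zero_loss(0.10, 10), 3))
    # Observed 30% loss at 10 pings versus the same rate at 100 pings.
    print("3 lost of 10   ->", wilson_interval(3, 10))
    print("30 lost of 100 ->", wilson_interval(30, 100))
    # "0.0%" at 10 pings (like hop 6) does not rule out real loss.
    print("0 lost of 10   ->", wilson_interval(0, 10))
```

Under those assumptions, 3 lost out of 10 is consistent with a true loss rate
anywhere from roughly 11% to 60%, while 30 out of 100 narrows that to roughly
22% to 40%, which is the case for -c 100 in numbers. None of this says which
hop is at fault, and it does nothing about ICMP rate limiting or the unseen
return path.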