On Jan 15, 2014, at 10:26 AM, William Herrin <b...@herrin.us> wrote:

> 
> Of course working, monitorable and testable are three different
> things. If my NMS can't reach the IXP's addresses, my view of the IXP
> is impaired. And "the Internet is broken" is not a trouble report that
> leads to a successful outcome with customer support... it helps to be
> able to pin things down with some specificity.

This approach concerns me for a number of reasons.

First, having your NMS ping your upstream’s IXP peers probably doesn’t scale. 
If I’m a peer of a reasonably large provider, I’m pretty sure I don’t want all 
their customers hammering my management plane. Even if you’re the only one 
doing it, you also don’t know if I’m rate-limiting pings for that or any other 
reason.

Second, what information do you get that you didn’t already have? If you saw 
the IP in a traceroute, then you already know it exists, is alive, and is in 
the path, and you have a rough estimate of the latency. Pinging it may even 
give you misleading information. 
information. Platforms vary and all, but in my experience pinging a router, 
especially a potentially busy one peering at an IXP, shows notably worse 
performance than “real” traffic experiences (admittedly somewhat true of TTL 
Expired responses, but less so in my experience). Now you’re potentially seeing 
high latency and packet loss which in reality might not even be there at all.

Third, you don’t know that your ping to the peering IP is even taking the same 
path as the packets addressed to the real destination. MTR for example looks 
nice, but it would probably be more accurate if it simply ran the traceroute 
over and over instead of pinging each hop directly. You would also detect path 
changes for the real destination that pinging intermediate hops wouldn’t show 
you.
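
To make the "run the traceroute over and over" idea concrete, here is a rough sketch in Python. It assumes you already have each run parsed into a list of hop IPs toward the real destination (None where a probe timed out); the function name and the sample addresses are made up for illustration, and it ignores complications like ECMP, where hops can legitimately alternate between runs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Hop = Optional[str]  # responding IP at a given TTL, or None if the probe timed out


def path_changes(runs: Sequence[Sequence[Hop]]) -> List[Tuple[int, int, str, str]]:
    """Return (run_index, ttl, old_hop, new_hop) for every hop that changed.

    Timeouts are skipped rather than counted as changes, so a hop is compared
    against the last address that actually answered at that TTL.
    """
    last_seen: Dict[int, str] = {}
    changes: List[Tuple[int, int, str, str]] = []
    for run_index, hops in enumerate(runs):
        for ttl, hop in enumerate(hops, start=1):
            if hop is None:
                continue  # no answer at this TTL; nothing to compare
            old = last_seen.get(ttl)
            if old is not None and old != hop:
                changes.append((run_index, ttl, old, hop))
            last_seen[ttl] = hop
    return changes


# Hypothetical runs toward one destination (documentation address ranges).
runs = [
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"],
    ["192.0.2.1", None, "203.0.113.9"],
    ["192.0.2.1", "198.51.100.7", "203.0.113.9"],
]
print(path_changes(runs))
```

Pinging the intermediate hops directly would never surface the change at TTL 2 here, because the old hop would most likely keep answering pings just fine.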

While I appreciate the desire to be able to do as much of your own detective 
work as possible, I can also see where you’re now shifting workload onto 
someone else’s support organization when they’re not necessarily the problem 
either (“Hey, my NMS says your peering router is causing latency and packet 
loss, fix it!”).

I’m also not saying there isn’t a troubleshooting gap caused by this. I’m just 
not sure being able to ping the IXP hop solves that problem either.


Semi-related tangent: Working in an IXP setting I have seen weird corner cases 
cause issues in conjunction with the IXP subnet existing in BGP. Say someone’s 
got proxy ARP enabled on their router (sadly, more common than it should be, 
and not just from noobs at startups). Now say your IXP is growing and you 
expand the subnet. No matter how much you harp on the customers to make the 
change, they don’t all do it at once. Someone announces the new, larger subnet 
in BGP. Now when anyone ARPs for IPs in the new part of the range, proxy ARP 
guy (still on the smaller subnet) says “hey I have a route for that, send it 
here”. That was fun to troubleshoot. :)
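
For anyone trying to picture it, here is a deliberately simplified model of that failure in Python. The prefixes are documentation addresses standing in for the old and expanded IXP subnets, and the function is hypothetical; real proxy ARP behavior varies by platform (some implementations may only answer when the route points out a different interface), so treat this as a sketch of the logic, not of any particular router.

```python
import ipaddress

# Hypothetical prefixes: the original IXP LAN and the expanded one.
old_lan = ipaddress.ip_network("192.0.2.0/25")
new_lan = ipaddress.ip_network("192.0.2.0/24")


def proxy_arp_would_answer(target, connected, routes):
    """Simplified proxy ARP decision for one router on the IXP LAN.

    If the target is inside the router's own connected subnet, it stays quiet
    and lets the real owner answer. Otherwise it answers whenever any route
    in its table covers the target.
    """
    addr = ipaddress.ip_address(target)
    if addr in connected:
        return False
    return any(addr in route for route in routes)


# Member still configured with the old /25 who has learned the /24 via BGP.
stale_member = proxy_arp_would_answer("192.0.2.200", old_lan, [new_lan])
# Member who has already moved to the /24.
updated_member = proxy_arp_would_answer("192.0.2.200", new_lan, [new_lan])
# An address in the original half of the range is unaffected either way.
old_half = proxy_arp_would_answer("192.0.2.50", old_lan, [new_lan])

print(stale_member, updated_member, old_half)
```

Only the stale member answers for addresses in the new half of the range, which is why the breakage looks selective from the outside.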


-c

