> From: Blake Dunlap [mailto:[email protected]]
> While any provider will attempt to fix peer / upstream issues as they can, any
> SLA you would have is between two points on their private network, not
> from point A to point Z that they have no control over across multiple peers
> and the public internet itself.

Makes sense, thanks for confirming.

> The much more common design is using a single provider for each thread
> between sites. Then at least you have an end-to-end SLA in effect, as well
> as a single entity that is responsible for the entire link in question.
>
> This sounds like you're trying to achieve private link IGP / FRR level site
> to site failover/convergence across the public internet. Perhaps you should
> rethink your goals here or your design?

Kind of. I can actually tolerate the blips, but I want to be able to measure and track them in such a way that I know where the loss is occurring. If a particular path is reconverging more often than should reasonably be expected, I want to be able to prove it, within reason.

We also have a customer who happens to host at DC B with the same connectivity. Every time there is one of these blips their alerting fires off a thousand messages and they open a ticket with us. I'd like to be able to show them some good data on the path during the blip so we can back up a discussion along the lines of "live with it, or pay to privately connect to us".

-andy

> -Blake
>
> On Mon, Jul 15, 2013 at 4:18 PM, Andy Litzinger
> <[email protected]> wrote:
> Hi,
>
> Does anyone have any recommendations on how to pinpoint and react to
> packet loss across the internet? preferably in an automated fashion. For
> detection I'm currently looking at trying smoketrace to run from inside my
> network, but I'd love to be able to run traceroutes from my edge routers
> triggered during periods of loss. I have Juniper MX80s on one end- which I'm
> hopeful I'll be able to cobble together some combo of RPM and event
> scripting to kick off a traceroute. We have Cisco4900Ms on the other end and
> maybe the same thing is possible but I'm not so sure.
>
> I'd love to hear other suggestions and experience for detection and also for
> options on what I might be able to do when loss is detected on a path.
>
> In my specific situation I control equipment on both ends of the path that I
> care about with details below.
>
> we are a hosted service company and we currently have two data centers,
> DC A and DC B. DC A uses juniper MX routers, advertises our own IP space
> and takes full BGP feeds from two providers, ISPs A1 and A2. At DC B we
> have a smaller installation and instead take redundant drops (and IP space)
> from a single provider, ISP B1, who then peers upstream with two providers,
> B2 and B3
>
> We have a fairly consistent bi-directional stream of traffic between DC A and
> DC B. Both of ISP A1 and A2 have good peering with ISP B2 so under normal
> network conditions traffic flows across ISP B1 to B2 and then to either ISP A1
> or A2
>
> oversimplified ascii pic showing only the normal best paths:
>
>        -- ISP A1----------------------ISP B2--
> DC A--|                                       |--- ISP B1 ----- DC B
>        -- ISP A2----------------------ISP B2--
>
>
> with increasing frequency we've been experiencing packet loss along the
> path from DC A to DC B. Usually the periods of loss are brief, 30 seconds
> to a minute, but they are total blackouts.
>
> I'd like to be able to collect enough relevant data to pinpoint the trouble
> spot as much as possible so I can take it to the ISPs and request a
> solution. The blackouts are so quick that it's impossible to log in and get a
> trace- hence the desire to automate it.
>
> I can provide more details off list if helpful- I'm trying not to vilify
> anyone- especially without copious amounts of data points.
>
> As a side question, what should my expectation be regarding packet loss
> when sending packets from point A to point B across multiple providers
> across the internet? Is 30 seconds to a minute of blackout between two
> destinations every couple of weeks par for the course? My directly
> connected ISPs offer me an SLA, but what should I reasonably expect from
> them when one of their upstream peers (or a peer of their peers) has
> issues? If this turns out to be BGP reconvergence or similar do I have any
> options?
>
> many thanks,
> -andy
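
For illustration only, here is a minimal sketch of the "detect loss, then automatically capture a traceroute" idea discussed in this thread. It is not the RPM/event-script approach mentioned for the MX80s. It assumes a Linux host inside the network with the standard `ping` and `traceroute` commands installed. The function names, the `loss_traces.log` path, and the thresholds are hypothetical choices, not anything from the thread.

```python
import subprocess
import time
from datetime import datetime, timezone


def probe(target, timeout_s=1):
    """Send one ICMP echo via the system ping; True if a reply came back."""
    # Assumes Linux iputils ping: -c is the packet count, -W the reply timeout (s).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def capture_trace(target):
    """Run a numeric traceroute (one probe per hop, 1 s wait) and return its text."""
    result = subprocess.run(
        ["traceroute", "-n", "-w", "1", "-q", "1", target],
        capture_output=True,
        text=True,
    )
    return result.stdout


def group_outages(samples, min_losses=3):
    """Collapse (timestamp, ok) samples into (first_loss, last_loss) windows.

    Runs of failed probes shorter than min_losses are ignored as noise.
    """
    outages = []
    start = last = None
    count = 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                start, count = ts, 0
            count += 1
            last = ts
        elif start is not None:
            if count >= min_losses:
                outages.append((start, last))
            start = None
    # Close a window that is still open at the end of the samples.
    if start is not None and count >= min_losses:
        outages.append((start, last))
    return outages


def monitor(target, interval_s=1.0, trigger_after=3, log_path="loss_traces.log"):
    """Probe forever; on every trigger_after-th consecutive loss, log a traceroute."""
    consecutive = 0
    while True:
        if probe(target):
            consecutive = 0
        else:
            consecutive += 1
            if consecutive % trigger_after == 0:
                stamp = datetime.now(timezone.utc).isoformat()
                with open(log_path, "a") as log:
                    log.write(f"--- loss to {target} at {stamp} ---\n")
                    log.write(capture_trace(target))
        time.sleep(interval_s)
```

Calling `monitor("192.0.2.10")` (a documentation address standing in for the far-end host) from each data center would leave timestamped traces taken during a blackout, and `group_outages` can turn stored probe results into a list of loss windows to compare against those traces. Running it from both ends matters because the forward and return paths may differ.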

