Re: Evaluating Tier 1 Internet providers
On Wed, Aug 28, 2013 at 09:54:28AM -0700, Michael Smith wrote: It's really "can reach" versus "how well can they reach". I can't think of any provider that would have less than a full view of the DFZ but, if your primary traffic is to Provider X, and one of your Tier 1's peers locally and the other peers in France, then you would look more closely at the closer one. Unless, of course, that local peer was saturated 99% of the time. Then France might be attractive. One thing to keep in mind is that for major Tier 1s, it's not at all uncommon to see some very large percentages of traffic (like say well north of 50%) stay completely on-net, going from customer to customer. In this type of model, capacity to other third party peers (typically the other Tier 1's) becomes secondary to other considerations like backbone capacity, which is why those huge Tier 1 networks often have much less peering capacity than you might otherwise expect. Tier 2's on the other hand, typically spend the vast majority of their time/money/effort figuring out how they can deliver traffic to other networks via peering and transit relationships. This usually means they have much smaller amounts of backbone capacity, but relative to their total sizes they often have a lot more capacity to the other major peering/transit networks. The economics of each model are vastly different too. Tier 2's are typically looking to take advantage of tricks like hot potato routing and 95th percentile billing to get free inbound to minimize their backhaul costs. All too often people tend to get caught in the mental trap of thinking peering == free, but in reality the Tier 1's are just shifting the majority of their operational costs into backbone instead, and peering becomes the way to handle the leftovers. Each model has its advantages and weaknesses, but most people who haven't lived in both worlds tend to vastly underestimate the realities of the other side's cost models.
There is a lot to be said for the value of a Tier 2 network. Sometimes throwing a token amount of money at a problem solves it much more effectively than waiting for two squabbling Tier 1's to fight over the principle of not paying anything or risking being perceived as weak. And a Tier 2 with multiple transit paths and extensive peering options may be able to easily reroute traffic around a particular problem spot in a way that a Tier 1 just doesn't have the ability to do. Then again, sometimes there is value in just buying transit from someone who operates a massive network, with the economy of scale necessary to implement terabits of backbone capacity for cheap, and a huge customer base. As for the "which one should I buy" question, a smart person would realize the different strengths and weaknesses of each model, and probably end up buying from (at least) one of each to take advantage of this. Of course in reality 99% of people fail to understand any of this, and turn off their brains after thinking things like "1 < 2 so it must be better". :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
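A minimal sketch of why 95th percentile billing can make inbound traffic effectively free, as mentioned in the post above. It assumes the common convention of billing on the higher of the inbound and outbound 95th percentiles (the post does not spell out the billing rule), and the sample values are made up for illustration.

```python
def percentile_95(samples_mbps):
    """Return the 95th percentile of rate samples (nearest-rank method)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Nearest rank: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def billable_rate(inbound_mbps, outbound_mbps):
    """Assumed billing rule: the higher of the two directional percentiles."""
    return max(percentile_95(inbound_mbps), percentile_95(outbound_mbps))


# Hypothetical month reduced to 100 samples per direction.
outbound = [900] * 100              # steady outbound-heavy traffic
inbound = [100] * 96 + [2000] * 4   # light inbound with a few short bursts

print("inbound p95:", percentile_95(inbound))
print("billable:", billable_rate(inbound, outbound))
```

Under these assumptions the bill is set entirely by the outbound direction, so any inbound traffic that stays below the outbound percentile adds nothing to it.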
Re: Evaluating Tier 1 Internet providers
On Thu, Aug 29, 2013 at 08:25:41PM -0700, Luke S. Crawford wrote: I have no idea how to solve this sort of problem automatically. Ideally, if someone has a congested or down link, I'd prefer that they not announce routes to that part of the internet, as I do have a backup, but that isn't how it works. BGP best path routing decisions are made by completely irrelevant criteria like AS-PATH lengths and lower router-ids, and are completely blind to things that actually matter like latency, capacity, packet loss, etc. Fundamentally it's impossible to fix automatically with the current routing protocols, and at best the protocol extensions like BGP AIGP (which could help at least convey some of the data, like the "oh crap I just got rerouted to a different exit with much higher latency" situation you mentioned) are still a long way from being practically usable. At best you can aim your default/tie breaks towards networks you have more faith in, but that doesn't mean much in practice. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
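To illustrate the point about BGP being blind to latency, here is a deliberately simplified sketch. It models only two of the real tie-breakers (AS-PATH length, then lowest router-id); the Path class, the exit names, the AS numbers (taken from the documentation range) and the latency figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    exit_name: str
    as_path: tuple      # sequence of AS numbers
    router_id: str      # dotted-quad router-id of the advertising peer
    latency_ms: float   # known to us, but never consulted below


def router_id_key(router_id):
    """Compare router-ids numerically, octet by octet."""
    return tuple(int(octet) for octet in router_id.split("."))


def best_path(paths):
    # Simplified selection: shorter AS-PATH wins, then lower router-id.
    return min(paths, key=lambda p: (len(p.as_path), router_id_key(p.router_id)))


candidates = [
    Path("congested-exit", (64500, 64510), "10.0.0.1", latency_ms=180.0),
    Path("healthy-exit", (64501, 64510), "10.0.0.2", latency_ms=20.0),
]
print(best_path(candidates).exit_name)
```

Both paths have the same AS-PATH length, so the lower router-id wins and the 180 ms path is selected even though a 20 ms alternative exists.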
Re: nLayer IP transit
On Fri, Aug 02, 2013 at 07:11:34AM +1000, Mark Tees wrote: Thanks for the replies. I think I saw somewhere around the Cloudflare outage post someone mentioning that since the person at Juniper that was responsible for Flowspec left it all went down hill. I take it then Flowspec is still used internally then? I am still wondering if its best to avoid Flowspec and roll your own firewall rules applied via Netconf for transit interfaces to achieve the same sort of functionality. It's a lot less likely to go south if you control the routes that go into the system. That said, it still breaks some things just by having it enabled (like NSR, though I suppose one could argue that NSR breaks itself :P), so you might be better served with a netconf distribution of rules if you want to avoid those potential issues. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: nLayer IP transit
On Thu, Aug 01, 2013 at 10:00:49AM +1000, Mark Tees wrote: Howdy listers, I remember reading a while back that customers of nLayer IP transit services could send in Flowspec rules to nLayer. Anyone know if that is true/current? We were forced to stop offering flowspec connections to customers, after we started experiencing a rash of issues with it. Among other things, we found ways for flowspec generated rules to easily cause non line-rate performance on Juniper MX boxes, and we had a couple of incidents where customer generated routes were able to cause cascading failure behaviors like crashing the firewall compiler processes across the entire network. I previously mentioned some of this here: http://mailman.nanog.org/pipermail/nanog/2011-January/030051.html There have also been a few other high profile outages caused by bugs in the Juniper implementation, for example: https://support.cloudflare.com/entries/23294588-CloudFlare-Post-Mortem-from-Outage-on-March-3-2013 As a concept I still very much like Flowspec, and wish we could continue to offer it to customers, but as with any new routing protocol there are significant risks of network-wide impact if the implementation is not stable. IMHO Juniper has done a horrible job of maintaining support for Flowspec in recent years, and has effectively abandoned doing the proper testing and support necessary to run it in production. Until that changes, or until some other major router vendors pick it up and do better with it, I don't expect to see any major changes in this position any time soon. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: GTT/Inteliquent/nLayer
On Wed, Jul 31, 2013 at 09:28:50AM -0400, Tim Durack wrote: Any experience/comments on the GTT Global eXpress service? Looks interesting but odd. Why would I use a virtual IXP? Who participates? Comments on-list or off-list are fine. This was an old PacketExchange service, essentially just a single large VPLS-based global layer 2 virtual IXP service, which combined long-haul transport and multi-party interconnection. It's somewhat interesting as a concept (since I'm not aware of anyone else offering anything similar), but IMHO not the most practical thing in the world, which is why it hasn't really been promoted as a new product in many years. If you've heard differently, please contact me off-list. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Issues with level3?
On Tue, Jan 15, 2013 at 04:12:12PM +, Network Operations wrote: Anyone seeing any issues with level3? We can connect to every other IP in our Class C. When tracerouting to individual IP's, (x.x.x.50/51/52/53) we get a drop at ge-4-16.car2.Washington1.Level3.net [4.59.146.53] for 50, but 51 is fine, drop for 52, 53 is fine. Sounds like a classic problem with a member of a bundle (like a link-agg or ECMP) breaking. Level3 tends not to do anything in bundles of 2, so you might want to look elsewhere, like with your own connections to them, possibly on the reverse path. Now, please go find a blunt object and hit yourself in the head as punishment for using the words Class C in 2013 in a non-historic or ironic context. Hard. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
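The alternating drop/ok pattern described above is what a broken bundle member looks like from the outside. A toy sketch, assuming a two-member bundle that picks a link from a hash of the destination address only (real hardware hashes more fields, and the addresses and link names here are hypothetical):

```python
import ipaddress


def member_for(dst_ip, members):
    """Pick a bundle member from a toy hash of the destination address."""
    return members[int(ipaddress.ip_address(dst_ip)) % len(members)]


members = ["link-a", "link-b"]   # hypothetical two-member bundle
broken = {"link-a"}              # assume this member silently drops traffic

for host in (50, 51, 52, 53):
    dst = f"192.0.2.{host}"
    link = member_for(dst, members)
    print(dst, link, "drop" if link in broken else "ok")
```

With one of two members broken, every other destination fails, which matches the .50/.52 versus .51/.53 symptom in the original question.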
Re: [j-nsp] Krt queue issues
On Tue, Jan 08, 2013 at 03:45:10PM +0100, Tim Vollebregt wrote: Hi, What we do nowadays as some workaround, is configuring a default route towards a core router on 8 x 10G before maintaining an MX box. Which will be installed before BGP sessions come up, this will cause some packet loss during burst hour outages but is fine during maintenance hours. I've seen cases where it took up to 30 minutes before the full table was installed correctly in the PFE's. Currently this issue/bug is holding back our Juniper deployments. As far as I know Juniper created a project group for this bug, and so far they were able to reproduce the issue. Looks like the issue is being taken seriously from now on. PR 836197 I actually have very good luck reproducing it: http://cluepon.net/ras/rpdstall.png The issue appears to be that when rpd is busy processing incoming BGP updates (such as when you turn up a large number of peers simultaneously), it starves the rest of the process from actually spending any CPU time handling/installing the routes. The graph above shows a plot of the total BGP paths, the number of routes in the pending state, and the number of routes actually installed into the forwarding hardware. This is a very simplified example (nothing but IBGP sessions with very simple policies here, not even any EBGP neighbors), using the latest top of the line routing engine, so in real life the issue is much worse. As you can see, while rpd is still busy receiving and processing the incoming updates, the number of pending routes rises and doesn't fall, and the number of routes installed in the PFE stays almost nonexistent. A few routes actually manage to squeak in before all of the BGP sessions come up, which is why it has any at all for the period between 0 and 330 seconds. After the router finishes receiving the BGP paths, the pending routes clear very quickly, and then the FIB installation process begins.
8 minutes after turning up the BGP sessions, this router finally has a full table installed in hardware. The pending routes actually clear much quicker than this once the BGP routes stop coming in; I need to update this graph with a higher resolution to show it. :) Juniper actually DOES have a fix for this issue, tweaking the scheduler in rpd so that the router still processes BGP routes even when it's spending a lot of time receiving new routes. Unfortunately they haven't yet decided to prioritize implementing this fix, so it's still stuck in development. If this issue drives you as insane as it does me, I highly encourage you to talk to your account team about PR 836197 and why 8-20+ minutes to install routes to the FIB is not acceptable to you. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: [j-nsp] Krt queue issues
On Tue, Jan 08, 2013 at 11:10:16PM +0100, bas wrote: Hi, On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen r...@e-gerbil.net wrote: PR 836197 That looks like a spanking new PR number to me. The highest PR number I found in 12.2 release notes was 82. Rather strange that they didn't have an earlier PR number, while the issue has existed for such a long time. Oh I have a pile of PR's about a mile long, including some that I opened on this issue 5+ years ago. But I'm not going to harp on the complete absurdity of how long it has taken to finally figure this thing out, or the number of people who have seen this issue while they've claimed all along that nobody else sees it. I'm just going to focus on fixing it. This is the PR that they've chosen for implementing the actual fix, so that's what I'm going with for the sake of simplicity. :) I can't read PR836197 online as it is not public. Can you post it without liability? If you would be liable do not post it.. Also do _not_ email me off list with the PR description... Neither can I, but the basic description of the issue is what I said before. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Semi-automated L3 interface DNS records
On Thu, Oct 18, 2012 at 12:57:16PM -0700, Pedersen, Sean wrote: Does anyone out there have any experience with a script, tool or appliance that would help manage the creation and maintenance of DNS records for Layer 3 interfaces on routers and switches? http://cluepon.net/ras/generate_dnsptr_generic_php A relatively simple example using PHP, with the net-snmp module and Net_IPv4 from PEAR. For extra bonus points, it parses your BGP state and uses any neighbor ASNs it finds for the remote side of your /30 or /31s, and it resolves point-to-point SVIs to physical ports by checking against the vlan tables. The latter part was only tested on Cisco 6500s, and I haven't touched that code (or those boxes) in many many years, so no guarantees about using it on anything else. :) Out of date DNS PTRs in traceroute make baby jesus cry, so please use copiously. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
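For readers who just want the general idea, here is a rough Python sketch of the record-generation step only. It is not the linked PHP script: it takes a hand-written interface list instead of polling SNMP, does none of the BGP or VLAN lookups, and the naming scheme, hostnames, addresses and domain are all assumptions made for illustration.

```python
import ipaddress
import re


def ptr_label(ifname, hostname):
    """Turn an interface name plus router hostname into a DNS-safe label."""
    safe = re.sub(r"[^a-z0-9]+", "-", ifname.lower()).strip("-")
    return f"{safe}.{hostname}"


def ptr_record(ip, ifname, hostname, domain):
    """Build one zone-file style PTR line for an interface address."""
    rev = ipaddress.ip_address(ip).reverse_pointer
    return f"{rev}. IN PTR {ptr_label(ifname, hostname)}.{domain}."


# Hypothetical inventory; a real tool would discover this from the routers.
interfaces = [
    ("192.0.2.1", "xe-1/0/3.0", "cr1.nyc1"),
    ("192.0.2.5", "TenGigE0/1/0/3", "cr2.nyc1"),
]
for ip, ifname, hostname in interfaces:
    print(ptr_record(ip, ifname, hostname, "example.net"))
```

The first entry comes out as `1.2.0.192.in-addr.arpa. IN PTR xe-1-0-3-0.cr1.nyc1.example.net.`, and regenerating the zone from live interface data is what keeps the PTRs from going stale.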
Re: Real world sflow vs netflow?
On Mon, Sep 24, 2012 at 11:52:28AM -0700, Peter Phaal wrote: On Mon, Sep 24, 2012 at 11:19 AM, Joe Loiacono jloia...@csc.com wrote: OK, Well I guess I was thinking sFlow was primarily a switch oriented technology versus on a layer-3 peering router. The sFlow technology is a good fit for any device that performs a packet forwarding function (including routers) and the sFlow.org web site maintains a list of switches and routers that implement the technology, Minus a whole pile of babble from people who don't actually know what a router vs layer 3 switch is... The difference at this point is mostly that NetFlow has provisions to allow exporting all data about an ENTIRE flow, whereas sFlow is designed to only take statistical samples for overall traffic analysis. Tracking an entire flow is much harder, since it requires keeping state on the router, so if you only care about overall traffic analysis sampling is just fine. Originally sFlow introduced features like raw packet export (including layer 2 headers), and extensible formatting, which NetFlow later copied with v9 and v10/IPFIX. At this point they're mostly on the same footing technically, though sFlow does have a counter export feature which is essentially a push version of polling SNMP IF-MIB counters. Only Cisco and Juniper are still trying to push NetFlow though, sFlow has been adopted by nearly every other vendor at this point. Even some Juniper products, like EX (which is really Marvell ASICs with a JUNOS wrapper), support sFlow only. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
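A small sketch of why statistical sampling is "just fine" for overall traffic analysis: sample roughly 1 in N packets, keep no per-flow state, and scale the sampled byte count back up by N. The packet size mix, the sampling rate and the function name are assumptions for illustration, not anything defined by sFlow itself.

```python
import random


def estimate_bytes(packet_sizes, sampling_rate, rng):
    """Sample about 1 in sampling_rate packets and scale the bytes back up."""
    sampled = [size for size in packet_sizes if rng.randrange(sampling_rate) == 0]
    return len(sampled), sum(sampled) * sampling_rate


rng = random.Random(1)  # fixed seed so the run is repeatable
packets = [rng.choice([64, 576, 1500]) for _ in range(200_000)]

count, estimate = estimate_bytes(packets, 1000, rng)
actual = sum(packets)
error = abs(estimate - actual) / actual
print(f"sampled {count} of {len(packets)} packets, estimate off by {error:.1%}")
```

The estimate gets tighter as more samples accumulate, which is why this approach suits aggregate traffic reports but not accounting for every individual flow.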
Re: HE.net BGP origin attribute rewriting
On Fri, Jun 01, 2012 at 08:03:50PM +0200, Daniel Suchy wrote: By overwriting origin field, there's no warranty that someone improves performance at all - it's just imagination. In extreme cases, performance can be degraded when someone in the middle plays with origin field and doesn't know reasons, why originating network uses something else than IGP origin. In RFC 2119 words, full implications were not understood - when this overwriting is done generally. Uh, what part of "to prevent remote networks from improperly forcing a cold potato routing behavior on you" sounds imaginary? Also, there must be some historical reason, why origin should not be rewritten (this changed in January 2006). For internal reasons within the network operator still has enough knobs to enforce own policy (by setting localpref, med on his network). Not really, no. Not every RFC is 100% correct, and they're often written by people who are not operators (because operators are too busy running real networks :P). Besides, SHOULD NOT means "you probably don't want to do this, unless you have a really good reason", and enforcing such an important point in a peering policy is a pretty good reason. You also clearly don't understand the practical use of localpref. When you're trying to implement a simple and relatively common policy like closest exit routing to a peer with multiple exits, you set the localprefs the same (localpref is usually used to determine WHICH peer you'll be sending to), you set the MEDs the same (if you don't want to artificially select which EXIT to use), AS-PATH lengths are obviously the same if you have multiple exits, and then the next one down is origin code. If you can't reset origin code, you run the risk of a remote network being able to force your network to do something you probably don't want to do (or at least probably wouldn't want to do, if you had any idea what you were doing :P).
Please see the previous commentary from Joe Provo, Saku Ytti, etc, they are quite correct. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
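The closest-exit argument above can be sketched as follows. This is a simplified model, assuming the usual ordering of localpref, AS-PATH length, origin code, MED, then IGP cost; the Route class, exit names and costs are hypothetical.

```python
from dataclasses import dataclass, replace

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}   # lower is preferred


@dataclass(frozen=True)
class Route:
    exit_name: str
    local_pref: int
    as_path_len: int
    origin: str
    med: int
    igp_cost: int   # our internal cost to reach this exit


def best(routes):
    # Highest localpref, shortest AS-PATH, lowest origin, lowest MED,
    # and only then the lowest IGP cost (the closest exit).
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len,
                                      ORIGIN_RANK[r.origin], r.med, r.igp_cost))


def rewrite_origin(routes, origin="igp"):
    """Reset every route to the same origin so it stops influencing selection."""
    return [replace(r, origin=origin) for r in routes]


# The same prefix heard from one peer at two exits, identical except for origin.
received = [
    Route("far-exit", 100, 1, "igp", 0, igp_cost=500),
    Route("near-exit", 100, 1, "incomplete", 0, igp_cost=10),
]
print(best(received).exit_name)
print(best(rewrite_origin(received)).exit_name)
```

As received, the origin code decides before IGP cost is ever looked at, so traffic is hauled to the far exit; once the origin is normalized on import, the closest exit wins.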
Re: HE.net BGP origin attribute rewriting
On Thu, May 31, 2012 at 12:21:12PM -0400, Keegan Holley wrote: The internet by definition is a network of networks so no one entity can keep traffic segregated to their network. Modifying someone else's routing advertisements without their consent is just as bad as filtering them in my opinion. Doing so to move traffic into your AS in order to gain an advantage in peering arrangements and make more money off of the end user is just dastardly. There was one particularly (in)famous network *coughpeer1cough* which was well known for selectively rewriting the origin codes towards their peers a few years back. For example, if traffic was going to New York, they would advertise the prefix with IGP in New York, and Incomplete everywhere else, forcing other networks to haul the traffic to New York. This is a violation of most peering agreements, which require consistent advertisements unless otherwise agreed, but it was just sneaky enough that it flew under the radar of most folks for quite a while. When it was finally noticed and they refused to stop doing it when asked, a few folks just depeered them, but a bunch of others just solved the problem by rewriting the origin codes. This is why you still see a lot of rewriting happening today by default, to avoid a repeat of the same issue. Personally I was of the opinion that the correct solution to this particular problem was just to terminate the peering relationship, but honestly Origin code is a pretty useless attribute in the modern Internet, and it exists today only because it's impossible to take it out of the protocol. I don't see anyone complaining when we rewrite someone else's MEDs, sometimes as a trick to move traffic onto your network (*), or even that big of a complaint when we remove another network's communities, so I don't see why anyone cares about this one. Maybe a better fix would be a local knob to ignore Origin code in the best path decision without having to modify it.
Start asking your vendors for it now, maybe it'll show up around 2017... :) (*) I've seen a lot of inexperienced BGP speaking customers be very upset that they can't send any traffic using natural bgp (yes, there appears to be some kind of delusion running around that modifying BGP attributes to influence path selection is bad... What's next, organic routes, not from concentrate? :P), which in the end turned out to be us sending the customer MEDs based on our IGP cost, other networks sending them MEDs of 0, and them not knowing enough to do something useful with the data or else rewrite it to 0. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Did Internap lose all clue?
On Thu, Oct 20, 2011 at 10:48:34PM +0200, bas wrote: Recently I was contacted by an Internap sales person. The third line of the email read: "As you know well, BGP makes all routing decisions simply based on HOP COUNT" I blinked my eyes a couple of times.. Yes it really said hop count. Then I replied to the guy that if he tries to sell a technical product to technical people he should get his info straight. Errr, I think they mean AS hops, which is actually mostly correct. After you eliminate things that don't actually convey any information (like localpref, which you have to configure yourself), and things that don't provide any meaningful data in a multi-network path selection role (like MEDs), AS-PATH length is pretty much the only useful basis you have for picking a best path from your BGP peers. All other marketing crap aside, they aren't incorrect in pointing out that BGP really sucks as a way to pick a best path. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: 4.0.0.0/8?
On Tue, Sep 20, 2011 at 08:13:09PM +0300, Hank Nussbacher wrote: Did Level3 withdraw 4.0.0.0/8 today and start announcing it as two /9s? Level3 has been announcing 2x /9's as well as the /8 for some time now, ever since Telefonica's unfortunate incident where they allowed a customer to hijack 12.0.0.0/8 because they don't prefix-list filter customers properly IIRC. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
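A short sketch of what announcing two /9s alongside the /8 buys you, assuming the point is that longest-prefix match prefers the more-specific routes over anyone else's covering /8. The "owner" and "hijacker" labels and the test address are illustrative only.

```python
import ipaddress

block = ipaddress.ip_network("4.0.0.0/8")
halves = list(block.subnets(prefixlen_diff=1))
print(halves)


def longest_match(dst, announcements):
    """Return the (network, announcer) pair with the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, who) for net, who in announcements if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)


announcements = [
    (block, "hijacker"),    # someone else originating the covering /8
    (halves[0], "owner"),
    (halves[1], "owner"),
]
print(longest_match("4.2.2.2", announcements))
```

The /9s win against a bogus /8, but the same logic means a hijacker announcing anything longer than a /9 would still win, so this is a mitigation rather than a substitute for proper customer prefix filtering.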
Re: Cogent HE
On Thu, Jun 09, 2011 at 12:55:44AM -0700, Owen DeLong wrote: Respectfully, RAS, I disagree. I think there's a big difference between being utterly unwilling to resolve the situation by peering and merely refusing to purchase transit to a network that appears to offer little or no value to the purchaser or their customers. Owen, can you please name me one single instance in the history of the Internet where a peering dispute which led to network partitioning did NOT involve one side saying "hey, we're willing to peer" and the other side saying "no thanks"? Being the one who wants to peer means absolutely NOTHING here, the real question is which side is causing the partitioning, and in this case the answer is very clearly HE. HE wants to peer with Cogent, Cogent doesn't want to peer with HE, and thus you have an impasse and there will be no peering. HE has no problem using transit to reach Cogent for IPv4 (I see HE reaching Cogent via 1299/Telia, and Cogent reaching HE via 3549/Global Crossing, both very clearly HE transit providers and Cogent peers), but HE has chosen not to use transit for the IPv6 traffic. Quite simply, HE feels that they are entitled to peer with Cogent for the IPv6 traffic, and has deliberately chosen to create this partition to try and force the issue. These are *PRECISELY* the same motivations and actions as EVERY OTHER NETWORK who has ever created a network partition in pursuit of peering that the other party doesn't want to give them, period. Again, this isn't necessarily a bad thing if HE thinks it can work to their long term advantage, but to try and claim that this is anything else is completely disingenuous. I understand that you have a PR position to take, and you may even have done a good job convincing the weak minded who don't understand how peering works that HE is the victim, but please don't try to feed a load of bullshit to the rest of us.
:) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Cogent HE
On Thu, Jun 09, 2011 at 07:06:29PM -0400, Brian Dickson wrote: So, long history short, there were in fact peering disputes that had one side saying, hey, we want to peer and the other side saying you don't have enough traffic, or your ratio is too imbalanced, or you're my customer - tough!. And some of those got resolved by the ratios changing, or the traffic levels reaching sufficiently high. (I can historically mention AS 6453.) How is that different from what I said? One side wants to peer, the other side says no thanks. A list of reasons is nice, especially if they will actually grant peering after you meet those requirements (instead of just changing their requirements to deny you again :P), but immaterial to the point. In EVERY peering dispute there is one side who wants to peer, but that doesn't make this side any more noble or right, especially if they don't meet the requirements and are simply trying to force the peering through intentionally creating a partition then playing the propaganda game to blame the other side for it. Everyone complained when Cogent did it to others, why should it be any different when HE does it to Cogent? I'm sorry but I don't accept because Cogent is giving away free IPv6 transit right now as a valid reason, especially when it very clearly advances their goals of artificially inflating their customer base specifically so they CAN engage in these peering disputes. It's a perfectly valid tactic that has been used by the finest networks for years, but at least have the decency to admit it for what it is, that's all I'm saying. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Cogent HE
On Thu, Jun 09, 2011 at 06:26:01PM -0500, Jimmy Hess wrote: Er, Sorry... you are kind of siding with Cogent and claiming HE responsible without any logically sound argument explicitly stated that supports that position... You're confused, read again. :) I would consider them both responsible for the partition, with Cogent slightly more complicit, in that Cogent's expectation of selling HE transit is slightly less reasonable than HE's expectation of Cogent peering with HE. Cogent is (unfortunately, note I have no particular love for Cogent here) a transit free network, who peers with every other Tier 1. HE is a perfectly fine network, but they are not even CLOSE to a transit free network. HE buys transit from multiple other networks, including 3549/Global Crossing and 1299/Telia (both easily visible in the routing table), which they use to reach Cogent for IPv4. There is absolutely NO requirement that there be a direct interconnection between HE and Cogent. None, period, and if you think otherwise you are vastly confused about routing on the Internet. Let me say this again, there is NO requirement that HE buy transit from Cogent, but there is a requirement that HE buy transit from *SOMEONE* if they are not a transit free network. HE has deliberately chosen NOT to use transit for their IPv6 routes, in order to force people like Cogent to peer with them so they can become an IPv6 Tier 1, and thus you have a partition. These are the same tactics and strategies used by every other network in pursuit of becoming a Tier 1, including Cogent, and everyone complained their ass off when Cogent caused partitioning several times during THEIR peering disputes on the road to their current transit free status. If your answer is I like HE better than Cogent so I'm willing to overlook it, that's fine, but you're just making things up if you're trying to claim that they AREN'T causing this partition. 
-- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Cogent HE
On Wed, Jun 08, 2011 at 06:39:02PM -0400, Patrick W. Gilmore wrote: Yes, both refuse to buy transit, yes. But HE is able, willing, and even begging to peer; Cogent is not. These are not the same thing. I'm ready, willing, and let's say for the purposes of this discussion begging to peer with every Tier 1, but some of them aren't willing to peer with me. Does that mean I should stop buying transit and blame them for my resulting lack of global reachability? If I could convince my customers to accept that line of bullshit it would certainly reduce my transit costs, but I have a sneaking suspicion they wouldn't. :) Ultimately it is the responsibility of everyone who connects to the Internet to make sure they are, you know, actually connected to the Internet. Choosing not to do so and then throwing up your hands and saying "oh I can't help it, they won't peer with me" is not a valid excuse, at least not in my book or the book of anyone who pays me money to deliver their packets. And this isn't even a case of not being ABLE to buy sufficient capacity via a transit path (ala Comcast), this is just two networks who have mutually decided to remain partitioned from each other in the pursuit of long term strategic advantage. Ultimately both parties share responsibility for this issue, and you can't escape that just because you have a tube of icing and some spare time. :) These are not the only two networks on the v6 Internet who are bifurcated. There are some in Europe I know of (e.g. Telecom Italia refuses to buy v6 transit and refuses to peer with some networks), and probably others. The v6 'Net is _not_ ready for prime time, and won't be until there is a financial incentive to stop the stupidity and ego stroking. The Internet is a business. Vote with your wallet. I prefer to buy from people who do things that are in MY best interest. Giving money to Cogent will not put pressure on them to peer with HE, Google, and everyone else - just the opposite. Absolutely.
This is just like any other IPv4 peering dispute, the only difference is IPv6 is so unimportant in the grand scheme of the Internet that there hasn't been enough external pressure from customers on either side to force a settlement. Shockingly, HE manages to buy plenty of IPv4 transit to reach Cogent and many other networks, no doubt because they wouldn't have any (paying) customers if they didn't. :) On the flip side, HE is an open peer, even to their own customers, and _gives away_ free v6 transit. Taking their free transit and complaining that they do not buy capacity to Cogent seems more than silly. Plus, they are doing what I think is in my best interest as a customer - open peering. Trying to make them the bad guy here seems counter intuitive. I know you're not naive enough to think that HE is giving away free IPv6 transit purely out of the kindness of their heart. They're doing it to bulk up their IPv6 customer base, so they can compete with larger networks like Cogent, and make a play for Tier 1-dom in exactly the same way that Cogent has done with IPv4. And more power to them for it, it may well be a smart long term strategic move on their part, but with every wannabe Tier 1 network comes partitioning and peering disputes, as they try to trade short term customer pain for long term advantages. Sorry to all the HE guys, but trying to simultaneously complain about your treatment at the hands of other networks and their peering disputes while emulating their actions is bullshit and you know it. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Downstream Usage-BGP Communities
On Tue, May 10, 2011 at 05:52:39PM -0400, Nick Olsen wrote: Greetings NANOG, Was hoping to gain some insight into common practice with using BGP Communities downstream. For instance: We peer with AS100 (example) AS100 peers with TW Telecom (AS4323). Since I happen to know that AS100 doesn't sanitize the communities I send with my routes. I can take advantage of TW Telecom's BGP communities for traffic engineering. Such as 4323:666 (Keep in TWTC Backbone). Would this be something that is generally frowned upon? Still under the assumption that the communities aren't scrubbed off my routes. Could I do this with other AS's beyond TW Telecom? Such as TW's peering with Global Crossing (AS3549)? Well first off, if you're using the words "peers with" in the normal sense, your routes would never propagate to AS4323 in the first place. Assuming what you actually mean is that at least one of those sessions is a transit feed, essentially all (non-stupid) networks will filter their own TE communities from their transits/peers, so the odds of this working are almost nonexistent. You also have about a 50/50 shot of AS100 stripping your communities before they even make it to AS4323 (or any other network). Personally my belief is that this is a bad thing, and you should only filter communities in your own name-space (i.e. $YOURASN:*), but this doesn't stop a large number of obnoxious networks from doing it anyways. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
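The "only filter communities in your own name-space" policy can be sketched in a few lines. This is a plain-Python model of the idea, not any router's policy language; the function name, the allowed-list parameter and the 64500 values (a documentation AS number) are hypothetical.

```python
def scrub_own_namespace(communities, local_asn, allowed=()):
    """Drop communities in local_asn:* unless explicitly allowed; keep the rest."""
    kept = []
    for community in communities:
        asn, _, _value = community.partition(":")
        if int(asn) == local_asn and community not in allowed:
            continue   # someone else trying to trigger our TE actions
        kept.append(community)
    return kept


# What AS4323 might do with routes heard from a peer or transit provider:
print(scrub_own_namespace(["4323:666", "64500:100"], local_asn=4323))

# What a customer-facing session might do instead, honoring selected actions:
print(scrub_own_namespace(["4323:666", "64500:100"], local_asn=4323,
                          allowed={"4323:666"}))
```

The first call is why the 4323:666 trick fails across a peering session, and the second is why it can work when the route arrives over a paid transit session. Communities belonging to other networks pass through untouched in both cases.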
Re: Downstream Usage-BGP Communites
On Tue, May 10, 2011 at 06:47:11PM -0400, Nick Olsen wrote: Ah, Sorry for the confusion. We have a mutual agreement with AS100 (call it transit or peering) we send them full routes, They send us full routes. AS100 is a transit customer of AS4323. I understand I would be at the mercy of how people have things setup. I do know for a fact I'm not filtered by AS100 as I've already tested it. Thanks to everyone for the info so far. Erm ok, well as long as you're a transit customer of AS100 (for some definition of transit customer), and they're a transit customer of AS4323, you should have no problems. This is completely different from peering, when money changes hands communities get listened to. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Anyone still maintaining altdb.net?
On Wed, Apr 20, 2011 at 10:30:44AM -0400, Jon Lewis wrote: On Wed, 20 Apr 2011, Bret Palsson wrote: I submitted my objects April 11. The mntner object needs to be created by the db-admin. I realize this is a volunteer thing. Could I help out or could the people that are helping out look at adding my record? I need to set up some peering relationships. I'd prefer to support open communities rather than paying and am willing to help out if need be. If you're just getting started, it might make sense to look at another db. IIRC, RIPE's routing registry is free to use, supports md5crypt and PGP/GPG auth, and isn't a volunteer one-man show. One of the premises of AltDB is that no support is provided. For example, a lot of people send email asking "how do I use this", and the unfortunate answer has to be "sorry we can't help you". If you need support, then by all means pay the money to someone like RADB and let them help walk you through the process. Of course after the initial mntner creation everything is pretty much automated anyways, so if you know what you're doing AltDB provides a free method to maintain your IRR entries with very little sacrificed over a commercial solution. There is in fact more than 1 person volunteering for AltDB, but from what I can see of this April 11th email, it falls into the "please provide support" category. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Level 3 Agrees to Purchase Global Crossing
On Mon, Apr 11, 2011 at 03:49:43PM -0700, Holmes,David A wrote: Way too many players ... means that the telecom marketplace is good for the consumer, with competition keeping prices low. Many network users feel that prices are still way too high, particularly for high speed circuits and dark fiber, areas in which Level 3 and Global Crossing have specialized. Cute theory, but unfortunately this has no basis in reality. Users can feel any way they'd like, but the truth is that the current market prices for wholesale IP transit, in which Level 3 and Global Crossing specialize, are far below cost and are impossible for any carrier to sustain long term. I'm not saying that either L3 or GX runs a completely optimal network (in fact I'd say that GX may well be a case study in failure to do so :P), but a simple analysis of the costs of routers, colo, power, crossconnects, optical gear, etc, makes it abundantly clear that the current rush to the bottom pricing cannot possibly be supported even under optimal conditions and ignoring other overhead. The situation isn't significantly different for high-speed longhaul capacity, the revenue these circuits generate at current market prices is barely offsetting their capex on the optical gear at this point. Anyone who told you that there is a cash cow in this particular market is woefully mistaken, any serious money to be had is coming from enterprise customers who can only be reached via unique metro assets. I have no doubt that there will be some modest reduction in competition following the acquisition, but I honestly don't think it is anything to get too worried about. Unlike L3's previous acquisitions (such as Wiltel, Telcove, Looking Glass, etc), it isn't really possible for them to disappear the assets from the market following the purchase. 
GX's longhaul fiber footprint is mostly still owned and operated by Qwest, they were never a big player in IRU dark sales to begin with, and they don't have much in the way of metro fiber assets to speak of. The two companies are also not really in any danger of being able to stop the current tide of market transit prices, since these are being driven by many other companies. And L3 has already learned what happens to their market share when they try to alter market pricing by themselves, which is what led to their current Comcast debacle in the first place. The best case scenario that I see here is L3 being able to provide some technical leadership to significantly reduce GX's overhead, and hopefully fix some of their other problem areas too. But personally I'm not convinced that L3 is the technical or market force they used to be, and thus I question whether they'll be able to get it right themselves. Remember, it takes a LOT of work for a big telco to put all the pieces in place correctly, and any mistakes on their part will open the door for smaller carriers to show off the advantages of being nimble. If there is any significant reduction in competition that comes to either carrier, it will do exactly that. In fact, I encourage them to try, it will probably be good for my business. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Peering Traffic Volume
On Thu, Mar 24, 2011 at 07:27:08PM -0400, Ravi Ramaswamy wrote: Hi All - I am new to this mailer. Hopefully my question is posed to the correct list. I am using 2.5 Tbps as the peak volume of peering traffic over all peering points for a Tier 1 ISP, for some modeling purposes. Is that a reasonable estimate? The largest Tier 1's, like say Level 3, and god help me for saying it but... Cogent, are certainly in or beyond that kind of ballpark. But most of the smaller ones, like say ATT, Qwest, ATDN (if you even still want to count them), etc, not a chance in hell. And then there are plenty of non tier 1 networks (and some that aren't even actual single networks in the classic sense) that do far more traffic than that, for example some of the large CDNs like Akamai and LimeLight. On the modern Internet most of the traffic bypasses Tier 1 networks completely, going directly from content networks to eyeball networks, so the Tier 1's are effectively left as the higher priced and lower capacity last resorts for the remaining traffic. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote: Are there any transit providers out there that accept using the BFD (or any other similar) mechanism for eBGP peerings? If no, how do you solve the issue with the physical interface state when LANPHY connections are used? Anyone messing with the BGP timers? If yes, what about multiple LAN connections with a single BGP peering? Well first off LAN PHY has a perfectly useful link state. That's pretty much the ONLY thing it has in the way of native OAM, but it does have that, and that's normally good enough to bring down your EBGP session quickly. Personally I find the risk of false positives when speaking to other people's random bad BGP implementations to be too great if you go much below 30 sec hold timers (and sadly, even 30 secs is too low for some people). We (nLayer) are still waiting for our first customer to request BFD, we'd be happy to offer it (with reasonable timer values of course). :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: bfd-like mechanism for LANPHY connections between providers
On Wed, Mar 16, 2011 at 02:55:14PM -0400, Jeff Wheeler wrote: This is often my topology as well. I am satisfied with BGP's mechanism and default timers, and have been for many years. The reason for this is quite simple: failures are relatively rare, my convergence time to a good state is largely bounded by CPU, and I do not consider a slightly improved convergence time to be worth an a-typical configuration. Case in point, Richard says that none of his customers have requested such configuration to date; and you indicate that Level3 will provision BFD only if you use a certain vendor and this is handled outside of their normal provisioning process. There are still a LOT of platforms where BFD doesn't work reliably (without false positives), doesn't work as advertised, doesn't work under every configuration (e.g. on SVIs), or doesn't scale very well (i.e. it would fall over if you had more than a few neighbors configured). The list of caveats is huge, the list of vendors which support it well is small, and there should be giant YMMV stickers everywhere. But Juniper (M/T/MX series at any rate) is definitely one of the better options (though not without its flaws, inability to configure on the group level and selectively disable per-peer, and lack of support on the group level where any IPv6 neighbor is configured, come to mind). Running BFD with a transit provider is USUALLY the least interesting use case, since you're typically connected either directly, or via a metro transport service which is capable of passing link state. One possible exception to this is when you need to bundle multiple links together, but link-agg isn't a good solution, and you need to limit the number of EBGP paths to reduce load on the routers. The typical solution for this is loopback peering, but this kills your link state detection mechanism for killing BGP during a failure, which is where BFD starts to make sense. 
For IX's, where you have an active L2 switch in the middle and no link state, BFD makes the most sense. Unfortunately it's the area where we've seen the least traction among peers, with "zomg why are you sending me these udp packets" complaints outnumbering people interested in configuring BFD 10:1. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
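For a rough sense of what is at stake in this thread, the Python sketch below compares worst-case failure detection times when there is no usable link state. It assumes the usual rule of thumb that BFD detection time is roughly the negotiated interval times the detect multiplier, and it treats the BGP hold timer as the upper bound for plain BGP. The function names and the 300 ms x 3 BFD values are illustrative assumptions, not recommendations from the posts; only the 30 second hold timer comes from the discussion.

```python
def bfd_detection_ms(interval_ms, multiplier):
    """Approximate BFD detection time: interval times detect multiplier."""
    return interval_ms * multiplier


def bgp_detection_ms(hold_time_s):
    """Worst case for plain BGP: the session drops when the hold timer expires."""
    return hold_time_s * 1000


bfd_ms = bfd_detection_ms(interval_ms=300, multiplier=3)  # assumed BFD timers
bgp_ms = bgp_detection_ms(hold_time_s=30)                 # hold timer from the thread
speedup = bgp_ms / bfd_ms

print(f"BFD: {bfd_ms} ms, BGP hold timer: {bgp_ms} ms")
print(f"BFD detects the failure about {speedup:.0f}x sooner")
```

Detection time is only part of convergence, so a faster detector does not by itself guarantee a proportionally faster recovery.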
Re: Internet Edge Router replacement - IPv6 route table size considerations
On Fri, Mar 11, 2011 at 12:55:33PM -0600, James Stahr wrote: link-local address. Then I realized, why even assign a global in the first place? Traceroute replies end up using the loopback. BGP will use loopbacks. So is there any obvious harm in this approach that I'm missing? Traceroute replies most assuredly do NOT use loopbacks on most networks, and it would make troubleshooting massively more difficult if this was the only option. Imagine any kind of complex network where there is more than one link between a pair of routers (and don't just picture your own internal network, but imagine customers connecting to their ISPs as well), and now tell me how you plan on identifying a particular link with a traceroute. The two words that best sum this up would be "epic disaster". -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Internet Edge Router replacement - IPv6 route table size considerations
On Thu, Mar 10, 2011 at 10:52:37AM -0800, George Bonser wrote: What I have done on point to points and small subnets between routers is to simply make static neighbor entries. That eliminates any neighbor table exhaustion causing the desired neighbors to become unreachable. I also do the same with neighbors at public peering points. Yes, that comes at the cost of having to reconfigure the entry if a MAC address changes, but that doesn't happen often. And this is better than just not trying to implement IPv6 stateless auto-configuration on ptp links in the first place how exactly? Don't get taken in by the people waving an RFC around without actually taking the time to do a little critical thinking on their own first, /64s and auto-configuration just don't belong on router ptp links. And btw only a handful of routers are so poorly designed that they depend on not having subnets longer than /64s when doing IPv6 lookups, and there are many other good reasons why you should just not be using those boxes in the first place. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: ATT via Tata and Level3
On Thu, Mar 03, 2011 at 11:15:51AM -0500, Morgan Miskell wrote: I've noticed that we have thousands of routes for ATT via Tata that we don't have from ATT through Level3. I would expect Level3 to have most of the routes for ATT that Tata does since they are both directly peered with ATT. Well, I don't know anything about this specific issue or any policy changes that may have been made, but at a high level I can tell you that BGP doesn't work like that. BGP is only capable of passing on a single best path for each route, and what is considered the best path is totally in the eye of the beholder. First off you must understand that the vast majority of Internet routes are multi-homed at some level. As you get into large Tier 1 carriers, the amount of overlap is massive (i.e. you'll hear the same route as a customer from multiple networks), and the question of which path will be selected is completely up to the policies of the network doing the selecting. Not only does this vary by policy, but it varies by the composition of other networks they peer with (or buy from), what other networks buy from them, and even their network topology (due to tie-breaking rules like preferring EBGP over IBGP). For example, Level 3 is a much larger network with significantly more customer routes than Tata. I'm too lazy to do an actual comparison between the two, but odds are high that of the ATT customer routes that they announce to their peers, probably somewhere around 30-40% of those routes are also Level 3 customer routes. A network will ALWAYS prefer their customer routes above those learned from peers (or else they wouldn't be able to guarantee that they're actually providing full transit service), so those routes coming from ATT will never be selected. Meanwhile, Tata is receiving those same routes from both ATT and Level 3 (and potentially other peers and/or customers too), and is completely free to make their own best path selections based on their own local criteria. 
The result is that you should almost never expect to see the same paths for the same networks being selected by two different large networks, unless the routes in question are single homed and there are no other choices (which is a small minority of the routes on the Internet). -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
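The effect described above can be sketched in a few lines of Python. This is a deliberately simplified model that assumes only two steps of the decision process (local preference by business relationship, then AS path length); the local-pref values, network names, and paths are all hypothetical.

```python
# Hypothetical local-preference values: customer routes always beat peer
# routes, which always beat paid transit.
LOCAL_PREF = {"customer": 300, "peer": 200, "transit": 100}


def best_path(paths):
    """Pick one path: highest local-pref first, then shortest AS path."""
    return max(
        paths,
        key=lambda p: (LOCAL_PREF[p["learned_from"]], -len(p["as_path"])),
    )


# Network A (standing in for the larger carrier) hears the prefix from a
# peer and, over a longer path, from one of its own customers.
network_a = [
    {"learned_from": "peer", "as_path": ["PEER-X", "ORIGIN"]},
    {"learned_from": "customer", "as_path": ["CUST-1", "CUST-2", "ORIGIN"]},
]

# Network B hears the same prefix only from peers, including Network A.
network_b = [
    {"learned_from": "peer", "as_path": ["PEER-X", "ORIGIN"]},
    {"learned_from": "peer", "as_path": ["NETWORK-A", "CUST-1", "CUST-2", "ORIGIN"]},
]

choice_a = best_path(network_a)
choice_b = best_path(network_b)
print("A selects via:", choice_a["as_path"])
print("B selects via:", choice_b["as_path"])
```

Network A never selects the PEER-X path even though it is shorter, so anyone looking at A's table sees no route "via PEER-X" for this prefix, while B's table shows one.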
Re: 6453 routing leaks (January and Today)
On Fri, Feb 25, 2011 at 07:22:36AM -0500, Jared Mauch wrote: Update: I have had a source ask me to post the following: -- snip -- The problem with route leaking was caused by specific routing platform resulting in some peer routes not being properly tagged. We are deploying additional measures to prevent this from happening in the future -- snip -- Hopefully someone learned a lesson about BGP community design, and how it should fail safe by NOT leaking if you accidentally fail to tag a route. Always require a positive match on a route to advertise to peers, not the absence of a negative match. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
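The difference between the two designs is easy to show in a short Python sketch. The community values and function names below are hypothetical, and the route is modeled as a plain dictionary rather than anything resembling a real router policy language.

```python
# Hypothetical community tags for illustration only.
CUSTOMER_ROUTE = "64500:1000"  # set when a route is learned from a customer
PEER_ROUTE = "64500:2000"      # set when a route is learned from a peer


def export_to_peer_fail_open(route):
    """Negative match: advertise anything NOT tagged as a peer route."""
    return PEER_ROUTE not in route["communities"]


def export_to_peer_fail_safe(route):
    """Positive match: advertise only routes tagged as customer routes."""
    return CUSTOMER_ROUTE in route["communities"]


# A peer route that some platform bug failed to tag on import.
untagged_peer_route = {"prefix": "192.0.2.0/24", "communities": set()}
customer_route = {"prefix": "198.51.100.0/24", "communities": {CUSTOMER_ROUTE}}

leaks = export_to_peer_fail_open(untagged_peer_route)
contained = not export_to_peer_fail_safe(untagged_peer_route)
print("fail-open policy leaks the untagged route:", leaks)
print("fail-safe policy contains the untagged route:", contained)
```

With the positive-match policy a tagging failure costs you a missing announcement, which is noticed and fixed locally, instead of a leak that everyone else has to clean up.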
Re: SFP vs. SFP+
On Thu, Feb 17, 2011 at 03:41:28PM -0800, Sam Chesluk wrote: Depends on the switch. Some, like the 2960S and 4948E, have 1G/10G ports. They will, however, not operate at 4Gbps (that particular speed was chosen to allow the core components to work for gigabit Ethernet, OC48, 2G FC, and 4G FC). 4G SFPs are relatively rare, and only for fibre channel. Multi-rate SFPs that do up to 2.5G (for OC48) are a lot more common, but they cost more than just a simple 1GE SFP. Since all you can do with Ethernet is 1G or 10G anyways, most SFPs you'll encounter in the field will be the cheaper non-multirate kind. For more information about SFP+, as well as some comparisons between different 10G optic types, take a look at: http://www.nanog.org/meetings/nanog42/presentations/pluggables.pdf As an update (since this presentation is from Feb 2008), SFP+ is just now finally starting to get into 40km/ER reach territory. Supplies are limited, as they just very recently started shipping, but they do exist. Of course since they moved the electronic dispersion compensation (EDC) off the optic and onto the host board, the exact distances you'll be able to achieve are still based on the quality of the device you're plugging them into. SFP+ is still mostly an enterprise box or high density / short reach offering, and XFP is still required for full functionality. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: SFP vs. SFP+
On Thu, Feb 17, 2011 at 09:04:29PM -0600, Frank Bulk wrote: Are there are any optics that plug into 10G ports but have a copper or optical 1G interface? There's some equipment that I'm specing where it is $10K for a multi-port 1G card, even while I really may only *occasionally* need a single 1G port and there's a free 10G port for me to use. It doesn't work that way. The closest you can get is that the device can support either 1G or 10G in the same port (since SFP and SFP+ are physically and electrically the same), but it requires support from the device (since both PHYs have to be implemented). -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: SFP vs. SFP+
On Fri, Feb 18, 2011 at 12:55:45AM -0500, Peter Nowak wrote: You can plug SFP module (copper or fiber) into any SFP+ port. So, on 10G port you can run either 1GE or 10GE. Not true. Some devices support this, since SFP and SFP+ are physically and electrically compatible, but not all. The device must be specifically designed to support both PHYs, which is NOT a given. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Announcing the Community FlowSpec trial
On Wed, Jan 05, 2011 at 05:46:36PM -0600, John Kristoff wrote: Friends and colleagues, At NANOG 48 I talked about a community flow-spec service we were looking at trying to make work. This is the idea of using IETF RFC 5575 to pass around flow-based rules, in this case, primarily for dropping unwanted packets. This technology is not as widely deployed as traditional RTBH techniques for a number of reasons. However, we thought perhaps it was widely used enough, or could be, to justify what might be a helpful and free 3rd party feed of flow-spec routes to keep our networks a little bit cleaner. A trial of this feed based on the traditional bogon routes can be had by contacting me directly. We realize the traditional IPv4 reserved, special and unallocated IPv4 bogon address is dwindling. Maybe there is room for some other type of feed, but to justify that, we're looking to see if even enough people would set up this presumably simpler feed to help us and the community get some more experience with multi-hop flow-spec. As a word of warning to anyone who wants to deploy this on their Juniper routers (what other router vendors support it? :P), there are some pretty serious performance considerations of which you should be aware. For example, we discovered that on MX routers (with classic I-chip DPCs, the performance should be somewhat better for Trio cards but we haven't fully tested the exact numbers yet), installing as few as a dozen flowspec routes can create firewall filters that use enough SRAM accesses that you will no longer be able to achieve line rate packets/sec. With a few more rules, you may find that your 10GE's will only be able to handle 3-5Mpps instead of the normal 14.8Mpps. When this happens, excess traffic above what the firewall filters can handle will be silently discarded, with no indication in SNMP or "show interface" that you're dropping packets (though you may be able to see it in "show pfe statistics traffic" as "Info cell drops"). 
I can't tell you what the performance numbers are for other platforms, but anyone thinking about turning on flowspec from a third party source (especially one who may be sending them a large number of rules) should give serious consideration to the potential impact on their network first. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
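As a back-of-the-envelope check on the numbers above, the Python sketch below derives the 10GE line-rate packet rate and shows what a 3-5Mpps ceiling means in practice. It assumes standard Ethernet framing (64-byte minimum frame plus 20 bytes of preamble and inter-frame gap); the function names are made up for this illustration.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum packets/sec on a link for a given Ethernet frame size."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def min_frame_for_line_rate(link_bps, pps_limit):
    """Smallest average frame size that still fills the link at pps_limit."""
    return link_bps / (pps_limit * 8) - ETHERNET_OVERHEAD_BYTES


ten_ge = 10e9
full_rate = line_rate_pps(ten_ge)

print(f"10GE line rate at 64-byte frames: {full_rate / 1e6:.2f} Mpps")
for limit in (3e6, 5e6):
    share = limit / full_rate
    frame = min_frame_for_line_rate(ten_ge, limit)
    print(f"{limit / 1e6:.0f} Mpps is {share:.0%} of line rate; "
          f"needs frames of about {frame:.0f} bytes or larger to fill the link")
```

In other words, a port limited this way still carries a full 10G of large packets, but a small-packet flood (exactly what flowspec is usually deployed to stop) would hit the packets/sec ceiling long before the link is full.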
Re: Comcast vs Level 3 - This time with video
On Mon, Dec 20, 2010 at 11:59:31AM -0500, Randy Epstein wrote: A simplified explanation of the situation between Level 3 and Comcast, from the perspective of a Comcast customer who is asking for the same thing Comcast is asking for. :) http://www.xtranormal.com/watch/8124137/ I have to question Richard on this interaction though. There is no way in hell a Comcast customer service rep would respond like that. Not at least without putting you on hold 5 times and then still, wouldn't know what in the hell you're talking about. In the end, the service rep would tell you they need to dispatch someone to your house. Hah, yes they did seem to skip over the usual "bad ratios? have you tried rebooting your cable modem?" part, didn't they. I suppose I should have added the phrase "highly fictionalized", but Xtranormal has something against allowing punctuation in their descriptions, and the existing one was confusing enough. FYI a bunch of people complained that the voices were hard to distinguish, so I did a modified version which is a little more intelligible. It's also linked to from the original, as part of the same series. http://www.xtranormal.com/watch/8134089/ -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Sun, Dec 19, 2010 at 08:20:49PM -0500, Bryan Fields wrote: The government granting a monopoly is the problem, and more lame government regulation is not the solution. Let everyone compete on a level playing field, not by allowing one company to buy a monopoly enforced by men with guns. Running a wire to everyone's house is a natural monopoly. It just doesn't make sense, financially or technically, to try and manage 50 different companies all trying to install 50 different wires into every house just to have competition at the IP layer. It also wouldn't make sense to have 5 different competing water companies trying to service your house, etc. This is where government regulation of the entities who ARE granted the monopoly status comes into play, to protect consumers against abuses like we're seeing Comcast commit today. Personally I think the right answer is to enforce a legal separation between the layer 1 and layer 3 infrastructure providers, and require that the layer 1 network provide non-discriminatory access to any company who wishes to provide IP to the end user. But that would take a lot of work to implement, and there are billions of dollars at work lobbying against it, so I don't expect it to happen any time soon. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Sun, Dec 19, 2010 at 05:58:26PM -0800, Leo Bicknell wrote: I dream of a day where we have municipal fiber to the home, leased to any ISP who wants to show up at the local central office for a dollar or two a month so there can be true competition in end-user services. Take a second and think about what THAT would do to the ratio wars. Imagine if any hosting/content provider, with potentially hundreds or thousands of gigabits of unused inbound capacity on their networks, could easily get into providing IP service to eyeballs. Even ignoring the existing 95th percentile silliness like free inbound transit, which would no doubt rapidly evaporate under this kind of model, the difference in efficiencies between the highly competitive hosting world and the highly non-competitive last mile world are simply staggering. For many content networks, it would be an opportunity to start making money on their bits instead of paying for them, and networks without content expertise would be in serious trouble. I personally can't think of a single thing with more potential for massive disruption to the business models of incumbent providers. There are so many billions of dollars at stake protecting the status quo that it's not even funny, which IMHO is why you'll never see any of this happen in the US, in any kind of scale at any rate. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
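For readers unfamiliar with why inbound transit ends up "free" for a content network, here is a minimal Python sketch. It assumes the common billing convention of charging on the greater of the inbound and outbound 95th percentile; the function names and the traffic samples are invented for illustration.

```python
def percentile_95(samples):
    """Discard the top 5% of samples and return the highest one remaining."""
    ordered = sorted(samples)
    # Integer ceiling of 95% of the sample count, converted to a 0-based index.
    index = (len(ordered) * 95 + 99) // 100 - 1
    return ordered[index]


def billable_mbps(inbound, outbound):
    """Assumed convention: bill on the larger of the two 95th percentiles."""
    return max(percentile_95(inbound), percentile_95(outbound))


# Hypothetical content network: heavy outbound with a few bursts, light inbound.
outbound = [900] * 95 + [1000] * 5
inbound = [100] * 100

bill = billable_mbps(inbound, outbound)
free_inbound = bill - percentile_95(inbound)
print(f"billed at {bill} Mbps")
print(f"unused inbound headroom at no extra charge: {free_inbound} Mbps")
```

Under this convention the network could take on roughly 800 Mbps more inbound traffic without changing its bill, which is the unused capacity the post is pointing at.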
Re: Some truth about Comcast - WikiLeaks style
On Sun, Dec 19, 2010 at 06:12:02PM -0800, JC Dill wrote: And if a competing water service thought they could do better than the incumbent, why not let them put in a competing water project? If they think they can make money after the cost of the infrastructure, then they may be onto something. We don't have to worry that too many would join in, the laws of diminishing returns would make it unprofitable for the nth company to build out the infrastructure to enter the market. The laws of diminishing returns have already set the bar for the point at which it's not profitable for a new company to enter the market and try to compete. Right now the number is roughly 2, cable and dsl, give or take a few outliers. I do believe the point would be to encourage a little more competition than that. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: potential new and different architectural approach to solve the Comcast - L3 dispute
On Fri, Dec 17, 2010 at 11:15:14AM -0600, Benson Schliesser wrote: I have no direct knowledge of the situation, but my guess: I suspect the proposal was along the lines of longest-path / best-exit routing by Level(3). In other words, if L(3) carries the traffic (most of the way) to the customer, then Comcast has no complaint--the costs can be more fairly distributed. The modest investment is probably in tools to evaluate traffic and routing metrics, to make this work. This isn't really *new* to the peering community, but it isn't normal either. Nah, you're still thinking about this like it was a classic peering dispute over ratios, when nothing could be further from the truth. First off, by the very nature of a CDN, all of the Netflix/etc traffic is going to be delivered to the best exit on the long-haul network already. Second, Comcast is a FULL TRANSIT CUSTOMER of Level 3. Typically the customer gets to dictate the handoff point to the provider, by either advertising MEDs, or by sending inconsistent routes. The fact that the existing Level3/Comcast routing DOESN'T make Level 3 haul all of the bits to the best exit means it's highly likely that Comcast agreeing to haul the bits was part of their commercial transit agreement, probably in exchange for lower transit prices. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Comcast vs Level 3 - This time with video
A simplified explanation of the situation between Level 3 and Comcast, from the perspective of a Comcast customer who is asking for the same thing Comcast is asking for. :) http://www.xtranormal.com/watch/8124137/ -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: potential new and different architectural approach to solve the Comcast - L3 dispute
On Sat, Dec 18, 2010 at 01:07:15AM -0500, Patrick Giagnocavo wrote: Note that Comcast has never said that the Level3/Netflix issue is about users exceeding their allotted bandwidth (currently at about 250GB/month for residential); presumably, were a Comcast user to use 249GB of bandwidth downloading cute pictures of cats, Comcast would have no objection. I believe they want the cat people to pay too, it's just easier to go after Netflix first. Let's say for a moment that Comcast's overall ratio with its customers is approximately the same as their ratio in the leaked Tata graphs (yes I know that this proves nothing, but let's just assume it for a moment), i.e. 5:1. They then ask that every network who sends them traffic, even their transit providers (in the case of Level 3) be under 2:1. What is the point of insisting on a ratio that is not supported by the traffic their customers actually request? Because it gives them a convenient excuse to demand payment from nearly everyone on the Internet for being out of ratio, and to restrict capacity to those who do not pay. With so many transit ports running hot, and even peering ports running hot as in the recent example where they intentionally turned down Global Crossing capacity (which they claim is settlement free) and CAUSED congestion, the ISP who hosts the cute cat pictures may have little choice but to pay Comcast for access, or risk losing their cute cat hosting business to someone else who is willing to do so. I've also seen Comcast ignore several offers to honor MEDs or accept more-specifics from networks who DO meet their published peering requirements in every way except ratios, so I don't think they're interested in technical solutions to a potential transport cost imbalance either. 
If it was about anything other than trying to extract a toll from content providers, one of these technical solutions would clearly have been better for them than continuing to force the traffic into their congested transit ports, which they not only pay for, but then also do the backhaul for across their own network. BTW, they rejected my very nice comment on their blog asking if they would be willing to share the graphs of their transit provider interfaces (which are NOT peering relationships, and not under NDA) to back up their claims that the published graphs are false, so I'm positive yours isn't going to get through. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Thu, Dec 16, 2010 at 02:48:56PM -0500, Randy Epstein wrote: I was in the IRC channel at the time and saw it. It's real. I don't support the posting of IRC logs, but can't control that either. I saw it too. I don't support posting of IRC logs trying to get people in trouble (though lord knows it wouldn't be the first time that has happened :P), but I also completely disagree with Comcast's position on this (big shocker, I know). As one of the people who has spoken out against Comcast's actions the most vocally, I suppose the original sentiment might very well be targeted at me. Personally I really don't think that people on the NANOG list posting about their network issues or actions has ANYTHING to do with their sponsorship of the NANOG conferences or community, and I suppose I should be shocked and appalled that it might come down to these type of threats to silence people who have something negative to say. I'm a Comcast customer too (50M/10M or 6M/768K DSL at home, gee, decisions decisions :P), what are they going to do next, shut off my cable modem for TOS violations? :) Seriously guys, this is an operator forum and you're running a congested network, to expect that people are not going to comment on those facts just because you've put money into NANOG sponsorship is absurd. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Thu, Dec 16, 2010 at 02:13:47PM -0600, Richard A Steenbergen wrote: Seriously guys, this is an operator forum and you're running a congested network, to expect that people are not going to comment on those facts just because you've put money into NANOG sponsorship is absurd. Forgot to attach a giant disclaimer on the previous post: I'm speaking solely for myself, and not in any way, shape, or form, for the NANOG, NewNOG, or any other organization. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Wed, Dec 15, 2010 at 02:25:53PM -0500, Jeffrey Lyon wrote: From Tata? I'd eat my own hand if they were paying more than $1-2 across the board. I know people who have offered them hundreds of gigs of settlement free transit (including myself), but clearly they aren't interested. FYI a large number of their wholesale transit/paid peering customer agreements include clauses which prohibit the resale of services to other parties too. They don't want one person being able to buy capacity into their network, then provide it to others. Remember their goal isn't to save money on transit, it's to make the transit paths minimally functional so they can force content networks to buy from them directly (at above market rates, from what people tell me :P), so they don't WANT to add capacity or transit paths. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Wed, Dec 15, 2010 at 07:05:26PM -0600, Jack Bates wrote: On 12/15/2010 4:47 PM, Adam Rothschild wrote: Folk in content/hosting should find this all more than a little bit scary. So you don't think the money content providers will pay Comcast won't reflect on other eyeball networks who aren't important/large enough to request financing? ie, Comcast could run lower rates and offer better service by charging the content provider, while competitive eyeball networks won't get the option to receive compensation from content providers and have to charge appropriate rates to their customers. And if you saw someone getting mugged on the street, you could argue that you're now less likely to be robbed because the guy already has someone else's money... If Comcast wanted to grow its revenue by offering a better, faster, cheaper, etc, wholesale transit service to content networks, I don't think anyone here would object in the slightest. The problem is that rather than compete on any kind of financial or technical merit, they've decided to hold their cable customers hostage and FORCE content networks to buy from them. Rest assured nobody WANTS to buy transit from a network with a 109ms rtt between New York and San Jose (it boggles the mind how one could even manage to assemble that fiber path, let alone try to charge money for it :P), congestion on every port, etc. If Comcast gets away with this, what's to stop every other monopoly/duopoly eyeball network from doing the same thing? And yes maybe if Comcast forces Netflix to pay them to reach you (either directly or indirectly via Level 3), your cable modem bill might go down, but all that means is that your Netflix bill is going to go up. At the end of the day you're probably better off betting on lower costs from the technical innovation of the networks who DON'T pay $50k for a 10GE port. 
:) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
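To put the 109ms New York to San Jose figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes light travels about 200 km per millisecond in fiber (roughly two thirds of c), ignores router and serialization delay entirely, and uses approximate city coordinates that are my own assumption, so treat the result as an order-of-magnitude check and nothing more.

```python
import math

# Assumption: propagation speed in fiber is about 2/3 of c, i.e. ~200 km per ms.
FIBER_KM_PER_MS = 200.0

def one_way_fiber_km(rtt_ms):
    """Fiber length implied by a round-trip time, if all delay were propagation."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate coordinates (assumed): New York and San Jose.
direct = great_circle_km(40.71, -74.01, 37.34, -121.89)
implied = one_way_fiber_km(109)
print(f"great circle: {direct:.0f} km, implied one-way fiber path: {implied:.0f} km")
print(f"stretch factor: {implied / direct:.1f}x")
```

Under those assumptions the implied one-way path is well over twice the straight-line distance, which is the point being made about the fiber route.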
Re: Some truth about Comcast - WikiLeaks style
On Tue, Dec 14, 2010 at 02:54:13AM -0500, Jeffrey Lyon wrote: gin-nto-icore1 is a Tata router at Equinix in NY. Whether or not that port belongs to Comcast is anyone's guess. From Tata's looking glass: 3 Vlan550.icore1.NTO-NewYork.as6453.net (209.58.26.78) 4 msec Vlan551.icore1.NTO-NewYork.as6453.net (209.58.26.82) 4 msec 0 msec 4 pos-1-9-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.86.41) [AS 7922] 4 msec 4 msec 4 msec As far as I can tell their DNS doesn't expose Tata's router port names at all: 77.26.58.209.in-addr.arpa domain name pointer Vlan550.icore1.NTO-NewYork.as6453.net. 78.26.58.209.in-addr.arpa domain name pointer Vlan550.icore1.NTO-NewYork.as6453.net. 81.26.58.209.in-addr.arpa domain name pointer Vlan551.icore1.NTO-NewYork.as6453.net. 82.26.58.209.in-addr.arpa domain name pointer Vlan551.icore1.NTO-NewYork.as6453.net. 41.86.86.68.in-addr.arpa domain name pointer pos-1-9-0-0-cr01.newyork.ny.ibone.comcast.net. 42.86.86.68.in-addr.arpa domain name pointer pos-1-0-0-0-pe01.111eighthave.ny.ibone.comcast.net. Though I suppose if someone was photoshopping it, it would be pretty obvious for them to stick something that does show up in DNS into the graphs, so that doesn't exactly prove much. I'm also assuming Comcast wouldn't be very happy to have these out in public, so there is pretty much no way you're going to see a leaked graph that ISN'T from an anonymous source. FWIW these graphs pretty much reflect the massive congestion that I've been observing between Tata and Comcast. I've also seen some third party Smokeping graphs which visually show the rate of loss, and the pattern looks very very similar, but I'll let someone who actually maintains them be the one to post them. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
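For anyone who wants to repeat the PTR checks above, the in-addr.arpa names can be generated rather than typed by hand. This is a minimal Python sketch using only the standard library; it builds the query names for two of the addresses from the looking glass output and does not actually query DNS.

```python
import ipaddress

def ptr_name(addr: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(addr).reverse_pointer

# Addresses taken from the traceroute quoted above.
for addr in ("209.58.26.78", "68.86.86.41"):
    print(addr, "->", ptr_name(addr))
```

The resulting names can be fed to whatever resolver tool you prefer.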
Re: Some truth about Comcast - WikiLeaks style
On Tue, Dec 14, 2010 at 11:24:45AM -0500, Craig L Uebringer wrote: Yeah, the 30 day looks like a classic uptick in traffic toward the holidays. Some bellhead beancounter maybe took out capacity in the summer lull and ignored the engineers. Or they just have stupidly-slow install intervals. Same crap I've seen on loads of provider networks. Except that they seem to be busy actively turning down other capacity, and forcing extra traffic through their Tata ports by blocking other paths with BGP no-export communities. For example, we've been observing Comcast turning down some of their Global Crossing capacity in recent days, causing new congestion during peak traffic times. I've even seen people contact the various NOCs involved, and they've been told explicitly and by multiple parties that Comcast is intentionally turning down extra capacity and running their existing ports hot. Everybody who deals with interconnection capacity in this industry knows what's going on, but the graphs and interconnection details are all under NDA, so it takes an inside source secretly leaking graphs to the public to expose this kind of activity. Even then you'll still have people who claim that it proves nothing because the graphs can't be positively associated to a specific customer port, but realistically these kinds of leaks are probably the best public info you'll ever see. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Some truth about Comcast - WikiLeaks style
On Tue, Dec 14, 2010 at 03:39:07PM -0600, Aaron Wendel wrote: To what end? And who's calling the shots there these days? Comcast has been nothing but shady for the last couple years. Spoofing resets, The L3 issue, etc. What's the speculation on the end game? I believe Comcast has made clear their position that they feel content providers should be paying them for access to their customers. I've seen them repeatedly state that they feel networks who send them too much traffic are abusing their network. It isn't a ratios argument in the classic sense, between two peers trying to maintain a fair balance of costs and benefits, it's that they object to ANY content provider being able to deliver to their customers without paying them for access. They do this by trying to enforce ratios which are well beyond what their actual end users are routing, and as in the case of Level 3, they leverage that position to claim that other networks should be paying them under threat of blocking uncongested access to their customers. I would say their short term goal is to make people who currently won't peer with them do so, so they can become transit free. This has been seen time and time again, as they move networks who they want to peer with but who will not peer with them into a congested transit bucket. A while back it was SAVVIS, now it is Tata, but the pattern is clear and repetitive. Note that this only extends to a certain point though, as in the case of Global Crossing, who they claim is a settlement free peer, but who they have recently started pressuring and intentionally congesting because of ratio imbalances. Their long term goal seems to be to force content networks to pay them for direct transit or on-net connectivity, by removing the available capacity from other paths. If you are a content network, and you can't reach them in a reliable fashion via The Internet, your only choice may be to buy from Comcast directly. 
This is obviously not the first time that networks have used this strategy, there are several prominent examples in recent history of others using this exact same technique. But this is definitely one of the worst examples in the US of a major eyeball network using access to their customers (who may have little or no choice in their broadband access) to force other networks to pay them, and IMHO it needs to be called out publicly whenever possible. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: TWT - Comcast congestion
On Wed, Dec 01, 2010 at 06:31:39AM -0800, Leo Bicknell wrote: In a message written on Tue, Nov 30, 2010 at 10:59:25PM -0600, Richard A Steenbergen wrote: I believe that's what I said. To be perfectly clear, what I'm saying is: * Comcast acted first by demanding fees * Level 3 went public first by whining about it after they agreed to pay * Comcast was well prepared to win the PR war, and had a large pile of content that sounds good to the uninformed layperson ready to go. I think I can make this very simple. What I am saying is that you're missing a step before your 3 bullet points. Before any of the three things you describe, Level 3 demanded fees from Comcast. Level 3 is doing a great job of getting folks to ignore that fact. Do you have any basis for this claim, or are you just making it up as a possible scenario that would explain Comcast's actions? I have it on good authority that Level 3 did not attempt to raise their prices or ask for additional fees beyond their existing contract, nor was their contract coming to term where they could renegotiate for more favorable terms. Comcast simply said, we've decided we don't want to pay you, you should pay us instead, and you're going to bend over and like it if you want to be able to reach our customers. Obviously the version I've heard and the version you're pitching can't co-exist, so either you have some REALLY interesting inside info that I don't (which I honestly find hard to believe given your knowledge of the facts so far), or you're stating a theory with no possible basis that I can find as a fact. If it's just a theory, please say so, then we don't keep having to argue these positions that can clearly never converge. Comcast is a customer of L3, and pays them for service. Bringing on Netflix will cause Comcast to pay L3 more. 
More interestingly, in this case it's likely Level 3 went to Comcast and said we don't think your existing customer ports will handle the additional traffic, so... um... you should buy more customer ports. Comcast is the customer, they have complete and total control of the traffic being exchanged over their transit ports. If they wanted less traffic, they could announce fewer routes, or add more no-export communities. They also have complete control of traffic being sent outbound, and since Level3 is more than capable of handling 300Gbps (the capacity Comcast claims they have), if Comcast actually had 300Gbps of outbound traffic to send they could easily have had a 1:1 ratio. Framing this as a peering ratio debate is absurd, because these two networks were NEVER peers. Any customer could have sent additional bits to Level3 at any time, and Comcast should be prepared to deal with the TE as a result. That's life on the Internet. Does network neutrality work both ways? If it is bad for Comcast to hold the users hostage to extort more money from Level 3, is it also bad for Level 3 to hold the content hostage to extort more money from Comcast? You know, most people manage to buy sufficient transit capacity to support the volume of traffic that their customers pay them to deliver. Only Comcast seems to feel that it is proper to hold their captive customer base hostage to extort content networks into paying for uncongested access. Level 3 is free to sell full transit or CDN to whomever they like, just as Comcast is free to not buy transit from Level 3 when their contract is up. The net neutrality part starts when Level 3 is NOT free to turn off their customer for non-payment just like what would happen to anyone else who suddenly decided they didn't think they should keep paying their bills, because Comcast maintains so little transit capacity that to shut them off would cause massive disruptions to large portions of the Internet. 
-- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: TWT - Comcast congestion
On Tue, Nov 30, 2010 at 11:45:53AM -0800, Kevin Oberman wrote: We have seen the same thing with other carriers. As far as I can see, Comcast is congested, at least at Equinix in San Jose. Since this is all over private connections (at least in our case), the fabric is not an issue. Maybe they will be using the money from Level(3) to increase capacity on the peerings with the transit providers. (Or maybe not.) I don't know about their connection to TWT, but Comcast has definitely been running their transits congested. The most obvious one from recent months is Tata, which appears to be massively congested for upwards of 12 hours a day in some locations. Comcast has been forcing traffic from large networks who refuse to peer with them (e.g. Abovenet, NTT, Telia, XO, etc) to route via their congested Tata transit for a few months now; their Level3 transit is actually one of the last uncongested providers that they have. The part that I find most interesting about this current debacle is how Comcast has managed to convince people that this is a peering dispute, when in reality Comcast and Level3 have never been peers of any kind. Comcast is a FULL TRANSIT CUSTOMER of Level3, not even a paid peer. This is no different than a Comcast customer refusing to pay their cable modem bill because Comcast sent them too much traffic (i.e. the traffic that they requested), and then demanding that Comcast pay them instead. Comcast is essentially abusing its (in many cases captive) customers to extort other networks into paying them if they want uncongested access. This is the kind of action that virtually BEGS for government involvement, which will probably end badly for all networks. If there is any doubt about any of this, you can pop on over to lg.level3.net and look at the BGP communities Comcast is tagging on their Level3 transit service, preventing the routes from being exported to certain peers. 
For example, to my home cable modem: Community: North_America Lclprf_100 Level3_Customer United_States Chicago2 EU_Suppress_to_Peers Suppress_to_AS174 Suppress_to_AS1239 Suppress_to_AS1280 Suppress_to_AS1299 Suppress_to_AS1668 Suppress_to_AS2828 Suppress_to_AS2914 Suppress_to_AS3257 Suppress_to_AS3320 Suppress_to_AS3549 Suppress_to_AS3561 Suppress_to_AS3786 Suppress_to_AS4637 Suppress_to_AS5511 Suppress_to_AS6453 Suppress_to_AS6461 Suppress_to_AS6762 Suppress_to_AS7018 Suppress_to_AS7132 -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
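If you want to pull the list of suppressed peers out of looking glass output like the above without reading it by eye, something along these lines would do. This is only a sketch: it assumes the symbolic community names are rendered exactly as pasted here (Suppress_to_ASnnnn), which is how the looking glass displayed them at the time and may not hold elsewhere.

```python
import re

# Community string as pasted from the looking glass output above.
COMMUNITIES = (
    "North_America Lclprf_100 Level3_Customer United_States Chicago2 "
    "EU_Suppress_to_Peers Suppress_to_AS174 Suppress_to_AS1239 "
    "Suppress_to_AS1280 Suppress_to_AS1299 Suppress_to_AS1668 "
    "Suppress_to_AS2828 Suppress_to_AS2914 Suppress_to_AS3257 "
    "Suppress_to_AS3320 Suppress_to_AS3549 Suppress_to_AS3561 "
    "Suppress_to_AS3786 Suppress_to_AS4637 Suppress_to_AS5511 "
    "Suppress_to_AS6453 Suppress_to_AS6461 Suppress_to_AS6762 "
    "Suppress_to_AS7018 Suppress_to_AS7132"
)

def suppressed_asns(text):
    """Return the sorted ASNs named in Suppress_to_AS<n> tokens."""
    return sorted(int(n) for n in re.findall(r"\bSuppress_to_AS(\d+)\b", text))

asns = suppressed_asns(COMMUNITIES)
print(len(asns), "peers suppressed:", asns)
```

Note that the route is being withheld from AS6453 (Tata) on the Level3 side as well, along with most of the other large networks.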
Re: TWT - Comcast congestion
together and deal with each other, cutting out the middle man Netflix is a Comcast customer too (again well established publicly and easily provable via the global routing table), but they don't run their own server infrastructure, and Comcast doesn't offer a CDN service... The reality is that Level 3 offered Netflix a cut-throat price on CDN service to steal the business from Akamai, probably only made possible by the double dipping mentioned above. They were already in for a world of hurt based on their CDN infrastructure investment and the revenue they were able to extract from it, this certainly isn't going to help things. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: TWT - Comcast congestion
On Tue, Nov 30, 2010 at 07:53:25PM -0800, Leo Bicknell wrote: I'm not privy to the deal, but I will point out as reported it makes no sense, so there is something else going on here. This is where both sides are hiding the real truth. I suspect it's one of two scenarios: - Comcast demanded a lower price from Level 3, which Level 3 has spun as paying Comcast a monthly fee. - Comcast said they would do settlement free peering with Level 3, in addition to, or in place of transit. Level 3 is spinning the cost of turning this up as paying Comcast a fee. I suspect we'll not know what terms were offered for many years. While obviously nobody is going to come out and officially acknowledge the exact terms on the NANOG mailing list, I'd say this is far too massive a leap of logic to make any kind of sense. Both Level 3 and Comcast seem to acknowledge that Comcast is asking for Level 3 to pay, is it really so hard to believe that this is the case? :) Yes and no. First off, network neutrality is a vaguely defined term, so I'm not going to use it. Rather I'm going to say I think many people agree there is a concept that when it comes to traffic between providers there should be roughly similar terms for all players. Comcast shouldn't give Netflix a sweetheart deal while making Youtube pay through the nose. Why shouldn't they? Charging different people different rates based on their willingness to pay is perfectly legal last I looked, and goes on in every industry. Personally I thought net neutrality was about not charging Netflix a special fee or else risk having their services degraded (in the same way that the mob makes sure nothing bad happens to your store :P), so they don't compete with an internal VOD service which doesn't get such fees applied. But obviously net neutrality is like tier 1, you can apply any definition you'd like. 
:) The funny part is that Level 3 was clearly ill prepared for the PR war, whereas Comcast, being the first mover (if not the first PR issuer), was well prepared. Really? I just checked google news again, and the first statement I can find by either side was a Level 3 submission to business wire: I believe that's what I said. To be perfectly clear, what I'm saying is: * Comcast acted first by demanding fees * Level 3 went public first by whining about it after they agreed to pay * Comcast was well prepared to win the PR war, and had a large pile of content that sounds good to the uninformed layperson ready to go. The reality is that Level 3 offered Netflix a cut-throat price on CDN service to steal the business from Akamai, probably only made possible by the double dipping mentioned above. They were already in for a world of hurt based on their CDN infrastructure investment and the revenue they were able to extract from it, this certainly isn't going to help things. :) I feel you undercut your network neutrality argument right here, because you make an argument that this is just two competitive businesses trying to get a leg up on each other. You can't have the fairness part of network neutrality and try and stab each other in the back at every step. The net neutrality part comes from the fact that Level 3 can't just turn Comcast off for non-payment without risking massive impact to their customers. I'm pretty sure Level 3 is still allowed to charge people for transit services. If Comcast didn't want to buy from Level 3 they could have easily gone elsewhere, the part where the gov't steps in is when someone is abusing a monopoly/duopoly position. Neither Level 3 nor Comcast here are interested in the fairness of network neutrality, or even interested in helping their customers. They are interested in hurting their competitors and boosting their own bottom line. Probably true, but I'm sure someone somewhere (i.e. 
the consumers who have little to no choice in their home broadband) cares about the fairness just a little. I bet the cash spent on lawyers and lobbyists taking this to the FCC on both sides could pay for enough backbone bandwidth and router ports to make this problem go away on both sides many times over. If they really cared about the customer's experience and good network performance they would put away the press release swords, the various VP and CxO's egos, and come up with a solution. Do you really think Comcast cares about the $50k router ports (by their own accounts, though personally I'd suggest they get off the CRS-1 tippe if they actually wanted to save some money :P), or might they actually be more interested in establishing themselves as a new Tier 1? :) At the end of the day both companies have made their share of mistakes, but I have a lot more respect for the ones who compete fairly and honestly, rather than by forcing people to use their services or else. -- Richard A Steenbergen r...@e-gerbil.net http://www.e
Re: experience with equinix exchange
On Sun, Nov 28, 2010 at 04:09:55PM -0600, Aaron Wendel wrote: According to pch they don't run most of them. I would say they run very few compared to how many there actually are. Uhh... Reality check, with the SD acquisition Equinix controls the VAST majority of the IX traffic in the US. The only other IX's doing anything even approaching interesting traffic are NOTA (in Miami), NYIIX (in New York), SIX (in Seattle), and the former AtlantaIX (now Telx TIE) in Atlanta. All are regional players, with very incomplete coverage of the important regions in the US, so if you're peering in the US you're almost guaranteed to be dealing with Equinix. Nobody else is even noteworthy, you can probably do more traffic than the other IX's by leaving a bit torrent client running overnight. Anyone can throw a Linksys switch in their basement and call themselves an exchange point, but that doesn't mean anyone is going to show up and peer there. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: experience with equinix exchange
On Mon, Nov 29, 2010 at 04:03:21PM -0500, Patrick W. Gilmore wrote: The only thing I would change is that Any2 has at least one exchange with traffic (Los Angeles) and is distributed throughout the country. But the vast majority of traffic exchange over IXes in the US is over Equinix/PAIX switches. And a very large amount of traffic over private interconnects is also done in their buildings. Woops, yes I forgot Any2 (how'd that happen? :P). Like Telx they've recently deployed a bunch of new exchanges all over, but there is really only the one that does any traffic. :) For comparison purposes: http://www.seattleix.net/agg.htm http://www.nyiix.net/index.php?core=statistics.php http://tie.telx.com/usage.pl http://www.coresite.com/peering-any2charts.php I don't think the combined Equinix / SD numbers are published publicly anywhere, but I'm sure it's north of a terabit. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Outage between GBLX and HE?
On Wed, Nov 17, 2010 at 10:36:09AM -0500, Christopher J. Pilkington wrote: On Wed, Nov 17, 2010 at 09:55:10AM +, Paul Kelly :: Blacknight wrote: I may have spoken too soon... issues are ongoing. We were seeing routing irregularities with GBLX as well. It seems they are sending out our prefix to their peers, but blackholing the traffic coming back. We've shut down our session with AS3549 until someone there answers our ticket. Probably another LSP blackholing issue, look at the archives a few weeks back and you'll see the same issue on GX in Seattle. As for the issue this morning, they have a router that has been blackholing traffic in Ashburn for a good long while now. I almost put on my Global Double Crossing t-shirt this morning too. :) http://www.printfection.com/ras/Global-Double-Crossing-2-T-Shirt/_p_4935066 -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: flow analysis for juniper devices
On Tue, Nov 16, 2010 at 12:33:37AM +0100, bas wrote: Shouldn't there be a (**) (**) Also Except for MX'es with trio chipsets. These can do inline-jflow that export to IPFIX (modified netflow v9) All of the open source collector solutions I've tried that can handle v9 cannot handle IPFIX from the trio cards. Richard; Do you have something that handles IPFIX? Yes there's that too. I haven't actually gotten around to testing the Trio specific Netflow capabilities yet, but supposedly they only support IPFIX when using the built in sampling capabilities. If you want v9 you'll still need a Multiservice DPC, or you can always stick to classic RE-sampled v5/v8. IPFIX is effectively netflow v10, it's largely based off of v9, but it's just different enough to be incompatible. Of course it's close enough that it shouldn't be THAT much work if you already have an existing v9 parser, but I don't know what software actually supports it today. The only flow collector implementation which I've spent any amount of time looking at besides the stuff I've written myself is pmacct, which IMHO shows great promise, but I don't believe it supports IPFIX yet. For my purposes I'd have been just as happy if everyone had standardized on sFlow (especially since I already wrote a parser for it :P), but alas it isn't meant to be. Some differences between v9 and IPFIX that googling turned up: http://www.plixer.com/blog/netflow/what-is-ipfix-vs-netflow-v9/ -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
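On the "just different enough to be incompatible" point: NetFlow v5, v8, v9 and IPFIX export packets all begin with a 16-bit version field in network byte order, and IPFIX uses the value 10 there, so a collector can at least tell what it is being sent before deciding whether it can parse it. A minimal Python sketch of that check follows; the function name and the hand-built test datagram are my own illustration, and nothing past the version field is parsed.

```python
import struct

# Version numbers carried in the first two bytes of each export format.
FORMAT_NAMES = {5: "NetFlow v5", 8: "NetFlow v8", 9: "NetFlow v9", 10: "IPFIX"}

def export_format(datagram: bytes) -> str:
    """Identify a flow export datagram from its leading 16-bit version field."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to carry a version field")
    (version,) = struct.unpack("!H", datagram[:2])
    return FORMAT_NAMES.get(version, f"unknown (version {version})")

# Hand-built example: version 10 followed by 14 zero bytes of header filler.
print(export_format(b"\x00\x0a" + bytes(14)))
```

A v9-only parser that sees version 10 should log and drop rather than try to interpret the rest of the header, since the header layouts differ after that field.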
Re: flow analysis for juniper devices
On Sun, Nov 14, 2010 at 08:59:33AM +, Paolo Lucente wrote: On Sat, Nov 13, 2010 at 09:17:55PM -0600, Richard A Steenbergen wrote: Oh and the sFlow on EX is actually pretty crippled when used for routing. It's missing support for a bunch of important extended message types, and doesn't fully populate all of the fields of the message types it does send. For example you won't get any data on ASNs, nexthops, dest ifindexes, or even netmasks of the src/dst route the flow matched, making it pretty darn useless for a lot of tasks. It's functional if you're just analyzing L2 networks at any rate. Agree people spend some money and hence tend to expect something in return. But it's also true those good souls developing free collectors (to stay in topic with the OP) sometimes come to the rescue: ASNs, BGP next-hop, routes, netmasks can be all looked up at the collector at pretty no major effort. Variety of methods available depending on the collector, in place or a posteriori, file or BGP lookup - it's matter of selecting what fits better the specific job. Yes you can do an offline routing lookup to try and reconstruct some missing data (or do some even more interesting analysis, as described in http://www.nanog.org/meetings/nanog35/presentations/steenbergen.pdf), but it isn't always a practical solution to missing netmask, nexthop, and dest ifindex data. Remember that every RIB in your network can and will have a unique best path selection (thanks to the EBGP > IBGP rule if nothing else), and if you have a network of any size at all you'll probably have to deal with multiple exits to the same network. Even if you were only concerned with analyzing external traffic, you'd still need to collect a RIB per edge router using an IBGP feed. In my network this would put you well over 10 million paths, and consume several gigs of ram, not to mention the load of doing the routing lookups themselves. 
If you wanted to do traffic analysis inside your network you'd need a feed from every router, and maybe even active participation in your IGP. It CAN be done, but it's not pretty, and I don't think any existing free software has been tested under these kinds of conditions. So when a vendor says we support sFlow, make sure they actually support the message types and fields you need. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
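To make the collector-side lookup concrete, here is a toy Python sketch of filling in the missing netmask, origin ASN and nexthop by longest-prefix match against one router's RIB. Everything in it is hypothetical: the prefixes are documentation ranges, the ASNs are from the documentation range, and the linear scan stands in for the trie a real collector would need at full-table scale. It also only models a single RIB, which is exactly the limitation described above, since each edge router would need its own.

```python
import ipaddress

def build_rib(routes):
    """Build a RIB dict from (prefix, origin_asn, next_hop) tuples."""
    return {ipaddress.ip_network(p): (asn, nh) for p, asn, nh in routes}

def lookup(rib, addr):
    """Longest-prefix match; returns (network, (asn, next_hop)) or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, attrs in rib.items():
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, attrs)
    return best

# Hypothetical RIB for one edge router (documentation prefixes and ASNs).
rib = build_rib([
    ("198.51.100.0/24", 64496, "192.0.2.1"),
    ("198.51.100.128/25", 64497, "192.0.2.2"),
])

net, (asn, nexthop) = lookup(rib, "198.51.100.200")
print(f"matched {net} (mask /{net.prefixlen}), origin AS{asn}, nexthop {nexthop}")
```

Multiply that by one RIB per exporting router and a lookup per sampled flow, and the memory and CPU concerns above become fairly obvious.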
Re: flow analysis for juniper devices
On Sun, Nov 14, 2010 at 12:07:40PM +1000, Mehmet Akcin wrote: hey there any recommendations on freeware flow analysis tool which can show the flow not only per prefix basis but also show asn and/or country/region as well? Juniper only. feel free to contact on/off list. Juniper's flow export is just like everyone else's (*), so any tool will do the same thing. Country/region analysis would depend on third party geolocation services, which have nothing to do with netflow. :) (*) Well, except M/T/MX only support NetFlow v5/v8 in the free software based sampling mode, you need an expensive services card and software license to do v9 for some reason. Oh and the sFlow on EX is actually pretty crippled when used for routing. It's missing support for a bunch of important extended message types, and doesn't fully populate all of the fields of the message types it does send. For example you won't get any data on ASNs, nexthops, dest ifindexes, or even netmasks of the src/dst route the flow matched, making it pretty darn useless for a lot of tasks. It's functional if you're just analyzing L2 networks at any rate. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Extra latency at ATT exchange for UVerse
, SBC/AS7132, and Bellsouth/AS6389, each with their own unique routing policies. The latency jump would be a near perfect fit for there still being some direct AS7132 peering sessions up, but only in Ashburn and not Atlanta. If nothing else, this illustrates one key point of troubleshooting with traceroute. The actual output of the traceroute is often worthless without knowing the source and destination IPs that were being tested, so *ALWAYS* provide those along with your traceroutes if you want to ever have any hope of having your problem solved. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sun, Nov 07, 2010 at 08:02:28AM +0100, Mans Nilsson wrote: The only reason to use (10)GE for transmission in WAN is the completely baroque price difference in interface pricing. With today's line rates, the components and complexity of a line card are pretty much equal between SDH and GE. There is no reason to overcharge for the better interface except because they (all vendors do this) can. To be fair, there are SOME legitimate reasons for a cost difference. For example, ethernet has very high overhead on small packets and tops out at 14.8Mpps over 10GE, whereas SONET can do 7 bytes of overhead for your PPP/HDLC and FCS etc and easily end up doing well over 40Mpps of IP packets. The cost of the lookup ASIC that only has to support the Ethernet link is going to be a lot cheaper, or let you handle a lot more links on the same chip. At this point it's only half price gouging of the silly telco customers with money to blow. There really are significant cost savings for the vendors in using the more popular and commoditized technology, even though it may be technically inferior. Think of it like the old IDE vs SCSI wars, when enough people get onboard with the cheaper inferior technology, eventually they start shoehorning on all the features and functionality that you wanted from the other one in the first place. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
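For anyone wondering where the 14.8Mpps and 40Mpps figures come from, the arithmetic is sketched below in Python. The Ethernet side uses the standard 64-byte minimum frame plus 8 bytes of preamble and 12 bytes of inter-frame gap. The SONET side is rougher: the roughly 9.58Gbps usable payload rate for an OC-192c and the bare 20-byte IP packet are my own assumptions for illustration, and HDLC byte stuffing is ignored.

```python
def ethernet_pps(line_rate_bps=10e9, frame_bytes=64):
    """Max frames/sec on Ethernet: frame (64-byte minimum) + preamble + gap."""
    on_wire = max(frame_bytes, 64) + 8 + 12  # 8 = preamble/SFD, 12 = inter-frame gap
    return line_rate_bps / (on_wire * 8)

def pos_pps(payload_bps=9.58e9, ip_bytes=20, overhead_bytes=7):
    """Max packets/sec on packet-over-SONET with PPP/HDLC + FCS overhead.

    payload_bps is an assumed approximate OC-192c payload rate.
    """
    return payload_bps / ((ip_bytes + overhead_bytes) * 8)

print(f"10GE, 64-byte frames:   {ethernet_pps() / 1e6:.2f} Mpps")
print(f"OC-192c POS, 20-byte IP: {pos_pps() / 1e6:.2f} Mpps")
```

The gap comes entirely from per-packet overhead: a tiny IP packet costs 84 bytes of wire time on Ethernet but only 27 on POS under these assumptions, so the lookup engine behind a POS port has to be sized for roughly three times the packet rate.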
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sun, Nov 07, 2010 at 12:34:56AM -0700, George Bonser wrote: Yes, I really don't understand that either. You would think that the investment in developing and deploying all that SONET infrastructure has been paid back by now and they can lower the prices dramatically. One would think the vendors would be practically giving it away, particularly if people understood the potential improvement in performance, though the difference between 1500 and 4000 is probably not all that much except on long distance ( 2000km ) paths. Careful, you're rapidly working your way up to nanog kook status with these absurd claims based on no logic whatsoever. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: RINA - scott whaps at the nanog hornets nest :-)
the mechanisms currently at our disposal. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: RINA - scott whaps at the nanog hornets nest :-)
On Sat, Nov 06, 2010 at 02:21:51PM -0700, George Bonser wrote: That is not a new problem. That is also true to today with last mile links (e.g. dialup) that support 1500 byte MTU. What is different today is RFC 4821 PMTU discovery which deals with the black holes. RFC 4821 PMTUD is that negotiation that is lacking. It is there. It is deployed. It actually works. No more relying on someone sending the ICMP packets through in order for PMTUD to work! The only thing this adds is trial-and-error probing mechanism per flow, to try and recover from the infinite blackholing that would occur if your ICMP is blocked in classic PMTUD. If this actually happened in any scale, it would create a performance and overhead penalty that is far worse than the original problem you're trying to solve. Say you have two routers talking to each other over a L2 switched infrastructure (i.e. an exchange point). In order for PMTUD to function quickly and effectively, the two routers on each end MUST agree on the MTU value of the link between them. If router A thinks it is 9000, and router B thinks it is 8000, when router A comes along and tries to send a 8001 byte packet it will be silently discarded, and the only way to recover from this is with trial-and-error probing by the endpoints after they detect what they believe to be MTU blackholing. This is little more than a desperate ghetto hack designed to save the connection from complete disaster. The point where a protocol is needed is between router A and router B, so they can determine the MTU of the link, without needing to involve the humans in a manual negotiation process. Ideally this would support multi-point LANs over ethernet as well, so .1 could have an MTU of 9000, .2 could have an MTU of 8000, etc. 
And of course you have to make sure that you can actually PASS the MTU across the wire (if the switch in the middle can't handle it, the packet will also be silently dropped), so you can't just rely on the other side to tell you what size it THINKS it can support. You don't have a shot in hell of having MTUs negotiated correctly or PMTUD work well until this is done. Is there any gear connected to a major IX that does NOT support large frames? I am not aware of any manufactured today. Even cheap D-Link gear supports them. I believe you would be hard-pressed to locate gear that doesn't support it at any major IX. Granted, it might require the change of a global config value and a reboot for it to take effect in some vendors. http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html If that doesn't prove my point about every vendor having their own definition of what # is and isn't supported, I don't know what does. Also, I don't know what exchanges YOU connect to, but I very clearly see a giant pile of gear on that list that is still in use today. :) As for the configuration differences between units, how does that change from the way things are now? A person configuring a Juniper for 1500 byte packets already must know the difference as that quirk of including the headers is just as true at 1500 bytes as it is at 9000 bytes. Does the operator suddenly become less competent with their gear when they use a different value? Also, a 9000 byte MTU would be a happy value that practically everyone supports these days, including ethernet adaptors on host machines. Everything defaults to 1500 today, so nobody has to do anything. Again, I'm actually doing this with people today on a very large network with lots of peers all over the world, so I have a little bit of experience with exactly what goes wrong. Nearly everyone who tries to figure out the correct MTU between vendors and with a third party network gets it wrong, at least some significant percentage of the time. 
And honestly I can't even find an interesting number of people willing to turn on BFD, something with VERY clear benefits for improving failure detection time over an IX (for the next time Equinix decides to do one of their 10PM maintenances that causes hours of unreachability until hold timers expire :P). If the IX operators saw any significant demand they would have turned it on already. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
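To make the router A / router B scenario from the message above concrete, here is a toy simulation of trial-and-error probing across a path where one hop silently drops anything over 8000 bytes. It is a sketch of the general idea only, not the RFC 4821 algorithm itself; the function names and the binary search strategy are assumptions made for illustration.

```python
def delivered(size, hop_mtus):
    """A packet arrives only if it fits every hop; oversized packets vanish silently."""
    return all(size <= mtu for mtu in hop_mtus)


def probe_path_mtu(hop_mtus, low=1500, high=9000):
    """Binary-search the largest deliverable size, assuming `low` always works.

    Returns (path_mtu, probes_sent). In real life every failed probe also
    costs a timeout, because nothing tells the sender the packet was dropped.
    """
    probes = 0
    while low < high:
        mid = (low + high + 1) // 2
        probes += 1
        if delivered(mid, hop_mtus):
            low = mid          # mid got through, try larger
        else:
            high = mid - 1     # mid was blackholed, try smaller
    return low, probes


# Router A believes the link is 9000, but the far side only accepts 8000.
path_mtu, probes = probe_path_mtu([9000, 8000, 9000])
print(path_mtu, probes)
```

Every flow that has to rediscover the limit this way pays for those probes, which is the overhead being complained about.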
Re: RINA - scott whaps at the nanog hornets nest :-)
, and every piece of gear supports it. It also doesn't accomplish anything, as almost no packets flowing through your SONET links are 1500 bytes, and if you actually tried to show up to the Internet with a PC and a 4474 byte MTU you'd have a bad time. At any rate, I'm going to stop arguing this one, as I think we've beaten this dead horse enough for one day. Please read what I said carefully, I promise you this isn't as easy as you think it is. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: RINA - scott whaps at the nanog hornets nest :-)
On Fri, Nov 05, 2010 at 03:32:30PM -0700, Scott Weeks wrote: It's really quiet in here. So, for some Friday fun let me whap at the hornets nest and see what happens... ;-) Arguments about locator/identifier splits aside (which I happen to agree with), this thing goes off the deep end on page 7 when it starts talking about peering infrastructure. In fact pretty much every sentence on that page is blatantly wrong. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Equinix of Candia?
On Mon, Nov 01, 2010 at 06:31:34PM -0700, Ryan Finnesey wrote: Equinix only has one center within Toronto. Is there someone with a larger number of centers across the country? I'm assuming when you say "like Equinix" you mean a carrier neutral colo where you can buy from, sell to, and interconnect with other networks in an interesting fashion. If you're just looking for a place to stuff some servers, the answer will be very different. Canada is an odd market, with relatively little competition between carriers (outside of a few locations), and most of the bandwidth controlled by a few large incumbents. The biggest and most interesting facility for carrier neutral services is 151 Front in Toronto, where nearly every bit in the region goes. Switch and Data (now Equinix) is one major colo and IX operator in the building, but there are many more, and a building MMR. Technically this makes it more like a 111 8th than an Equinix. :) In Montreal there is Canix (www.canix.ca), which operates multiple facilities throughout the city, and is the de facto standard for carrier neutral colo there. This is probably the closest thing you'll find to an Equinix. If there is anything interesting going on in Vancouver I haven't heard of it, but I don't know the market well enough to say for certain. Everywhere else is either too small to care about on a national scale, or is serviced by non-neutral colos (e.g. Peer1, etc). -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: ATT/L3 interconnect?
On Mon, Oct 11, 2010 at 02:48:00PM -0400, Deepak Jain wrote: http://www.nanog.org/meetings/nanog45/presentations/Sunday/RAS_traceroute_N45.pdf I'd have thought I didn't need to provide credentials in NANOG, but apparently one stays quiet too long and you're a noob. First, to those who have given me basic mpls, traceroute and ip primers by off list email, thank you. It's not necessary. I appreciate your willingness to help out the community. Second, I *know* that the traceroute I pasted a bit of has to do with mpls magic (or similar). That's why I used the word tunnel. I wasn't asking *how* it was done. I'm quite capable of performing the same magic. I just wanted to know if anyone off the top of their head knew *where* the packets were magically popping back into the ether... LA, Nevada, Denver. That's all. A physical location or a router IP would have been a perfectly wonderful answer. Hey Deepak, Sorry, but they're actually right. Read the section on icmp tunneling, it explains exactly how and why you're seeing this behavior. :) The return packets pop out at the end of the lsp, which is clearly in LA (or thereabouts, whatever lsrca is probably). -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: reachability problems Europe-US?
On Thu, Oct 07, 2010 at 07:12:33PM +0200, Thomas Schmid wrote: yes, I can confirm that situation is back to normal now after we re-enabled the GBLX session. I heared from others that it was again a broken LSP problem in GBLX (unconfirmed :) ) Global Crossing recently started deploying Foundry/Brocade XMR's in their MPLS core, as a lower cost alternative to their old T640/OC192 MPLS core model. Unfortunately these boxes are buggy as all hell, and seem to blackhole LSPs somewhere in their network on at least a weekly basis. I think we've seen at least a dozen issues similar to this over the last couple months, though most of them were out of LA, so I didn't know they had actually done a Seattle deployment. Honestly GX deserves what they get on this one. I'm not aware of any other large network who has ever done a serious MPLS deployment using these boxes (and if you're thinking of replying to this and saying hey we do some vll's between 2 routers and it seems to work, stop and think about what I might mean when I say a SERIOUS mpls deployment first :P), so this was pretty much to be expected. I'll also say that I'm remarkably underwhelmed by their response to this issue, and suggest that anyone who doesn't want their packets blackholed by the Floundrys be prepared to vote with their wallet. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: BGP next-hop
On Thu, Sep 30, 2010 at 07:01:19AM -0700, Leo Bicknell wrote: I have suggested more than a few times to vendors that the command: show bgp ipv4 unicast 100.10.0.0/16 why-chosen Would be insanely useful. Been in JUNOS show route since day one, and IMHO is easily in the top 10 list of why I still buy Juniper instead of Cisco despite all the $%^*ing bugs these days. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: BGP next-hop
On Thu, Sep 30, 2010 at 11:56:06PM +0100, Heath Jones wrote: Its interesting, I was heavy into cisco years back and then juniper for a while. Going back to cisco now is great (always good for me to keep my exposure up), but there is just so much unclear in it's CLI. It wasn't until going back that I realised. I guess they would have to balance keeping the old timers scripts etc happy VS bringing in new features that make the output look different.. Do you keep something that isn't perfect but people know how to use, or change it and cause more issues than good? Personally I still can't believe that it's the year 2010, and IOS still shows routes in classful notation (i.e. if it's in 192.0.0.0/3 and is a /24, the /24 part isn't displayed because it's assumed to be Class C). Of course I say that every year, and so far the only thing that has changed is the year I say it about. ps. Juniper has really gone to $h!t lately. There's a website called glassdoor.com that I found - go look up what employees have to say about it.. reflects exactly the support we were getting, even as as an 'elite' partner.. Don't get me started, I could complain for days and still not run out of material, but alas it doesn't accomplish anything. Sadly, many of the best Juniper people I know are incredibly disaffected, and are leaving (or have already left) in droves. I think the way I heard it put best was, I'm convinced that $somenewexecfromcisco is actually on a secret 5 year mission to come over to Juniper, completely $%^* the company, and then go back to Cisco and get a big bonus for it. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Routers in Data Centers
On Sun, Sep 26, 2010 at 09:24:54PM -0400, Alex Rubenstein wrote: And, not to mention that some vendors do it sometimes. The 9-slot Cisco Catalyst 6509 Enhanced Vertical Switch (6509-V-E) provides [stuff]. It also provides front-to-back airflow that is optimized for hot and cold aisle designs in colocated data center and service provider deployments and is compliant with Network Equipment Building Standards (NEBS) deployments. A classic 6509 is under 15U, a 6509-V-E is 21U. Anyone can do front to back airflow if they're willing to bloat the size of the chassis (in this case by 40%) to do all the fans and baffling, but then you'd have people whining about the size of the box. :) It only took 298 years from the inception of the 6509 to get a front-to-back version. If you can do it with that oversized thing, it certainly can be done on a 7200, XMR, juniper whatever, or whatever else you fancy. Well, a lot of people who buy 7200s, baby XMRs, etc, are doing it for the size. Lord knows I certainly bought enough 7606s instead of 6509s over the years for that very reason. I'm sure the vendors prefer to optimize the size footprint on the smaller boxes, and only do front to back airflow on the boxes with large thermal loads (like all the modern 16+ slot chassis that are rapidly approaching 800W/card). Also, remember the 6509 has been around since its 9 slots were lucky to see 100W/card, which is a far cry from a box loaded with 6716s at 400W/card or other power hungry configs. Remember the original XMR 32 chassis, which had side to side airflow? They quickly disappeared that sucker and replaced it with the much larger version they have today, I can only imagine how bad that was. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Did your BGP crash today?
On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote: Unknown BGP attribute 99 (flags: 240) Unknown BGP attribute 99 (flags: 240) Unknown BGP attribute 99 (flags: 240) Unknown BGP attribute 99 (flags: 240) Unknown BGP attribute 99 (flags: 240) Just out of curiosity, at what point will we as operators rise up against the ivory tower protocol designers at the IETF and demand that they add a mechanism to not bring down the entire BGP session because of a single malformed attribute? Did I miss the memo about the meeting? I'll bring the punch and pie. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Did your BGP crash today?
On Fri, Aug 27, 2010 at 01:43:39PM -0700, Clay Fiske wrote: If -everyone- dropped the session on a bad attribute, it likely wouldn't make it far enough into the wild to cause these problems in the first place. And if everyone filtered their BGP customers there would be no routing leaks, but we've seen how well that works. :) The "if anything bad happens, drop the session" method of protection is only effective if EVERY BGP implementation catches EVERY malformed update EVERY time, which just doesn't match up with reality. Not only that, but a healthy number of the bgp update issues over the years have actually been the result of implementations detecting perfectly valid things as invalid, which means by definition the implementations which get it right and don't drop the session act as carriers and spread the problem route globally. How long are we going to continue to act like this method of protection is actually working? Let's be reasonable, if your basic bgp message format is malformed you're going to need to drop the session. If the packet is corrupted or the size of the message doesn't match what's in the tlv, you're not going to be able to continue and you'll have to drop the session. But there are still a huge number of potential issues where it would be perfectly safe to drop the update you didn't like, and support for this could easily be negotiated and the sending side informed of the issue by a soft notification extension. I have yet to see a single argument against this which isn't political or philosophical in nature. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
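The distinction drawn in the message above, between errors that make the byte stream unparseable and errors confined to a single update, can be sketched as a simple classification. This is purely an illustration of the proposal: the error names, the `Action` enum, and the function are all hypothetical and do not correspond to any real BGP implementation or to the wording of any specification.

```python
from enum import Enum


class Action(Enum):
    RESET_SESSION = "tear down the session"
    DROP_UPDATE = "discard this update and softly notify the peer"


# Hypothetical labels for errors where the message framing itself can no
# longer be trusted, so there is no safe way to find the next message.
FRAMING_ERRORS = {
    "bad_marker",
    "bad_message_length",
    "attribute_length_overruns_message",
}


def handle_update_error(error):
    """Pick the least destructive response that is still safe."""
    if error in FRAMING_ERRORS:
        return Action.RESET_SESSION
    # Anything else is contained within one well-framed update.
    return Action.DROP_UPDATE


print(handle_update_error("unknown_attribute_flags").value)
print(handle_update_error("bad_message_length").value)
```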
Re: Inquiries to Acquire IPs
On Sat, Jul 03, 2010 at 10:42:55PM +0200, Mans Nilsson wrote: aut-num:AS31337 as-name:ELEET-AS descr: ELEET Network descr: Location: Sweden (Story is, IIRC, that adjacent number was assigned initially, but the confirmation mail was answered with Can I have 31337 instead? which in turn was granted. ) I tried to time it to get 6.9 from ARIN, ended up with 6.8 instead, and they kept 6.9 for themselves. Bastards! :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: BGP convergence problem
On Tue, Jun 08, 2010 at 12:22:04PM -0400, Jared Mauch wrote: The Cisco 7600 and 6500 platforms are getting fairly old and have underpowered cpus these days. Starting in SXH the control plane did not scale quite as well as in SXF. This got better in SXI, but is not back on par with SXF performance yet. I mostly attribute this to a combination of bloat in software and routing tables. I would start to look for a replacement sooner rather than later. Place blame where blame is due, the cpu may be slow, but the crappy ios scheduler is the real problem here. We saw a huge reduction in the number of self-sustaining protocol timeout cycles on these boxes (where the process of trying to bring up a new neighbor and converge routing uses so much cpu that it causes other neighbors to time out, resulting in a never-ending cycle of fail until you shut down everything and bring them up one neighbor at a time) with the move from SXF to the SR branches. We never really went down the SXH/SXI road, but I'd have assumed they would have introduced the same improvements there too. I guess you know what they say about assuming. :) Try the usual suspects:
* Configure process-max-time 20 at the top level, this improves interactivity by making the scheduler switch processes more often.
* Make sure you don't have an overly aggressive control-plane policer. In my experience the COPP rate-limits are quite harsh, and if you end up bumping against them you don't get a graceful slowing of the exchange of routes, you get protocol timeouts.
* Make sure you don't have any stupid mls rate-limits, such as cef receive. I don't know why anyone would ever want to configure this, all it does is make your box fall over faster (as if these things need any help) by rate-limiting all traffic to the msfc.
* You might want to try something like scheduler allocate 400 4000, which gives the vast majority of the cpu time to the control plane rather than process switching on the data plane (which in theory shouldn't happen on an entirely hw forwarded box like 6500/7600, though of course we all know that isn't true :P).
Oh and also the OP should take this to the cisco-nsp mailing list, where all the good bitching about broken Crisco routers takes place. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Junos Asymmetric Routing
On Sun, May 30, 2010 at 10:16:14AM -0700, Kevin Oberman wrote: I remember a posting to this list back in the late 90s from Tony Li, who knows a bit about BGP. He urged that multi-hop BGP never be used and pointed out that it had not been intended for use except as a test tool, not a production one and should have been stripped from IOS before it was shipped. While there are a few good cases for using it, it is generally a bad, bad idea. And this thread demonstrates that he had reason for the warning I think you guys may be getting a tad carried away with the crusade against multihop BGP. The only thing you're really giving up when you use it is liveness detection, which as we all know BGP is actually pretty terrible at implementing anyways (hows that 180 sec IOS default working out for you?). There are much better mechanisms out there, like BFD, which could be used to provide better liveness detection to BGP through nexthop invalidation. I'm not saying everyone should run out and do all their peering over multihop EBGP without carefully considering a replacement for the liveness detection component, I just hate it when people get religious about such a simple concept for no good reason (well, other than Randy Bush getting to do his best Andy Rooney impersonation :P). Multihop BGP is no more evil than anything else we do with the Internet, and the fact that we've all managed to use it successfully for IBGP proves that it can work out just fine. There are some pretty interesting things you can accomplish as far as large scale traffic engineering if you can free yourself from the requirement of speaking EBGP with a directly connected neighbor, processed by whatever slow overpriced router CPU could be stuffed into that box. Again, I just hate to see the concepts dismissed out of hand because of some old BGP ideology about a problem that can be addressed any number of other ways. 
-- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: BGP and convergence time
On Wed, May 12, 2010 at 09:52:48AM -0600, Danny McPherson wrote: The holdtime isn't technically negotiated, both sides convey their value in the open message and the lower of the two is used by both BGP speakers. IIRC, neither J or C reset the session with the timer change, but the new holdtimer expiry value doesn't take effect until then. Rest assured J will always reset the session if given half a chance, and changing your holdtime is more than half. :) One thing I find interesting is that most other protocols will err on the side of caution and use the higher of two values like this when negotiating between two parties, but BGP does the opposite. I still run into bad bgp implementations which can't keep up with my 30 sec hold timers all the time *coughghettoequinixrouteservercough*. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
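The "lower of the two wins" behaviour described in the message above is easy to sketch. The function name is made up for illustration; the rule that a hold time must be zero or at least three seconds, and the common practice of sending keepalives at one third of the hold time, come from the BGP-4 specification rather than from the message itself.

```python
def negotiate_hold_time(local_hold, remote_hold):
    """Return (hold_time, keepalive_interval) in seconds for a BGP session.

    Both speakers advertise a hold time in OPEN and both use the smaller one.
    A value of 0 means keepalives are disabled entirely.
    """
    hold = min(local_hold, remote_hold)
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    keepalive = hold // 3  # conventional one-third ratio
    return hold, keepalive


# A peer left at the 180 second default talking to one configured for 30:
print(negotiate_hold_time(180, 30))
```

Note that the side with the slower implementation gets no say here, which is how a peer that cannot keep up with a 30 second hold timer ends up flapping.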
Re: SFP+ ER and ZR
On Tue, May 11, 2010 at 04:24:42AM -0400, bas wrote: Hi Guys, I thought ER and ZR SFP+ optics were not available (yet) due to power and cooling challenges. However on this site: http://www.excelight.com/products/datalink/sfpplus.asp They offer both ER and ZR SFP+ optics. Has anyone used or tested with these? If so with which equipment? Or have you found other vendors of these optics? They aren't (yet), these are vaporware. Many manufacturers are close to having reliable 40km optics, and several are making 20km+ overpowered LRs, but ZR and DWDM are still a ways out. There are also some CWDM units in the works, but because SFP+ doesn't support onboard EDC you are limited by dispersion to 10km in the traditional 8ch 1470-1610nm CWDM space over SMF28. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: BGP and convergence time
On Tue, May 11, 2010 at 09:31:51PM -0400, Jay Nakamura wrote: Yes, I understand BFD. The question is, do carriers usually do BFD with customers? And if they say no, are there other remedies? ATT doesn't seem to be even willing to change BGP timers. If anyone have been able to talk ATT or Qwest in doing so, it would really help to find out how they convinced them. They are such a big bureaucracies that it's frustrating to do anything that makes sense. Although Qwest seems a lot more responsive than ATT. Slow as the titanic carriers won't do anything innovative for anyone, regardless of the benefit. Try a clueful carrier and they'll be happy to run BFD with you. Of course after promoting it for more than a year now we have something like 5 peers and 0 customers using it (mostly because of broken vendor implementations), but hey it's never too late to start. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Juniper firewalls - SSG or SRX
On Tue, Apr 20, 2010 at 04:18:11AM -0700, Owen DeLong wrote: Interesting. My SRXes have been rock solid since upgrading to 10.0R1.8. Not so much here. My basement SRX210 starts dropping bgp sessions over an IPSEC tunnel every 30 secs or so after around 1-1.5 days of uptime, and won't stop until you restart rpd (which buys you another day or so of functioning bgp). And about 1 out of every 4 times you do restart rpd, dhcpd will spin at 100% cpu until you restart that too. Even 10.1S1.3 doesn't help these issues. It's a nice box in theory, and it has lots of potential, but lots and lots of unresolved bugs too. I knew things were off to a bad start when I tried to downgrade from the 10.0R1 that shipped with the box to 9.6 after my first round of issues, and it crashed in the middle of the installer, wiping the config in the process and requiring a tftp boot of new code to recover. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: what about 48 bits?
On Sun, Apr 04, 2010 at 11:53:54AM -0300, A.B. Jr. wrote: Hi, Lots of traffic recently about 64 bits being too short or too long. What about mac addresses? Aren't they close to exhaustion? Should be. Or it is assumed that mac addresses are being widely reused throughout the world? All those low cost switches and wifi adapters DO use unique mac addresses? http://en.wikipedia.org/wiki/MAC_address The IEEE expects the MAC-48 space to be exhausted no sooner than the year 2100[3]; EUI-64s are not expected to run out in the foreseeable future. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: what about 48 bits?
On Mon, Apr 05, 2010 at 10:57:46AM +0930, Mark Smith wrote: Has anybody considered lobbying the IEEE to do a point to point version of Ethernet to gets rid of addressing fields? Assuming an average 1024 byte packet size, on a 10Gbps link they're wasting 100+ Mbps. 100GE / 1TE starts to make it even more worth doing. If you're lobbying to have the IEEE do something intelligent to Ethernet why don't you start with a freaking standardization of jumbo frames. The lack of a real standard and any type of negotiation protocol for two devices under different administrative control are all but guaranteeing end to end jumbo frame support will never be practical. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: CSIRT - Backbone Security : Runtime Monitoring and DynamicReconfiguration for Intrusion Detection Systems
On Thu, Mar 18, 2010 at 12:18:40AM +, char...@www.knownelement.com wrote: Mods, Can we get the spam off the list? Its getting old. FYI this guy has been spamming individuals and PeeringDB contacts for a couple months now too. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: YouTube AS36561 began announcing 1.0.0.0/8
On Fri, Mar 12, 2010 at 07:34:10AM -0500, Patrick W. Gilmore wrote: Oh, I understand what's going on exactly. YouTube is trying to balance their ratios. :) That might explain why they're only announcing it behind Cogent. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Linux Router distro's with dual stack capability
On Thu, Feb 11, 2010 at 03:46:13PM -0800, Kevin Oberman wrote: Polling is excellent for low speed lines, but for Gig and faster, most newer interfaces support interrupt coalescing. This easily resolves the issue in hardware as interrupts are only issued when needed but limited to a reasonable rate. Polling does not use interrupts, but consumes system resources regardless of traffic. FreeBSD has supported polling for a long time (V6?) and interrupt coalescing since some release of V7. (Latest release is V8.) I'm pretty sure it's been around for a lot longer than that. I seem to recall playing with both back in 4.x. Of course interrupt coalescing is mostly a function of the NIC (though some driver involvement is required to take advantage of it), so the quality of the implementations has varied significantly over the years. The first generation GE NICs which offered it didn't do a particularly good job with it though, so for example it was still possible to cripple a box with high interrupt rates while the same box would be perfectly fine with polling. That said, I think your use case for polling is backwards. As you say, normally the NIC fires off an interrupt every time a packet is received, and the kernel stops what it is doing to process the new packet. On a low speed (or at least low traffic) interface this isn't a problem, but as the packet/sec rate increases the amount of time wasted as interrupt processing overhead becomes significant. For example, even a GE interface is capable of doing 1.488 million packets/sec. By switching to a polling based model, you switch off the interrupt generation completely and simply check the NIC for new packets at a set rate (for example, 1000 times/sec). This gives you a predictable and consistent CPU use, so even if you had 1.488M/s interrupts coming in you would still only be checking 1000 times/sec. 
If you did less than 1000pps it would be a net increase in CPU, but if you do more (or ever risk doing more, such as during a DoS attack) it could be a net benefit. This makes the most sense for people doing a lot of traffic regardless. Of course the downside is higher latency, since you're delaying the processing of the packet by some amount of time after it comes in. In the 1000 times/sec example above, you could be delaying processing of your packet by up to 1ms. For most applications this isn't enough to cause any harm, but it's something to keep in mind. Interrupt coalescing works around the problem of large interrupt rates by simply having the NIC limit the number of interrupts it generates under load, giving you the benefits of low-latency processing and low-interrupt rate under high load. I haven't played with this stuff in many many years, so I'm sure modern interrupt coalescing is much better than it used to be, and the extra work of configuring polling and dealing with the potential latency/jitter implications isn't worth the benefits for most people. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
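The trade-off in the message above comes down to two numbers: how many packets pile up between polls, and how long a packet can sit before it is looked at. A back-of-the-envelope sketch follows; the 84 bytes on the wire per minimum-size Ethernet frame (64-byte frame plus preamble and inter-frame gap) is an assumption added here, and the function names are invented for illustration.

```python
def ge_worst_case_pps(line_rate_bps=1e9, wire_bytes_per_frame=84):
    """Packets/sec on gigabit Ethernet if every frame is minimum size."""
    return line_rate_bps / (wire_bytes_per_frame * 8)


def polling_budget(pps, polls_per_sec):
    """Return (packets waiting per poll, worst-case added latency in seconds)."""
    return pps / polls_per_sec, 1.0 / polls_per_sec


pps = ge_worst_case_pps()
per_poll, max_delay = polling_budget(pps, polls_per_sec=1000)

print(f"worst case: {pps:,.0f} pps")
print(f"per poll:   {per_poll:,.0f} packets, up to {max_delay * 1000:.0f} ms extra latency")
```

Without polling or coalescing, that same worst case would be one interrupt per packet instead of a fixed 1000 checks per second.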
Re: [NANOG] Contacts @ China Unicom and China Telecom
On Wed, Feb 03, 2010 at 11:40:38AM -0800, Justin Ream wrote: Hi All - Does anyone have peering contacts for China Unicom and China Telecom? Finding that the ones for Any2 in peeringdb.com are no good. Will take replies offlist, thanks! Last I checked the China Telecom e-mails listed worked fine, but the China Unicom/China Netcom addresses have all bounced for at least a couple of years now. I've personally tried every possible combination and permutation of every address listed, including the e-mail address that was used to register the PeeringDB account, and none of them work. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Using /126 for IPv6 router links
On Mon, Jan 25, 2010 at 09:12:49AM +, Andy Davidson wrote: There are 4,294,967,296 /64s in my own /32 allocation. If we only ever use 2000::/3 on the internet, I make that 2,305,843,009,213,693,952 /64s. This is enough to fill over seven Lake Eries. The total amount of ipv6 address space is exponentially larger still - I have just looked at 2000::/3 in these maths. THE IPv6 ADDRESS SPACE IS VERY, VERY, VERY BIG. Don't get carried away with all of that "IPv6 is huge" math, it quickly deteriorates when you start digging into it. Auto-configuration reduces it from 340282366920938463463374607431768211456 to 18446744073709551616 (that's 2^64 out of 2^128, or about 5.4e-18 percent of the original 128 bit space). Now as an end user you might get anything ranging from a /56 to a /64, that's only between 1 - 256 IPs, barely a significant increase at all if you were to actually use a /64 for each routed IP rather than as each routed subnet. As a small network you might get a /48, so that even if you gave out /64s to everyone it would be only 16 bits of space for you (the equivalent of getting a class B back in IPv4 land), something like an 8-16 bit improvement over what a similar sized network would have gotten in IPv4. As a bigger ISP you might get a /32, but it's the same thing, only 16 bits of space when you have to give out /48s. All we've really done is buy ourselves an 8 to 16 bit improvement at every level of allocation space (and a lot of prefix bloat for when we start using more than 2000::/3), which is a FAR cry from the "2^128 omg big number, we can give every molecule an IPv6 address" math of the popular imagination. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
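The subnet counts traded in this exchange are all just powers of two of the difference in prefix lengths. A small sketch (the helper name is an illustration, not anything from the thread) makes the "8 to 16 bits at every level" point easier to see:

```python
def subnets(parent_len, child_len):
    """Number of child_len prefixes that fit inside one parent_len prefix."""
    if child_len < parent_len:
        raise ValueError("child prefix must be at least as long as the parent")
    return 2 ** (child_len - parent_len)


if __name__ == "__main__":
    print("/64s in a /32:          ", subnets(32, 64))
    print("/64s in 2000::/3:       ", subnets(3, 64))
    print("/64s in an end-user /56:", subnets(56, 64))
    print("/64s in a site /48:     ", subnets(48, 64))
    print("/48s in an ISP /32:     ", subnets(32, 48))
```

The last three lines are the per-level numbers: 8 bits for a /56 handing out /64s, 16 bits for a /48 handing out /64s, and 16 bits again for a /32 handing out /48s.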
Re: Using /126 for IPv6 router links
On Mon, Jan 25, 2010 at 09:10:11AM -0500, TJ wrote: While I agree with parts of what you are saying - that using the simple 2^128 math can be misleading, let's be clear on a few things: *) 2^61 is still very, very big. That is the number of IPv6 network segments available within 2000::/3. *) An end-user should get something between a /48 and a /56, _maybe_ as low as a /60 ... hopefully never a /64. Really. **) Let's call the /48s enterprise assignments, and the /56s home assignments ... ? **) And your /56 to /64 is NOT 1-256 IPs, it is 1-256 segments. It is if we are to follow the "always use a /64 as a single IP" guidelines. Not that I'm encouraging this, I'm just saying this is what we're told to do with the space. I for one have this little protocol called DHCP that does IP assignments along with a bunch of other things that I need anyways, so I'm more than happy to take a single /64 for my house as a single lan segment (well, never minding the fact that my house has a /48). **) And, using the expected /48-/56, the numbers are really 256-64k subnets. ... Note: All we've really done is buy ourselves an 8 to 16 bit improvement at every level of allocation space *) And you don't think 8-16 bits _AT EVERY LEVEL_ is a big deal?? I'm not saying that 8-16 bits isn't an improvement, but it's a far cry from the bazillions of numbers everyone makes IPv6 out to be. By the time you figure in the overhead of autoconfiguration, restrictive initial deployments, and the "now that the space is much bigger, we should be reallocating bigger blocks" logic at every layer of redistribution, that is what you're left with. So far all we've really done with v6 is created a flashback to the days when every end user could get a /24 just by asking, every enterprise could get a /16 just by asking, and every big network could get a /8 just by asking, just bit shifted a little bit. That's all well and good, but it isn't a bazillion. 
:) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Foundry CLI manual?
On Sat, Jan 23, 2010 at 10:51:57AM -0500, David Hubbard wrote: Anyone have the Foundry/Brocade CLI reference PDF they could send me? Brocade feels you should have a support contract to have a list of commands the hardware you purchased offers and I'm having difficulty with an oc12 pos module. Ironically enough the manuals themselves are accessible without a login, but the list of manuals is not. You fail to mention which product you're interested in, so I'm going to take a stab and hope that it's something current with a pos card like an MLX/XMR. If you're still rocking an old B2P622, I'd say you're in need of far more help than any manual can provide. :) http://www.foundrynet.com/services/documentation/xmr_user/current/NetIron_04100_ConfigGuide.pdf http://www.foundrynet.com/services/documentation/xmr_diag/current/NetIronXMR-MLX_04100_DiagnosticRef.pdf -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Enhancing automation with network growth
On Wed, Jan 20, 2010 at 09:54:50PM -0500, Steve Bertrand wrote: Hi all, I'm reaching the point where adding in a new piece of infrastructure hardware, connecting up a new cable, and/or assigning address space to a client is nearly 50% documentation and 50% technical. One thing that would take a major load off would be if my MRTG system could simply update its config/index files for itself, instead of me having to do it on each and every port change. It is really quite trivial to auto-discover ifindex-ifdescr mappings on every poll cycle then track your interfaces by their names, pretty much every modern poller system can manage this. MRTG is absurdly old, slow, and generally nasty, and should not be used by anyone in this day and age. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
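The "discover the mapping every poll cycle and track by name" idea can be sketched roughly as follows. This is not MRTG or any particular poller: the two dictionaries stand in for whatever an SNMP walk of ifDescr returns on consecutive polls, and the function name and interface names are invented for illustration.

```python
def remap_by_name(previous, current):
    """Compare two {ifIndex: ifDescr} snapshots and report what moved.

    Returns {name: (old_index, new_index)} for every interface whose index
    changed, appeared (old_index is None) or disappeared (new_index is None).
    """
    old_by_name = {name: idx for idx, name in previous.items()}
    new_by_name = {name: idx for idx, name in current.items()}
    changes = {}
    for name in sorted(set(old_by_name) | set(new_by_name)):
        old_idx = old_by_name.get(name)
        new_idx = new_by_name.get(name)
        if old_idx != new_idx:
            changes[name] = (old_idx, new_idx)
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots: a card was inserted and the device renumbered.
    last_poll = {1: "ge-0/0/0", 2: "ge-0/0/1"}
    this_poll = {1: "ge-0/0/0", 3: "ge-0/0/1", 4: "xe-1/0/0"}
    for name, (old, new) in remap_by_name(last_poll, this_poll).items():
        print(f"{name}: ifIndex {old} -> {new}")
```

Counters are then stored under the interface name, so a renumbered ifIndex only changes which OID instance gets polled next cycle rather than requiring a config edit.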
Re: dark fiber and sfp distance limitations
On Fri, Jan 01, 2010 at 02:52:33PM -0800, Mike wrote: I am looking at the possibility of leasing a ~70 mile run of fiber. I don't have access to any mid point section for regeneration purposes, and so I am wondering what the chances that a 120km rated SFP would be able to light the path and provide stable connectivity. There are a lot of unknowns including # of splices, condition of the cable, or the actual dispersion index or other properties (until we actually get closer to leasing it). It's spare telco fibers in the same cable binder they are using for interoffice transport, but there are regen huts along the way so it works for them but may not for us, and 'finding out' is potentially expensive. How would someone experienced go about determining the feasibility of this concept and what options might there be? Replies online or off would be appreciated. That shouldn't be too difficult, especially at only 1G (though personally I can't imagine why you would bother leasing dark fiber for that :P). There are several ways you could do it, including 120km+ rated SFPs (iirc there have been 200km SFPs out for a while too), an external optical amplifier (ideally you'd want to amp in the middle, but with a single channel you should be fine w/pre-amp), and a digital FEC wrapper to extend the receive sensitivity. Remember that the distance spec on optics is mostly a rough guideline, so depending on the fiber conditions and number of splices/panels along the way you could potentially expect to get the entire distance out of a standard 100km optic. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
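One way to approach the feasibility question before spending money is a simple loss budget. The sketch below is only an illustration: every number in it (per-km attenuation, splice and connector loss, transmit power, receive sensitivity) is a placeholder assumption, not a figure from this thread or from any specific optic's datasheet, and it ignores dispersion entirely. Real values would come from the optic's spec sheet and an OTDR trace of the leased span.

```python
KM_PER_MILE = 1.609344


def link_loss_db(length_km, fiber_db_per_km, splices, splice_db,
                 connectors, connector_db):
    """Total expected span loss: fiber attenuation plus splices plus connectors."""
    return (length_km * fiber_db_per_km
            + splices * splice_db
            + connectors * connector_db)


def margin_db(tx_dbm, rx_sensitivity_dbm, loss_db):
    """Power budget of the optic minus the span loss; negative means it won't link."""
    return (tx_dbm - rx_sensitivity_dbm) - loss_db


if __name__ == "__main__":
    length_km = 70 * KM_PER_MILE          # the ~70 mile run from the question
    # Placeholder assumptions, to be replaced with measured/datasheet values:
    for fiber_db_per_km in (0.20, 0.22, 0.25):
        loss = link_loss_db(length_km, fiber_db_per_km,
                            splices=30, splice_db=0.1,
                            connectors=4, connector_db=0.5)
        m = margin_db(tx_dbm=0.0, rx_sensitivity_dbm=-32.0, loss_db=loss)
        print(f"{fiber_db_per_km:.2f} dB/km: loss {loss:.1f} dB, margin {m:+.1f} dB")
```

Running it over a range of attenuation values shows how sensitive the answer is to the condition of the fiber, which is why the distance rating on an optic is only a rough guideline.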
Re: UltraDNS Failure?
On Wed, Dec 23, 2009 at 05:38:21PM -0800, Shrdlu wrote: I'm still seeing the DNS servers at udns down, hard. Amazon's cloud will need a reboot when this is over. Dang, what the heck happened to all that anycast stuff? We have some DNS providing type customers (not UltraDNS) receiving a few million packets/sec of UDP/53 DoS traffic, starting at about the same time as the UltraDNS problems. No clue if it's related, but it certainly sounds suspicious. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: fight club :) richard bennett vs various nanogers, on paid peering
on the other side for the traffic, thus allowing you to double dip for the same bit and potentially make more money. Of course in practice it doesn't work this way at all. The vast majority of the cost of operating a network is transporting the bits from one place to another, and when you sell paid peering you are guaranteed that the traffic is going to stay on your network and be hauled. This makes it some of the most expensive traffic to deliver, and typically results in prices which are higher than those of another network who is hot potatoing those bits off their network in one location, and who is sending the traffic to a settlement-free peer. There is nothing wrong with paid peering, it often has a time and a place (such as when two networks are close to being settlement-free peers, but not quite, and someone needs to sweeten the deal a little bit), but it is not the panacea you think it is. Of course nobody else seems to think the FCC Question 106 is talking about regulating paid peering (which would be absurd), so fortunately I don't think we have anything to worry about. Of course all of these points (and more) were already quite elegantly expressed by fine folks like Vijay Gill, Dan Golding, Patrick Gilmore, Joe Provo, and others. They tried to help correct your misinformation with free advice, and you repaid them with delusional rants. Now you simply look like a fool to everyone. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: fight club :) richard bennett vs various nanogers, on paid peering
On Wed, Nov 25, 2009 at 02:29:33PM -0800, Richard Bennett wrote: (pardon me if this message is not formatted correctly, T-bird doesn't like this list) I agree that this is not the proper venue for discussion of the politics of Internet regulation; the post I wrote for GigaOm has comments enabled, and many people with an anti-capitalist bone to pick have already availed themselves of that forum to advocate for the people's revolution. There are some technical issues that might be of more interest and relevance to operators, however. So now anyone who points out the massive flaws in your statements is part of an anti-capitalist movement? Any more conspiracy theories you'd like to put forward? I can't speak for anyone else, but personally I consider myself very pro-capitalism and it has absolutely no impact on how I feel about the blatantly wrong and baseless crap you are spewing. * One claim I made in my blog post is that traffic increases on the Internet aren't measured by MINTS very well. MINTS uses data from Meet-me switches, but IX's and colos are pulling x-connects like mad so more and more traffic is passing directly through the x-connects and therefore not being captured by MINTS. Rate of traffic increase is important for regulators as it relates to the cost of running an ISP and the need for traffic shaping. Seems to me that MINTS understates traffic growth, and people are dealing with it by lighting more dark fiber, pulling more fiber, and the x-connects are the tip of the iceberg that says this is going on. This is all completely irrelevant to everything else that has been discussed so far, but what the hell I'll bite. Traffic on the Internet is indeed growing rapidly, while the predominant technology for cost effectively interconnecting the vast majority of the bits (10 Gigabit Ethernet) has remained relatively static in recent years. 
Without a cost effective technology for interconnecting devices in increments larger than 10Gbps (40Gbps OC-768 has existed for a while, but is far more expensive than simply doing 4x10GbE), the only reasonable way to scale a network is to build your links out of Nx10G bundles. In places with reasonable crossconnect pricing, it is far cheaper to simply order multiple crossconnects than it is to pay for DWDM gear, and thus you see a rapid increase in fiber crossconnects. * A number of people said I have no basis for the claim that paid peering is on the increase, and it's true that the empirical data is slim due to the secretive nature of peering and transit agreements. This claim is based on hearsay and on the observation that Comcast now has a nationwide network and a very open policy regarding peering and paid peering. So if paid peering is only increasing at Comcast, now a top 10 network, it's increasing overall. So in other words, you're admitting that you have absolutely no basis for your claim, and you're simply making it up based on indirect hearsay modified with your own ill-informed conclusions? First intelligent thing you've said so far. If you actually bothered to ask anyone in the industry with experience dealing with Comcast, they would tell you that while Comcast initially entered the market primarily trying to sell paid peering, they have since switched their efforts to primarily selling full transit. There are only a certain number of networks who even know what to DO with a paid peering product, and a vastly larger number who know what to do with a transit product, so it makes perfect sense really. * Some other people said I'm not entitled to have an opinion; so much for democracy and free speech. You are not entitled to opine an opinion on a subject matter which you do not understand, without being called out for it. 
Sane and rational people understand when they are talking out their ass and are being corrected by knowledgeable experts, and will shut the hell up and listen. Sadly this seems to be a skill you lack. I'd be glad to hear from anyone who has data or informed opinions on these subjects, on-list or off-. The reason you should share is that people in Washington and Brussels listen to me, so it's in everybody's interest for me to be well-informed; I don't really have an ax to grind one way or another, but I do want law and regulation to be based on fact, not speculation and ideology. So far none of the above statements seem to be true. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: fight club :) richard bennett vs various nanogers, on paid peering
On Wed, Nov 25, 2009 at 02:29:33PM -0800, Richard Bennett wrote: * One claim I made in my blog post is that traffic increases on the Internet aren't measured by MINTS very well. MINTS uses data from Meet-me switches, but IX's and colos are pulling x-connects like mad so more and more traffic is passing directly through the x-connects and therefore not being captured by MINTS. Rate of traffic increase is important for regulators as it relates to the cost of running an ISP and the need for traffic shaping. Seems to me that MINTS understates traffic growth, and people are dealing with it by lighting more dark fiber, pulling more fiber, and the x-connects are the tip of the iceberg that says this is going on. Oh also I forgot to mention that trying to map a direct relationship between IX traffic growth and total IP traffic growth is completely bogus. There is a significant modifier you're missing, and it's called price. Two years ago the price for an IX port at the large commercial exchange points in the US (which account for the vast majority of the traffic, no offense to the small non-commercial exchanges out there) was between 4-7x higher than the price for the same ports today. The reason for the price drop had nothing to do with changing economics of providing the service, but rather it was because of a wide-spread price war between the two largest IX operators in the US. Such a massive change in the economics for the IP network operators will obviously result in major changes to the amount of traffic delivered over IX fabrics vs private interconnection. Again, something you could have actually asked operators about rather than making up conclusions in your head. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Juniper M120 Alternatives
On Tue, Nov 17, 2009 at 09:24:24AM -0600, Jack Bates wrote: Richard A Steenbergen wrote: They've definitely been improving it over the years though, so much that I almost never trigger a session reset on me unintentionally any more. They must have. This was new to me and came as a shock. I don't think I've ever seen my m120 behave any different than my cisco when it comes to flapping BGP. Things have just worked as I expected them to. Not that I go screwing with underlying interface configs or changing a peer from one group to another or changing the asn; at least not on a live session. These things would seem to indicate that the session might be subject to reset. Perhaps it just behaves for normal users and not power users. :) But those things won't trigger session resets on Cisco, so it often comes as a shock. Also, one might very well expect that changing the peer-as on a neighbor is going to cause a reset, but if you didn't know from experience you might not expect that renaming a group or changing an underlying interface MTU would do it too. The issue is that there is a fundamental design difference between Cisco and Juniper. Cisco lets you configure anything you want on a line by line basis, but it doesn't immediately apply those changes until you command it to do so. Juniper's philosophy is that you make a bunch of changes to a candidate configuration, commit to apply those changes, and then you can expect those changes to take effect (or at least begin trying to take effect) immediately. Personally I think the Juniper design philosophy is better. Besides the obvious stuff like being able to rollback your config, think about how non-deterministic it is when you update a route-map but forget to soft clear the BGP session. 
The routes that have been exchanged so far will retain their old policy, while any new updates you receive after the route-map change will receive the new policy, leaving the session in an inconsistent state that will slowly and unpredictably change over time as routing updates come in. The trade-off is that you lose the ability to do non-impacting changes, where you make a change but know that it hasn't actually taken effect yet, and won't until the next time the session bounces. What Juniper is trying to do really is a good thing, I just wish it could tell me before I commit what is and isn't going to flap. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
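The route-map scenario described above can be shown with a deliberately simplified toy model. Nothing here is a router API: the class, the policy functions and the local-preference values are invented purely to show how a table ends up holding a mix of old-policy and new-policy routes until something re-evaluates it.

```python
class ToySession:
    """Toy BGP session: inbound policy is evaluated only when a route arrives."""

    def __init__(self, policy):
        self.policy = policy   # callable: prefix -> local-pref
        self.rib = {}          # prefix -> local-pref assigned at receive time

    def receive(self, prefix):
        self.rib[prefix] = self.policy(prefix)

    def soft_clear(self):
        """Re-run the current policy over everything already learned."""
        for prefix in self.rib:
            self.rib[prefix] = self.policy(prefix)


def old_policy(prefix):
    return 100


def new_policy(prefix):
    return 200


if __name__ == "__main__":
    s = ToySession(old_policy)
    for p in ("192.0.2.0/24", "198.51.100.0/24"):
        s.receive(p)
    s.policy = new_policy          # route-map edited, no soft clear
    s.receive("198.51.100.0/24")   # a routine update for one existing prefix
    s.receive("203.0.113.0/24")    # a brand new prefix
    print("before soft clear:", s.rib)   # mix of 100 and 200
    s.soft_clear()
    print("after soft clear: ", s.rib)   # consistently 200
```

Which prefixes carry the old value at any given moment depends only on which updates happened to arrive, which is the inconsistency the post is complaining about.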
Re: Juniper M120 Alternatives
On Tue, Nov 17, 2009 at 01:28:06AM +0100, Daniel Roesen wrote: PS: and of course JUNOS still undeterministically resetting unrelated BGP sessions for no good reason when modifying BGP configuration - so one is well-advised to do ANY configuration changes in the area of BGP within a maint window as it might happen that you configure a peering session and whoops there goes your IBGP mesh... or all your other peerings, or, ... Well to be fair, the session resetting on config change behavior is actually quite deterministic (being EASY to determine is not part of the requirements, technically speaking :P), and most of the resets really do have perfectly good reasons. I'll certainly go with really annoying and often a giant pain in the @#$%^* though. They've definitely been improving it over the years though, so much that I almost never trigger a session reset on me unintentionally any more. The things to watch out for are: a) any time you change the update replication by moving a neighbor between groups, renaming groups, or significantly changing the export policy chain. b) any time you change a major part of the underlying interface configuration for an eBGP session, such as mtu or vlan tagging config. c) any time you change something about the bgp session which really does require a session reset to take effect, such as a new ASN, new endpoint address, new mbgp family configuration, new md5 password, new tcp mss, etc. You can actually safeguard yourself from a lot of the accidental reset behaviors while implementing other features at the same time by using commit scripts (i.e. as a side-effect of my scripts which exist for other reasons, I automatically protect myself against changes to the policy chain or family configuration which might cause unintended session flaps), though I'll certainly admit this is well into the category of power user and not appropriate for most people. 
They are making some progress though, you can actually turn NSR on and off now without flapping your sessions, which is certainly an improvement over the serious logic flaws in earlier versions (where you couldn't turn off NSR without flapping every session, but you also couldn't commit w/NSR enabled and the backup RE offline, effectively locking you out of config changes without a total box flap if you didn't have both RE's running). It would certainly be a lot more user friendly if they could tell you what sessions would be reset as part of a commit check process though. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
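The wished-for "tell me what will flap before I commit" check could look something like the sketch below. To be clear about the assumptions: this is not a JUNOS commit script and does not use any Juniper API; the dictionary layout and key names are hypothetical, and the list of reset-triggering attributes is just the a/b/c categories from the post written down as data.

```python
# Attributes whose change is expected to bounce the session, per the post:
#  a) update replication: group membership/name, export policy chain
#  b) underlying interface config for eBGP: mtu, vlan tagging
#  c) things that inherently need a reset: ASN, endpoint, families, md5, tcp mss
RESET_KEYS = {
    "group", "export",
    "mtu", "vlan_tagging",
    "peer_as", "local_address", "families", "md5_key", "tcp_mss",
}


def sessions_at_risk(old, new):
    """Compare two {neighbor: {attribute: value}} snapshots.

    Returns {neighbor: [changed reset-triggering attributes]} for neighbors
    present in both the running and the candidate configuration.
    """
    risky = {}
    for neighbor in sorted(old.keys() & new.keys()):
        changed = sorted(
            key for key in RESET_KEYS
            if old[neighbor].get(key) != new[neighbor].get(key)
        )
        if changed:
            risky[neighbor] = changed
    return risky


if __name__ == "__main__":
    running = {
        "192.0.2.1": {"group": "PEERS", "peer_as": 64500, "mtu": 1500,
                      "description": "peer A"},
        "192.0.2.2": {"group": "PEERS", "peer_as": 64501, "mtu": 1500,
                      "description": "peer B"},
    }
    candidate = {
        "192.0.2.1": {"group": "PEERS-V2", "peer_as": 64500, "mtu": 9000,
                      "description": "peer A"},
        "192.0.2.2": {"group": "PEERS", "peer_as": 64501, "mtu": 1500,
                      "description": "peer B (renamed)"},
    }
    for neighbor, keys in sessions_at_risk(running, candidate).items():
        print(f"{neighbor} would likely reset: {', '.join(keys)}")
```

A description-only change, like the second neighbor in the example, is not flagged.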
Re: Resilience - How many BGP providers
On Wed, Nov 11, 2009 at 11:18:20AM -0800, Steve Gibbard wrote: If you have three components, the chances of all three being broken at once are even less than the chances of two of them being broken at once. With four, you're even safer, and so on and so forth. But once you get beyond two, you hit a point of diminishing returns pretty quickly. Not only that, but you have to ask yourself what are the chances that all these extra components will become extra points of failure and actually increase the likelihood of something going wrong. I know a lot of folks who have gotten themselves into a lot of trouble buying transit from everyone they can possibly buy from, thinking it will make their network more reliable. In most cases all it does is make their network more unstable. The more transit paths you have out there, the more likely you are to have something flap and wipe you out w/flap dampening, and the more likely you are to see any single event cause a massive amount of churn. I've seen people with 8 transit providers appear to others on the internet as though they flapped 100+ times over a single session flap, because of all the churn as the network reconverges. More transit providers also means more 95th calculations, and thus a higher bill, but that is another story for another day. :) -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
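The diminishing-returns argument in the quoted text is easy to see with numbers. The sketch below assumes each provider is down independently with the same probability, and the 1% figure is an arbitrary placeholder; the reply's whole point is that in practice the extra components are not independent and add their own failure modes, so treat this as the optimistic bound only.

```python
def p_all_down(p_single, n):
    """Probability that all n providers are down at once, assuming independence."""
    return p_single ** n


if __name__ == "__main__":
    p = 0.01   # placeholder: each provider unavailable 1% of the time
    previous = None
    for n in range(1, 6):
        current = p_all_down(p, n)
        gained = "" if previous is None else f"  (absolute gain {previous - current:.2e})"
        print(f"{n} provider(s): {current:.2e}{gained}")
        previous = current
```

Going from one provider to two removes almost all of the outage probability; each provider after that buys a vanishingly small absolute improvement while adding another source of churn.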
Re: Upstream BGP community support
On Fri, Nov 06, 2009 at 12:04:18AM +0100, Daniel Roesen wrote: On Mon, Nov 02, 2009 at 02:13:38PM -0600, Richard A Steenbergen wrote: Rather than simply double the size and break it up into 32:32, the designers reserved the top 16 bits for type and subtype attributes, leaving you only 48 bits to work with. Clearly the only suitable mapping for support of 32-bit ASNs on the Internet is 32:16, leaving us with exactly the same range of data values that we have today. ... which breaks schemes such as 65123:45678 where 45678 is the neighboring AS to apply the action defined by 65123 to. Seen those multiple times. Of course using anything else than your own ASN in the AS part of TE communities is certainly debatable. Definitely a problem. The point of using 65123:45678 in the first place (with a private ASN field in the AS part) is to avoid stepping on anyone else's ASN with your internal use community. Clearly we won't be able to continue implementing this pattern AND fully support 32 bit ASNs, so the type field is going to have to come to the rescue here. Fortunately there is a transitive bit on the extended community type that could be used to signal a behavior to your upstream network without allowing that community to leak any further, so in theory one could use something like that to do a localtarget:45678:actiondata tag without polluting the namespace. You would lose the ability to send a community to your upstream's upstream, but that is probably of questionable legitimacy anyhow. But the way I read RFC5668 and the IANA extended community registry it doesn't look like there is an explicit definition of a non-transitive target type, and the way I read RFC4360 it doesn't look like the non-transitive value gets automatically reserved. So I guess someone will need to request 0x4002 and 0x4202 non-transitive target types for this purpose. Unless someone has a better idea of how to handle the problem stated above? 
-- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
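For readers who want to see where 0x4002 and 0x4202 come from, here is a small decoding sketch. It relies on the RFC 4360 layout of the two type octets: in the high-order octet the 0x40 bit marks a community as non-transitive, the low six bits give the structure (0x00 two-octet AS specific, 0x02 four-octet AS specific), and the low-order octet is the sub-type (0x02 is route target). The function name and the returned dictionary are illustrative only.

```python
NON_TRANSITIVE_BIT = 0x40   # "T" bit in the high-order type octet
FOUR_OCTET_AS = 0x02        # structure code for four-octet AS specific
ROUTE_TARGET = 0x02         # sub-type for route target


def describe(type_value):
    """Split a 16-bit extended community type/sub-type value into its parts."""
    high = (type_value >> 8) & 0xFF
    subtype = type_value & 0xFF
    return {
        "high": high,
        "subtype": subtype,
        "transitive": not (high & NON_TRANSITIVE_BIT),
        "four_octet_as": (high & 0x3F) == FOUR_OCTET_AS,
    }


if __name__ == "__main__":
    for value in (0x0002, 0x0202, 0x4002, 0x4202):
        d = describe(value)
        kind = "4-octet AS" if d["four_octet_as"] else "2-octet AS"
        scope = "transitive" if d["transitive"] else "non-transitive"
        target = "target" if d["subtype"] == ROUTE_TARGET else "other"
        print(f"0x{value:04x}: {kind}, {scope}, {target}")
```

The two values proposed in the post are simply the existing transitive target types with the 0x40 bit set.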
Re: Upstream BGP community support
On Mon, Nov 02, 2009 at 05:19:32AM -0500, Randy Bush wrote: i try to use as few tricks, knobs, and clever things as possible and still get my job done. i try to be extremely conscious of, and minimal, when what i am doing affects or is visible to my neighbors and/or the global net. i try to complicate the internals of my network as little as possible, after all, complexity == opex and i value my time, it is a non-renewable resource. i prefer to be seen as an old and lazy minimalist, not a clever person. clever was a pejorative where/when i was brought up. Translation: randybush You damn kids! Get off my lawn! But seriously now, the reason we have these squishy things taking up space between our ears in the first place is so we can come up with new ideas and better ways to solve our problems. Obviously you can take it too far, I'm sure we've all seen examples of absurdly over-complicated designs which existed only for the edification of someone's ego, but I have never seen an intelligent and well thought out BGP community system do anything other than make everyone's life easier. I don't know who these people are who you claim are busy breaking things with communities, but I've never seen them. Being clever is a good thing when it helps you find new ways to do more with less, and those who can't innovate in this industry are dooming themselves to obsolescence. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: Upstream BGP community support
On Mon, Nov 02, 2009 at 01:38:00PM -0600, Jack Bates wrote: Communities (except the standardized well known ones) are extremely diverse. For those that support even more granular traffic engineering by limiting which of their peers your routes might be transiting, I believe there are 2 distinct methods of using communities. Even the standardized ones aren't guaranteed to be useful. For example RFC3765 defines a NOPEER community, i.e. a standardized way to say do not export this route to peers (in the settlement free bilateral sense of the word). But there is no possible way for the router to know what is or isn't a peer, so it's up to the operator to implement the matching for this supposedly standard community. But guess what, most people don't, and those that do implement the functionality end up writing their own network specific version anyways (either because they want to keep it secret, or because they need to implement a far more powerful version anyways). If we want to turn this into a discussion on useful things to do with communities (to try and recover some value from this otherwise brain rotting thread), how about this one. We as network operators are currently facing a problem with BGP communities, and that problem is called 32 bit ASNs. Quite simply, 32 bit ASNs don't fit into the existing 16:16 scheme. There are new 64 bit communities (extended communities) out there, but they aren't a 1:1 mapping of the way communities work today. Rather than simply double the size and break it up into 32:32, the designers reserved the top 16 bits for type and subtype attributes, leaving you only 48 bits to work with. Clearly the only suitable mapping for support of 32-bit ASNs on the Internet is 32:16, leaving us with exactly the same range of data values that we have today. So why do I bring this up? Because of those top 16 bits for type and subtype. 
Two of the type fields that are newly introduced in extended communities are target and origin, which specifically mean this tag is trying to tell $ASN something, vs this $ASN is trying to tell you something. This actually has the benefit of addressing one of the most common problems with communities today, namespace collision between folks trying to both send instructions and receive data within the same ASN:x tag. Since we're all going to need to start updating our routing policies to support 32 bit ASNs soon anyways (unless you want to tell people getting them that they aren't allowed to use communities :P), now is a good time to start thinking about taking advantage of these new features to resolve age-old problems in your new community design. Another feature I think would be beneficial for router vendors to consider implementing is a way to map between regular and extended communities. For example, I might be able to apply a policy at the edge of my network which imports regular communities from my neighbor, and turns them into origin: tags of extended communities. I might then be able to update my internal network to work on extended communities, and translate them back again to regular for backwards compatibility at the edge. Also, now is a good time to find out if your router vendor ACTUALLY supports extended communities in all of their features (for example, regexp support), or if they only exist for l3vpn support and are not actually prepared to use them to work with 32-bit ASNs. Hint: Some vendors still fall into this category last I looked. Apologies if this post contained too much clever and made Randy's head explode. -- Richard A Steenbergen r...@e-gerbil.net http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
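The size problem and the regular-to-extended mapping idea discussed in this post can be sketched with Python's struct module. The wire layouts follow RFC 1997 (16:16 in four bytes) and RFC 4360 / RFC 5668 (type, sub-type, then six bytes split 2+4 or 4+2); the function names, and the choice of the route-origin sub-type (0x03) for the translated tag, are illustrative assumptions rather than any vendor's feature.

```python
import struct

SUBTYPE_TARGET = 0x02
SUBTYPE_ORIGIN = 0x03


def pack_standard(asn, value):
    """Regular community, 16:16. Raises struct.error if asn needs more than 16 bits."""
    return struct.pack("!HH", asn, value)


def pack_ext_as4(subtype, asn, value, transitive=True):
    """Four-octet AS specific extended community, 32:16 behind type and sub-type."""
    type_high = 0x02 | (0x00 if transitive else 0x40)
    return struct.pack("!BBIH", type_high, subtype, asn, value)


def standard_to_origin(asn, value):
    """Translate a regular ASN:value tag into a two-octet AS origin extended community."""
    return struct.pack("!BBHI", 0x00, SUBTYPE_ORIGIN, asn, value)


def origin_to_standard(blob):
    """Reverse translation for the edge; only valid if the value still fits in 16 bits."""
    _type_high, _subtype, asn, value = struct.unpack("!BBHI", blob)
    return pack_standard(asn, value)


if __name__ == "__main__":
    print(pack_standard(65123, 45678).hex())
    print(pack_ext_as4(SUBTYPE_TARGET, 4200000000, 45678).hex())
    try:
        pack_standard(4200000000, 45678)
    except struct.error as exc:
        print("32-bit ASN does not fit 16:16:", exc)
    ext = standard_to_origin(65123, 45678)
    print(ext.hex(), "->", origin_to_standard(ext).hex())
```

The 4-byte ASN leaves only two bytes of data in the extended form, which is the "exactly the same range of data values" observation from the post.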