from:"Richard A Steenbergen"

Re: Evaluating Tier 1 Internet providers

2013-08-29 Thread Richard A Steenbergen

On Wed, Aug 28, 2013 at 09:54:28AM -0700, Michael Smith wrote:
>
> It's really can reach versus how well can they reach.  I can't any
> provider that would have less than a full view of the DFZ but, if your
> primary traffic is to Provider X, and one of your Tier 1's peers
> locally and the other peers in France, then you would look more
> closely at the closer one.  Unless, of course, that local peer was
> saturated 99% of the time.  Then France might be attractive.

One thing to keep in mind is that for major Tier 1s, it's not at all 
uncommon to see some very large percentages of traffic (like say well 
north of 50%) stay completely on-net, going from customer to customer. 
In this type of model, capacity to other third-party peers (typically 
the other Tier 1s) becomes secondary to other considerations like 
backbone capacity, which is why those huge Tier 1 networks often have 
much less peering capacity than you might otherwise expect.

Tier 2s, on the other hand, typically spend the vast majority of their 
time/money/effort figuring out how they can deliver traffic to other 
networks via peering and transit relationships. This usually means they 
have much smaller amounts of backbone capacity, but relative to their 
total size they often have a lot more capacity to the other major 
peering/transit networks.

The economics of each model are vastly different too. Tier 2s are 
typically looking to take advantage of tricks like hot potato 
routing and 95th percentile billing to get free inbound traffic and 
minimize their backhaul costs. All too often people get caught in the 
mental trap of thinking "peering == free", but in reality the Tier 1s 
are just shifting the majority of their operational costs into backbone 
instead, and peering becomes the way to handle the leftovers. Each 
model has its advantages and weaknesses, but most people who haven't 
lived in both worlds tend to vastly underestimate the realities of the 
other side's cost models.
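To make the "free inbound" point concrete, here is a minimal sketch of 95th percentile billing. It assumes one common convention (nearest-rank over 5-minute samples, billed on the greater of inbound and outbound); real contracts vary, and the function names and sample values are illustrative only.

```python
def percentile_95(samples):
    """Return the 95th-percentile sample using the nearest-rank method.

    Sorting the month's samples and taking rank ceil(0.95 * n) means the
    top 5% of samples (the bursts) are discarded before billing.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # ceil(95 * n / 100) in integer arithmetic, avoiding float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def billable_rate(inbound, outbound):
    """Bill on whichever direction has the higher 95th percentile."""
    return max(percentile_95(inbound), percentile_95(outbound))


if __name__ == "__main__":
    # One sample per 5-minute interval over a 30-day month (hypothetical Mbps).
    outbound = [900] * 8640
    inbound = [600] * 8640
    print(billable_rate(inbound, outbound))  # 900: the inbound adds nothing
```

Because the bill follows whichever direction is larger, an outbound-heavy network can take inbound traffic up to its outbound 95th percentile at no extra cost, which is the incentive described above.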

There is a lot to be said for the value of a Tier 2 network. Sometimes 
throwing a token amount of money at a problem solves it much more 
effectively than waiting for two squabbling Tier 1s to fight over the 
principle of not paying anything or risking being perceived as weak. 
And a Tier 2 with multiple transit paths and extensive peering options 
may be able to easily reroute traffic around a particular problem spot 
in a way that a Tier 1 just doesn't have the ability to do. Then again, 
sometimes there is value in just buying transit from someone who 
operates a massive network, with the economy of scale necessary to 
implement terabits of backbone capacity for cheap, and a huge customer 
base.

As for the "which one should I buy?" question, a smart person would 
realize the different strengths and weaknesses of each model, and 
probably end up buying from (at least) one of each to take advantage of 
this. Of course in reality 99% of people fail to understand any of this, 
and turn off their brains after thinking things like 1  2 so it must 
be better. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Evaluating Tier 1 Internet providers

2013-08-29 Thread Richard A Steenbergen

On Thu, Aug 29, 2013 at 08:25:41PM -0700, Luke S. Crawford wrote:
>
> I have no idea how to solve this sort of problem automatically.
> Ideally, if someone has a congested or down link, I'd prefer that they
> not announce routes to that part of the internet, as I do have a
> backup, but that isn't how it works.

BGP best path routing decisions are made by completely irrelevant 
criteria like AS-PATH lengths and lower router-IDs, and are completely 
blind to things that actually matter like latency, capacity, packet loss, 
etc. Fundamentally it's impossible to fix automatically with the current 
routing protocols, and even protocol extensions like BGP AIGP 
(which could at least help convey some of the data, like the "oh crap, I 
just got rerouted to a different exit with much higher latency" 
situation you mentioned) are still a long way from being practically 
usable. At best you can aim your default/tie breaks towards networks you 
have more faith in, but that doesn't mean much in practice. :)
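As a rough illustration of that blindness, the sketch below ranks paths with a simplified subset of the BGP decision process. It is an assumption-laden toy, not any vendor's implementation: MED is compared across all neighbors, and steps such as weight, eBGP-over-iBGP, IGP cost, and path age are omitted. The latency_ms field exists only to show that nothing in the comparison looks at it.

```python
from dataclasses import dataclass

# Lower rank is preferred: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Path:
    neighbor: str
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    router_id: str      # BGP identifier of the advertising router
    latency_ms: float   # measured out-of-band; BGP never sees this


def router_id_key(router_id):
    """Compare dotted-quad router IDs numerically, not as strings."""
    return tuple(int(octet) for octet in router_id.split("."))


def best_path(paths):
    """Pick a best path from a simplified subset of the decision steps."""
    return min(
        paths,
        key=lambda p: (
            -p.local_pref,               # higher local-pref wins
            len(p.as_path),              # shorter AS-PATH wins
            ORIGIN_RANK[p.origin],       # IGP < EGP < INCOMPLETE
            p.med,                       # lower MED wins
            router_id_key(p.router_id),  # lowest router-ID as final tie-break
        ),
    )


if __name__ == "__main__":
    candidates = [
        Path("transit-a", 100, (64500,), "IGP", 0, "192.0.2.1", 180.0),
        Path("transit-b", 100, (64501, 64500), "IGP", 0, "192.0.2.2", 20.0),
    ]
    print(best_path(candidates).neighbor)  # transit-a, despite 180 ms
```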

Re: nLayer IP transit

2013-08-02 Thread Richard A Steenbergen

On Fri, Aug 02, 2013 at 07:11:34AM +1000, Mark Tees wrote:
> Thanks for the replies.
>
> I think I saw somewhere around the Cloudflare outage post someone
> mentioning that since the person at Juniper that was responsible for
> Flowspec left it all went down hill.
>
> I take it then Flowspec is still used internally then? I am still
> wondering if its best to avoid Flowspec and roll your own firewall
> rules applied via Netconf for transit interfaces to achieve the same
> sort of functionality.

It's a lot less likely to go south if you control the routes that go 
into the system. That said, it still breaks some things just by having 
it enabled (like NSR, though I suppose one could argue that NSR breaks 
itself :P), so you might be better served with a netconf distribution of 
rules if you want to avoid those potential issues.

Re: nLayer IP transit

2013-08-01 Thread Richard A Steenbergen

On Thu, Aug 01, 2013 at 10:00:49AM +1000, Mark Tees wrote:
> Howdy listers,
>
> I remember reading a while back that customers of nLayer IP transit
> services could send in Flowspec rules to nLayer. Anyone know if that
> is true/current?

We were forced to stop offering flowspec connections to customers after 
we started experiencing a rash of issues with it. Among other things, we 
found ways for flowspec-generated rules to easily cause non-line-rate 
performance on Juniper MX boxes, and we had a couple of incidents where 
customer-generated routes were able to cause cascading failure behaviors 
like crashing the firewall compiler processes across the entire network.

I previously mentioned some of this here:

http://mailman.nanog.org/pipermail/nanog/2011-January/030051.html

There have also been a few other high profile outages caused by bugs in 
the Juniper implementation, for example:

https://support.cloudflare.com/entries/23294588-CloudFlare-Post-Mortem-from-Outage-on-March-3-2013

As a concept I still very much like Flowspec, and wish we could continue 
to offer it to customers, but as with any new routing protocol there 
are significant risks of network-wide impact if the implementation is 
not stable.

IMHO Juniper has done a horrible job of maintaining support for Flowspec 
in recent years, and has effectively abandoned doing the proper testing 
and support necessary to run it in production. Until that changes, or 
until some other major router vendors pick it up and do better with it, 
I don't expect to see any major changes in this position any time soon.

Re: GTT/Inteliquent/nLayer

2013-07-31 Thread Richard A Steenbergen

On Wed, Jul 31, 2013 at 09:28:50AM -0400, Tim Durack wrote:
> Any experience/comments on the GTT Global eXpress service? Looks
> interesting but odd. Why would I use a virtual IXP? Who participates?
>
> Comments on-list or off-list are fine.

This was an old PacketExchange service, essentially just a single large 
VPLS-based global layer 2 virtual IXP service, which combined long-haul 
transport and multi-party interconnection. It's somewhat interesting as 
a concept (since I'm not aware of anyone else offering anything 
similar), but IMHO not the most practical thing in the world, which is 
why it hasn't really been promoted as a new product in many years. If 
you've heard differently, please contact me off-list. :)

Re: Issues with level3?

2013-01-15 Thread Richard A Steenbergen

On Tue, Jan 15, 2013 at 04:12:12PM +, Network Operations wrote:
> Anyone seeing any issues with level3?  We can connect to every other
> IP in our Class C.  When tracerouting to individual IP's,
> (x.x.x.50/51/52/53) we get a drop at
> ge-4-16.car2.Washington1.Level3.net [4.59.146.53] for 50, but 51 is
> fine, drop for 52, 53 is fine.

Sounds like a classic problem with a member of a bundle (like a link-agg 
or ECMP) breaking. An every-other-address pattern like that suggests a 
2-way bundle, but Level3 tends not to do anything in bundles of 2, so 
you might want to look elsewhere, like with your own connections to 
them, possibly on the reverse path. Now, please go find a blunt object 
and hit yourself in the head as punishment for using the words "Class C" 
in 2013 in a non-historic or ironic context. Hard. :)
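A toy model of why one broken bundle member can produce an every-other-address failure pattern. The hash here (destination address modulo bundle size) is a deliberate simplification and an assumption; real LAG/ECMP hashes mix more header fields with vendor-specific functions. The addresses use the 192.0.2.0/24 documentation prefix in place of the elided x.x.x.

```python
import ipaddress


def pick_member(dst, members):
    """Toy hash: choose a bundle member from the destination address alone.

    Real routers hash several header fields with vendor-specific
    functions; destination-modulo-size is only an illustration.
    """
    return members[int(ipaddress.ip_address(dst)) % len(members)]


def reachability(destinations, members, broken):
    """Map each destination to True if its flow avoids the broken member."""
    return {dst: pick_member(dst, members) != broken for dst in destinations}


if __name__ == "__main__":
    dsts = [f"192.0.2.{n}" for n in range(50, 54)]
    print(reachability(dsts, ["link-a", "link-b"], broken="link-a"))
    # {'192.0.2.50': False, '192.0.2.51': True, '192.0.2.52': False, '192.0.2.53': True}
```

In this toy model, four members with one broken link take out only one address in four, which is why a strict alternating pattern hints at a 2-way split somewhere along the path.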

Re: [j-nsp] Krt queue issues

2013-01-08 Thread Richard A Steenbergen

On Tue, Jan 08, 2013 at 03:45:10PM +0100, Tim Vollebregt wrote:
> Hi,
>
> What we do nowadays as some workaround, is configuring a default route
> towards a core router on 8 x 10G before maintaining an MX box. Which
> will be installed before BGP sessions come up, this will cause some
> packet loss during burst hour outages but is fine during maintenance
> hours.
>
> I've seen cases where it took up to 30 minutes before the full table
> was installed correctly in the PFE's.
>
> Currently this issue/bug is holding back our Juniper deployments. As
> far as I know Juniper created a project group for this bug, and so far
> they were able to reproduce the issue. Looks like the issue is being
> taken serious from now.

PR 836197

I actually have very good luck reproducing it:

http://cluepon.net/ras/rpdstall.png

The issue appears to be that when rpd is busy processing incoming BGP 
updates (such as when you turn up a large number of peers 
simultaneously), it starves the rest of the process of the CPU time it 
needs to actually handle/install the routes. The graph above 
shows a plot of the total BGP paths, the number of routes in the 
pending state, and the number of routes actually installed into the 
forwarding hardware. This is a very simplified example (nothing but IBGP 
sessions with very simple policies here, not even any EBGP neighbors), 
using the latest top-of-the-line routing engine, so in real life the 
issue is much worse.

As you can see, while rpd is still busy receiving and processing the 
incoming updates, the number of pending routes rises and doesn't fall, 
and the number of routes installed in the PFE stays almost nonexistent. 
A few routes actually manage to squeak in before all of the BGP sessions 
come up, which is why there are any at all for the period between 0 and 
330 seconds. After the router finishes receiving the BGP paths, the pending 
routes clear very quickly, and then the FIB installation process begins. 
Eight minutes after turning up the BGP sessions, this router finally has a 
full table installed in hardware. The pending routes actually clear much 
quicker than this once the BGP routes stop coming in; I need to update 
this graph with a higher resolution to show it. :)

Juniper actually DOES have a fix for this issue, tweaking the scheduler 
in rpd so that the router still processes BGP routes even when it's 
spending a lot of time receiving new routes. Unfortunately they haven't 
yet decided to prioritize implementing this fix, so it's still stuck in 
development. If this issue drives you as insane as it does me, I highly 
encourage you to talk to your account team about PR 836197 and why 8-20+ 
minutes to install routes to the FIB is not acceptable to you.

Re: [j-nsp] Krt queue issues

2013-01-08 Thread Richard A Steenbergen

On Tue, Jan 08, 2013 at 11:10:16PM +0100, bas wrote:
> Hi,
>
> On Tue, Jan 8, 2013 at 10:20 PM, Richard A Steenbergen r...@e-gerbil.net
> wrote:
>> PR 836197
>
> That looks like a spanking new PR number to me.
> The highest PR number I found in 12.2 release notes was 82.
> Rather strange that they didn't have an earlier PR number, while the
> issue has existed for such a long time.

Oh, I have a pile of PRs about a mile long, including some that I opened 
on this issue 5+ years ago. But I'm not going to harp on the complete 
absurdity of how long it has taken to finally figure this thing out, or 
the number of people who have seen this issue while they've claimed all 
along that nobody else sees it. I'm just going to focus on fixing it. 
This is the PR that they've chosen for implementing the actual fix, so 
that's what I'm going with for the sake of simplicity. :)

> I can't read PR836197 online as it is not public.
> Can you post it without liability?
> If you would be liable do not post it.. Also do _not_ email me off
> list with the PR description...

Neither can I, but the basic description of the issue is what I said 
before. :)

Re: Semi-automated L3 interface DNS records

2012-10-18 Thread Richard A Steenbergen

On Thu, Oct 18, 2012 at 12:57:16PM -0700, Pedersen, Sean wrote:
> Does anyone out there have any experience with a script, tool or appliance
> that would help manage the creation and maintenance of DNS records for
> Layer 3 interfaces on routers and switches?

http://cluepon.net/ras/generate_dnsptr_generic_php

A relatively simple example using PHP, with the net-snmp module and Net_IPv4 
from PEAR. For extra bonus points, it parses your BGP state and uses any 
neighbor ASNs it finds for the remote side of your /30s or /31s, and it 
resolves point-to-point SVIs to physical ports by checking against the VLAN 
tables. The latter part was only tested on Cisco 6500s, and I haven't touched 
that code (or those boxes) in many, many years, so no guarantees about using 
it on anything else. :)

Out of date DNS PTRs in traceroute make baby jesus cry, so please use 
copiously.
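The script linked above is PHP and collects its data over SNMP; what follows is not a port of it, just a minimal Python sketch of the same idea, assuming the interface addresses and BGP neighbor ASNs have already been collected. The naming scheme (asNNN.interface.router.domain) and all function names are hypothetical.

```python
import ipaddress
import re


def sanitize(ifname):
    """Turn an interface name like 'xe-0/0/0.100' into a DNS label."""
    return re.sub(r"[^a-z0-9-]", "-", ifname.lower()).strip("-")


def p2p_peer(iface):
    """Return the far-end address of an IPv4 /30 or /31, else None."""
    net = iface.network
    if net.prefixlen == 31:
        return next(addr for addr in net if addr != iface.ip)
    if net.prefixlen == 30:
        hosts = list(net.hosts())
        if iface.ip in hosts:
            return next(addr for addr in hosts if addr != iface.ip)
    return None


def ptr_records(router, interfaces, bgp_neighbors, domain):
    """Build (owner, target) PTR pairs for one router.

    interfaces:    {interface name: "address/prefixlen"}  (already collected)
    bgp_neighbors: {neighbor address: ASN}                 (already collected)
    """
    records = []
    for ifname, addr in interfaces.items():
        iface = ipaddress.ip_interface(addr)
        label = sanitize(ifname)
        records.append((iface.ip.reverse_pointer, f"{label}.{router}.{domain}."))
        # Name the far end of a point-to-point link after its BGP neighbor ASN.
        peer = p2p_peer(iface)
        if peer is not None and str(peer) in bgp_neighbors:
            asn = bgp_neighbors[str(peer)]
            records.append(
                (peer.reverse_pointer, f"as{asn}.{label}.{router}.{domain}.")
            )
    return records


if __name__ == "__main__":
    recs = ptr_records(
        "cr1.nyc",
        {"xe-0/0/0.100": "192.0.2.0/31", "lo0.0": "198.51.100.1/32"},
        {"192.0.2.1": 64500},
        "example.net",
    )
    for owner, target in recs:
        print(f"{owner}. IN PTR {target}")
```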

Re: Real world sflow vs netflow?

2012-09-24 Thread Richard A Steenbergen

On Mon, Sep 24, 2012 at 11:52:28AM -0700, Peter Phaal wrote:
> On Mon, Sep 24, 2012 at 11:19 AM, Joe Loiacono jloia...@csc.com wrote:
>> OK, Well I guess I was thinking sFlow was primarily a switch oriented
>> technology versus on a layer-3 peering router.
>
> The sFlow technology is a good fit for any device that performs a
> packet forwarding function (including routers) and the sFlow.org web
> site maintains a list of switches and routers that implement the
> technology,

Minus a whole pile of babble from people who don't actually know what a 
router vs. a layer 3 switch is... The difference at this point is mostly that 
NetFlow has provisions to allow exporting all data about an ENTIRE flow, 
whereas sFlow is designed to only take statistical samples for overall 
traffic analysis. Tracking an entire flow is much harder because it requires 
keeping state on the router, so if you only care about overall traffic 
analysis, sampling is just fine.
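A minimal sketch of the difference in state, under simplifying assumptions: sampling is modeled as an independent 1-in-N coin flip per packet (real sFlow agents use their own sampling mechanics), and the flow cache is a plain dictionary keyed by an arbitrary flow key. Neither function reflects an actual exporter's API.

```python
import random
from collections import defaultdict


def sampled_byte_estimate(packet_sizes, sampling_rate, seed=None):
    """sFlow-style estimate: sample each packet with probability 1/N,
    then scale the sampled bytes back up by N. No per-flow state is kept."""
    rng = random.Random(seed)
    sampled = [size for size in packet_sizes if rng.random() < 1.0 / sampling_rate]
    return sum(sampled) * sampling_rate


def flow_cache(packets):
    """NetFlow-style accounting: one table entry per flow key, so the
    router's state grows with the number of concurrent flows."""
    table = defaultdict(lambda: [0, 0])  # flow key -> [packets, bytes]
    for key, size in packets:
        table[key][0] += 1
        table[key][1] += size
    return {key: tuple(counts) for key, counts in table.items()}


if __name__ == "__main__":
    sizes = [1000] * 200_000
    print(sampled_byte_estimate(sizes, 100, seed=1))  # close to 200,000,000
```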

Originally sFlow introduced features like raw packet export (including 
layer 2 headers) and extensible formatting, which NetFlow later copied 
with v9 and v10/IPFIX. At this point they're mostly on the same footing 
technically, though sFlow does have a counter export feature which is 
essentially a push version of polling SNMP IF-MIB counters. Only Cisco 
and Juniper are still trying to push NetFlow, though; sFlow has been 
adopted by nearly every other vendor at this point. Even some Juniper 
products, like EX (which is really Marvell ASICs with a JUNOS wrapper), 
support sFlow only.

Re: HE.net BGP origin attribute rewriting

2012-06-01 Thread Richard A Steenbergen

On Fri, Jun 01, 2012 at 08:03:50PM +0200, Daniel Suchy wrote:
>
> By overwriting origin field, there's no warranty that someone improves
> performance at all - it's just imagination. In extreme cases,
> performance can be degraded when someone in the middle plays with
> origin field and doesn't know reasons, why originating network uses
> something else than IGP origin. In RFC 2119 words, full implications
> were not understanded - when this overwriting is done generally.

Uh, what part of "to prevent remote networks from improperly forcing a 
cold potato routing behavior on you" sounds imaginary?

> Also, there must be some historical reason, why origin should not be
> rewritten (this changed in January 2006). For internal reasons within
> the network operator still haves enough knobs to enforce own policy
> (by setting localpref, med on his network).

Not really, no. Not every RFC is 100% correct, and they're often written 
by people who are not operators (because operators are too busy running 
real networks :P). Besides, SHOULD NOT means "you probably don't want 
to do this unless you have a really good reason," and enforcing such an 
important point in a peering policy is a pretty good reason.

You also clearly don't understand the practical use of localpref. When 
you're trying to implement a simple and relatively common policy like 
closest exit routing to a peer with multiple exits, you set the 
localprefs the same (localpref is usually used to determine WHICH peer 
you'll be sending to), you set the MEDs the same (if you don't want to 
artificially select which EXIT to use), AS-PATH lengths are obviously the 
same if you have multiple exits, and then the next one down is origin 
code. If you can't reset origin code, you run the risk of a remote 
network being able to force your network to do something you probably 
don't want to do (or at least probably wouldn't want to do, if you had 
any idea what you were doing :P).
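A stripped-down sketch of that tie-break, assuming routes from a single peer that already tie on local-pref, AS-PATH length, and MED, and collapsing the remaining decision steps into a single IGP-cost comparison. The exit names and costs are hypothetical.

```python
# Lower rank is preferred: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


def pick_exit(paths):
    """Among otherwise-tied routes, origin is compared before IGP cost,
    so the peer's origin code can override closest-exit routing."""
    return min(paths, key=lambda p: (ORIGIN_RANK[p["origin"]], p["igp_cost"]))


def normalize_origin(paths, origin="IGP"):
    """Rewrite origin on import so only our own IGP cost picks the exit."""
    return [dict(p, origin=origin) for p in paths]


if __name__ == "__main__":
    from_peer = [
        {"exit": "new-york", "origin": "IGP", "igp_cost": 50},
        {"exit": "chicago", "origin": "INCOMPLETE", "igp_cost": 5},
    ]
    print(pick_exit(from_peer)["exit"])                    # new-york (peer's choice)
    print(pick_exit(normalize_origin(from_peer))["exit"])  # chicago (closest exit)
```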

Please see the previous commentary from Joe Provo, Saku Ytti, etc.; they 
are quite correct.

Re: HE.net BGP origin attribute rewriting

2012-05-31 Thread Richard A Steenbergen

On Thu, May 31, 2012 at 12:21:12PM -0400, Keegan Holley wrote:
> The internet by definition is a network of network so no one entity
> can keep traffic segregated to their network.  Modifying someone else
> routing advertisements without their consent is just as bad as
> filtering them in my opinion.  Doing so to move traffic into your AS
> in order to gain an advantage in peering arrangements and make more
> money off of the end user is just dastardly.

There was one particularly (in)famous network *coughpeer1cough* which 
was well known for selectively rewriting the origin codes towards their 
peers a few years back. For example, if traffic was going to New York, 
they would advertise the prefix with IGP in New York, and Incomplete 
everywhere else, forcing other networks to haul the traffic to New York. 
This is a violation of most peering agreements, which require consistent 
advertisements unless otherwise agreed, but it was just sneaky enough 
that it flew under the radar of most folks for quite a while. When it 
was finally noticed and they refused to stop doing it when asked, a few 
folks just depeered them, but a bunch of others just solved the 
problem by rewriting the origin codes. This is why you still see a lot 
of rewriting happening today by default, to avoid a repeat of the same 
issue.

Personally I was of the opinion that the correct solution to this 
particular problem was just to terminate the peering relationship, but 
honestly Origin code is a pretty useless attribute in the modern 
Internet, and it exists today only because it's impossible to take it 
out of the protocol. I don't see anyone complaining when we rewrite 
someone else's MEDs, sometimes as a trick to move traffic onto your 
network (*), or even that big of a complaint when we remove another 
network's communities, so I don't see why anyone cares about this one.

Maybe a better fix would be a local knob to ignore Origin code in the 
best path decision without having to modify it. Start asking your 
vendors for it now, maybe it'll show up around 2017... :)

(*) I've seen a lot of inexperienced BGP-speaking customers be very 
upset that they can't send any traffic using "natural" BGP (yes, there 
appears to be some kind of delusion running around that modifying BGP 
attributes to influence path selection is bad... What's next, organic 
routes, not from concentrate? :P), which in the end turned out to be us 
sending the customer MEDs based on our IGP cost, other networks sending 
them MEDs of 0, and them not knowing enough to do something useful with 
the data or else rewrite it to 0.

Re: Did Internap lose all clue?

2011-10-20 Thread Richard A Steenbergen

On Thu, Oct 20, 2011 at 10:48:34PM +0200, bas wrote:
> Recently I was contacted by an Internap sales person.
> The third line of the email read:
>
> As you know well, BGP makes all routing decisions simply based on HOP COUNT
>
> I blinked my eyes a couple of times.. Yes it really said hop count.
> Then I replied to the guy that if he tries to sell a technical product
> to technical people he should get his info straight.

Errr, I think they mean AS hops, which is actually mostly correct. After 
you eliminate things that don't actually convey any information (like 
localpref, which you have to configure yourself), and things that don't 
provide any meaningful data in a multi-network path selection role (like 
MEDs), AS-PATH length is pretty much the only useful basis you have for 
picking a best path from your BGP peers. All other marketing crap 
aside, they aren't incorrect in pointing out that BGP really sucks as a 
way to pick a best path. :)

Re: 4.0.0.0/8?

2011-09-20 Thread Richard A Steenbergen

On Tue, Sep 20, 2011 at 08:13:09PM +0300, Hank Nussbacher wrote:
> Did Level3 withdraw 4.0.0.0/8 today and start announcing it as two /9s?

Level3 has been announcing two /9s as well as the /8 for some time now, 
ever since Telefonica's unfortunate incident where they allowed a 
customer to hijack 12.0.0.0/8 because, IIRC, they don't prefix-list 
filter customers properly.
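For context on why the /9s help: forwarding follows the longest matching prefix, so a bogus announcement of the same /8 can only compete with the legitimate /8, never with the more-specific /9s. A minimal longest-prefix-match sketch, where AS3356 is Level3 and AS64500 (from the documentation ASN range) stands in for a hypothetical hijacker:

```python
import ipaddress


def longest_match(dst, routes):
    """Return the (prefix, origin AS) of the most specific covering route."""
    addr = ipaddress.ip_address(dst)
    covering = [(net, origin) for net, origin in routes if addr in net]
    if not covering:
        return None
    return max(covering, key=lambda route: route[0].prefixlen)


if __name__ == "__main__":
    routes = [
        (ipaddress.ip_network("4.0.0.0/8"), "AS3356"),
        (ipaddress.ip_network("4.0.0.0/9"), "AS3356"),
        (ipaddress.ip_network("4.128.0.0/9"), "AS3356"),
        (ipaddress.ip_network("4.0.0.0/8"), "AS64500 (hijack)"),
    ]
    print(longest_match("4.2.2.2", routes))  # (IPv4Network('4.0.0.0/9'), 'AS3356')
```

The same logic cuts the other way: a hijacker announcing /10s or longer would still win, so this is damage limitation rather than a substitute for proper customer filters.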

Re: Cogent HE

2011-06-09 Thread Richard A Steenbergen

On Thu, Jun 09, 2011 at 12:55:44AM -0700, Owen DeLong wrote:
>
> Respectfully, RAS, I disagree. I think there's a big difference
> between being utterly unwilling to resolve the situation by peering
> and merely refusing to purchase transit to a network that appears to
> offer little or no value to the purchaser or their customers.

Owen, can you please name me one single instance in the history of the 
Internet where a peering dispute which led to network partitioning did 
NOT involve one side saying "hey, we're willing to peer" and the other 
side saying "no thanks"? Being the one who wants to peer means 
absolutely NOTHING here; the real question is which side is causing the 
partitioning, and in this case the answer is very clearly HE.

HE wants to peer with Cogent, Cogent doesn't want to peer with HE, and 
thus you have an impasse and there will be no peering. HE has no problem 
using transit to reach Cogent for IPv4 (I see HE reaching Cogent via 
1299/Telia, and Cogent reaching HE via 3549/Global Crossing, both very 
clearly HE transit providers and Cogent peers), but HE has chosen not to 
use transit for the IPv6 traffic. Quite simply, HE feels that they are 
entitled to peer with Cogent for the IPv6 traffic, and has deliberately 
chosen to create this partition to try and force the issue. These are 
*PRECISELY* the same motivations and actions as EVERY OTHER NETWORK who 
has ever created a network partition in pursuit of peering that the 
other party doesn't want to give them, period.

Again, this isn't necessarily a bad thing if HE thinks it can work to 
their long term advantage, but to try and claim that this is anything 
else is completely disingenuous. I understand that you have a PR 
position to take, and you may even have done a good job convincing the 
weak minded who don't understand how peering works that HE is the 
victim, but please don't try to feed a load of bullshit to the rest of 
us. :)

Re: Cogent HE

2011-06-09 Thread Richard A Steenbergen

On Thu, Jun 09, 2011 at 07:06:29PM -0400, Brian Dickson wrote:
>
> So, long history short, there were in fact peering disputes that had
> one side saying, hey, we want to peer and the other side saying you
> don't have enough traffic, or your ratio is too imbalanced, or
> you're my customer - tough!. And some of those got resolved by the
> ratios changing, or the traffic levels reaching sufficiently high. (I
> can historically mention AS 6453.)

How is that different from what I said? One side wants to peer, the 
other side says "no thanks." A list of reasons is nice, especially if 
they will actually grant peering after you meet those requirements 
(instead of just changing their requirements to deny you again :P), but 
immaterial to the point. In EVERY peering dispute there is one side who 
wants to peer, but that doesn't make this side any more noble or right, 
especially if they don't meet the requirements and are simply trying to 
force the peering through intentionally creating a partition then 
playing the propaganda game to blame the other side for it.

Everyone complained when Cogent did it to others, why should it be any 
different when HE does it to Cogent? I'm sorry but I don't accept 
"because Cogent is giving away free IPv6 transit right now" as a valid 
reason, especially when it very clearly advances their goals of 
artificially inflating their customer base specifically so they CAN 
engage in these peering disputes. It's a perfectly valid tactic that has 
been used by the finest networks for years, but at least have the 
decency to admit it for what it is, that's all I'm saying. :)

Re: Cogent HE

2011-06-09 Thread Richard A Steenbergen

On Thu, Jun 09, 2011 at 06:26:01PM -0500, Jimmy Hess wrote:
> Er, Sorry... you are kind of siding with Cogent and claiming HE
> responsible without any logically sound argument explicitly stated
> that supports that position...

You're confused, read again. :)

> I would consider them both responsible for the partition, with Cogent
> slightly more complicit, in that Cogent's expectation of selling HE
> transit is slightly less reasonable than HE's expectation of Cogent
> peering with HE.

Cogent is (unfortunately, note I have no particular love for Cogent 
here) a transit free network, who peers with every other Tier 1. HE is a 
perfectly fine network, but they are not even CLOSE to a transit free 
network. HE buys transit from multiple other networks, including 
3549/Global Crossing and 1299/Telia (both easily visible in the routing 
table), which they use to reach Cogent for IPv4.

There is absolutely NO requirement that there be a direct 
interconnection between HE and Cogent. None, period, and if you think 
otherwise you are vastly confused about routing on the Internet. Let me 
say this again, there is NO requirement that HE buy transit from Cogent, 
but there is a requirement that HE buy transit from *SOMEONE* if they 
are not a transit free network.

HE has deliberately chosen NOT to use transit for their IPv6 routes, in 
order to force people like Cogent to peer with them so they can become 
an IPv6 Tier 1, and thus you have a partition. These are the same 
tactics and strategies used by every other network in pursuit of 
becoming a Tier 1, including Cogent, and everyone complained their ass 
off when Cogent caused partitioning several times during THEIR peering 
disputes on the road to their current transit free status. If your 
answer is "I like HE better than Cogent, so I'm willing to overlook it," 
that's fine, but you're just making things up if you're trying to claim 
that they AREN'T causing this partition. 

Re: Cogent HE

2011-06-08 Thread Richard A Steenbergen

On Wed, Jun 08, 2011 at 06:39:02PM -0400, Patrick W. Gilmore wrote:
>
> Yes, both refuse to buy transit, yes.  But HE is able, willing, and
> even begging to peer; Cogent is not.  These are not the same thing.

I'm ready, willing, and, let's say for the purposes of this discussion, 
begging to peer with every Tier 1, but some of them aren't willing to 
peer with me. Does that mean I should stop buying transit and blame them 
for my resulting lack of global reachability? If I could convince my 
customers to accept that line of bullshit it would certainly reduce my 
transit costs, but I have a sneaking suspicion they wouldn't. :)

Ultimately it is the responsibility of everyone who connects to the 
Internet to make sure they are, you know, actually connected to the 
Internet. Choosing not to do so and then throwing up your hands and 
saying "oh, I can't help it, they won't peer with me" is not a valid 
excuse, at least not in my book or the book of anyone who pays me money 
to deliver their packets. And this isn't even a case of not being ABLE 
to buy sufficient capacity via a transit path (à la Comcast); this is 
just two networks who have mutually decided to remain partitioned from 
each other in the pursuit of long-term strategic advantage. Ultimately 
both parties share responsibility for this issue, and you can't escape 
that just because you have a tube of icing and some spare time. :)

> These are not the only two networks on the v6 Internet who are
> bifurcated.  There are some in Europe I know of (e.g. Telecom Italia
> refuses to buy v6 transit and refuses to peer with some networks), and
> probably others.  The v6 'Net is _not_ ready for prime time, and won't
> be until there is a financial incentive to stop the stupidity & ego
> stroking.
>
> The Internet is a business.  Vote with your wallet.  I prefer to buy
> from people who do things that are in MY best interest.  Giving money
> to Cogent will not put pressure on them peer with HE & Google &
> everyone else - just the opposite.

Absolutely. This is just like any other IPv4 peering dispute, the only 
difference is IPv6 is so unimportant in the grand scheme of the Internet 
that there hasn't been enough external pressure from customers on either 
side to force a settlement. Shockingly, HE manages to buy plenty of IPv4 
transit to reach Cogent and many other networks, no doubt because they 
wouldn't have any (paying) customers if they didn't. :)

 On the flip side, HE is an open peer, even to their own customers, and 
 _gives away_ free v6 transit.  Taking their free transit & complaining 
 that they do not buy capacity to Cogent seems more than silly.  Plus, 
 they are doing what I think is in my best interest as a customer - 
 open peering.  Trying to make them the bad guy here seems 
 counterintuitive.

I know you're not naive enough to think that HE is giving away free IPv6 
transit purely out of the kindness of their heart. They're doing it to 
bulk up their IPv6 customer base, so they can compete with larger 
networks like Cogent, and make a play for Tier 1-dom in exactly the same 
way that Cogent has done with IPv4. And more power to them for it, it 
may well be a smart long term strategic move on their part, but with 
every wannabe Tier 1 network comes partitioning and peering disputes, as 
they try to trade short term customer pain for long term advantages.

Sorry to all the HE guys, but trying to simultaneously complain about 
your treatment at the hands of other networks and their peering disputes 
while emulating their actions is bullshit and you know it. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Downstream Usage-BGP Communites

2011-05-10 Thread Richard A Steenbergen

On Tue, May 10, 2011 at 05:52:39PM -0400, Nick Olsen wrote:
 Greetings NANOG,
 Was hoping to gain some insight into common practice with using BGP 
 Communities downstream.
 
 For instance:
 We peer with AS100 (example)
 AS100 peers with TW Telecom (AS4323).
 Since I happen to know that AS100 doesn't sanitize the communities I send 
 with my routes, I can take advantage of TW Telecom's BGP communities for 
 traffic engineering. Such as 4323:666 (Keep in TWTC Backbone). Would this 
 be something that is generally frowned upon? Still under the assumption 
 that the communities aren't scrubbed off my routes. Could I do this with 
 other AS's beyond TW Telecom? Such as TW's peering with Global Crossing 
 (AS3549)?

Well first off, if you're using the words "peers with" in the normal 
sense, your routes would never propagate to AS4323 in the first place. 
Assuming what you actually mean is that at least one of those sessions 
is a transit feed, essentially all (non-stupid) networks will filter 
their own TE communities from their transits/peers, so the odds of this 
working are almost non-existent.

You also have about a 50/50 shot of AS100 stripping your communities 
before they even make it to AS4323 (or any other network). Personally my 
belief is that this is a bad thing, and you should only filter 
communities in your own name-space (i.e. $YOURASN:*), but this doesn't 
stop a large number of obnoxious networks from doing it anyways. :)
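
As a rough illustration of the filtering approach recommended above, here is a minimal Python sketch. It assumes standard communities are represented as "ASN:value" strings and uses a hypothetical function name; it is not any router vendor's policy syntax. Applied to routes learned from peers and transits, it drops only communities in the local network's own namespace ($YOURASN:*) and passes every other community through untouched.

```python
from typing import List


def scrub_own_communities(communities: List[str], my_asn: int) -> List[str]:
    """Drop communities in our own namespace (my_asn:*), keep all others.

    Assumes standard communities written as "ASN:value" strings.
    """
    kept = []
    for community in communities:
        asn, _, _value = community.partition(":")
        # Only our own TE namespace is ours to police; other networks'
        # communities pass through for them to act on (or ignore).
        if asn != str(my_asn):
            kept.append(community)
    return kept


if __name__ == "__main__":
    # Hypothetical: AS4323 applying the filter on a peer session.
    received = ["4323:666", "100:80", "3549:1000"]
    print(scrub_own_communities(received, my_asn=4323))
```

The exact-match comparison on the ASN field matters: a naive prefix match on "4323" would also strip communities belonging to an unrelated AS such as 43230.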

Re: Downstream Usage-BGP Communites

2011-05-10 Thread Richard A Steenbergen

On Tue, May 10, 2011 at 06:47:11PM -0400, Nick Olsen wrote:
 Ah, Sorry for the confusion. 
 We have a mutual agreement with AS100 (call it transit or peering) we send 
 them full routes, They send us full routes.
 AS100 is a transit customer of AS4323.
 I understand I would be at the mercy of how people have things setup. I do 
 know for a fact I'm not filtered by AS100 as I've already tested it.
 Thanks to everyone for the info so far.

Erm ok, well as long as you're a transit customer of AS100 (for some 
definition of transit customer), and they're a transit customer of 
AS4323, you should have no problems. This is completely different from 
peering: when money changes hands, communities get listened to. :)

Re: Anyone still maintaining altdb.net?

2011-04-20 Thread Richard A Steenbergen

On Wed, Apr 20, 2011 at 10:30:44AM -0400, Jon Lewis wrote:
 On Wed, 20 Apr 2011, Bret Palsson wrote:
 
  I submitted my objects April 11. The mntner object needs to be 
  created by the db-admin. I realize this is a volunteer thing. Could 
  I help out or could the people that are helping out look at adding 
  my record? I need to set up some peering relationships. I'd prefer to 
  support open communities rather than paying and am willing to help 
  out if need be.
 
 If you're just getting started, it might make sense to look at another db. 
 IIRC, RIPE's routing registry is free to use, supports md5crypt and 
 PGP/GPG auth, and isn't a volunteer one-man show.

One of the premises of AltDB is that no support is provided. For 
example, a lot of people send email asking "how do I use this", and the 
unfortunate answer has to be "sorry, we can't help you". If you need 
support, then by all means pay the money to someone like RADB and let 
them help walk you through the process. Of course after the initial 
mntner creation everything is pretty much automated anyways, so if you 
know what you're doing AltDB provides a free method to maintain your IRR 
entries with very little sacrificed over a commercial solution.

There is in fact more than one person volunteering for AltDB, but from 
what I can see of this April 11th email, it falls into the "please 
provide support" category. :)

Re: Level 3 Agrees to Purchase Global Crossing

2011-04-11 Thread Richard A Steenbergen

On Mon, Apr 11, 2011 at 03:49:43PM -0700, Holmes,David A wrote:
 Way too many players ... means that the telecom marketplace is good 
 for the consumer, with competition keeping prices low. Many network 
 users feel that prices are still way too high, particularly for high 
 speed circuits and dark fiber, areas in which Level 3 and Global 
 Crossing have specialized.

Cute theory, but unfortunately this has no basis in reality. Users can 
feel any way they'd like, but the truth is that the current market 
prices for wholesale IP transit, in which Level 3 and Global Crossing 
specialize, are far below cost and are impossible for any carrier to 
sustain long term. I'm not saying that either L3 or GX runs a completely 
optimal network (in fact I'd say that GX may well be a case study in 
failure to do so :P), but a simple analysis of the costs of routers, 
colo, power, crossconnects, optical gear, etc, makes it abundantly clear 
that the current rush-to-the-bottom pricing cannot possibly be 
supported even under optimal conditions and ignoring other overhead. The 
situation isn't significantly different for high-speed longhaul 
capacity: the revenue these circuits generate at current market 
prices is barely offsetting their capex on the optical gear at this 
point. Anyone who told you that there is a cash cow in this particular 
market is woefully mistaken; any serious money to be had is coming from 
enterprise customers who can only be reached via unique metro assets.

I have no doubt that there will be some modest reduction in competition 
following the acquisition, but I honestly don't think it is anything to 
get too worried about. Unlike L3's previous acquisitions (such as 
Wiltel, Telcove, Looking Glass, etc), it isn't really possible for them 
to disappear the assets from the market following the purchase. GX's 
longhaul fiber footprint is mostly still owned and operated by Qwest, 
they were never a big player in IRU dark sales to begin with, and they 
don't have much in the way of metro fiber assets to speak of. The two 
companies are also not really in any danger of being able to stop the 
current tide of market transit prices, since these are being driven by 
many other companies. And L3 has already learned what happens to their 
market share when they try to alter market pricing by themselves, which 
is what led to their current Comcast debacle in the first place.

The best case scenario that I see here is L3 being able to provide some 
technical leadership to significantly reduce GX's overhead, and 
hopefully fix some of their other problem areas too. But personally I'm 
not convinced that L3 is the technical or market force they used to be, 
and thus I question whether they'll be able to get it right themselves. 
Remember, it takes a LOT of work for a big telco to put all the pieces in 
place correctly, and any mistakes on their part will open the door for 
smaller carriers to show off the advantages of being nimble. If there is 
any significant reduction in competition that comes to either carrier, 
it will do exactly that. In fact, I encourage them to try, it will 
probably be good for my business. :)

Re: Peering Traffic Volume

2011-03-25 Thread Richard A Steenbergen

On Thu, Mar 24, 2011 at 07:27:08PM -0400, Ravi Ramaswamy wrote:
 Hi All - I am new to this mailer.  Hopefully my question is posed to the
 correct list.
 
 I am using 2.5 Tbps as the peak volume of peering traffic over all 
 peering points for a Tier 1 ISP, for some modeling purposes.  Is that 
 a reasonable estimate?

The largest Tier 1's, like say Level 3, and god help me for saying it 
but... Cogent, are certainly in or beyond that kind of ballpark. But 
most of the smaller ones, like say AT&T, Qwest, ATDN (if you even still 
want to count them), etc, not a chance in hell. And then there are 
plenty of non tier 1 networks (and some that aren't even actual single 
networks in the classic sense) that do far more traffic than that, for 
example some of the large CDNs like Akamai and LimeLight.

On the modern Internet most of the traffic bypasses Tier 1 networks 
completely, going directly from content networks to eyeball networks, 
so the Tier 1's are effectively left as the higher priced and lower 
capacity last resorts for the remaining traffic.

Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Richard A Steenbergen

On Wed, Mar 16, 2011 at 06:56:28PM +0200, Tassos Chatzithomaoglou wrote:
 Are there any transit providers out there that accept using the BFD (or 
 any other similar) mechanism for eBGP peerings?
 If no, how do you solve the issue with the physical interface state when 
 LANPHY connections are used?
 Anyone messing with the BGP timers? If yes, what about multiple LAN 
 connections with a single BGP peering?

Well first off LAN PHY has a perfectly useful link state. That's pretty 
much the ONLY thing it has in the way of native OAM, but it does have 
that, and that's normally good enough to bring down your EBGP session 
quickly. Personally I find the risk of false positives when speaking to 
other people's random bad BGP implementations to be too great if you go 
much below 30 sec hold timers (and sadly, even 30 secs is too low for 
some people). We (nLayer) are still waiting for our first customer to 
request BFD, we'd be happy to offer it (with reasonable timer values of 
course). :)
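
To put the timer trade-off in numbers, the sketch below compares worst-case detection of a silent failure (one with no link-down event) via the BGP hold timer against BFD. In BFD asynchronous mode (RFC 5880) the detection time is essentially the negotiated transmit interval multiplied by the detect multiplier; the sketch assumes both ends agree on the same values, and the 300 ms x 3 figures are purely illustrative, not a recommended setting.

```python
def bfd_detection_ms(tx_interval_ms: int, multiplier: int) -> int:
    """Approximate BFD asynchronous-mode detection time.

    Assumes both ends negotiate the same interval and multiplier; RFC 5880
    actually computes the detection time separately for each direction.
    """
    return tx_interval_ms * multiplier


def bgp_hold_detection_ms(hold_time_s: int) -> int:
    """Worst case for a failure that never drops link state: the whole
    hold time must expire before the session is torn down."""
    return hold_time_s * 1000


if __name__ == "__main__":
    hold = bgp_hold_detection_ms(30)   # the 30 sec hold timer from the text
    bfd = bfd_detection_ms(300, 3)     # hypothetical 300 ms x 3 BFD session
    print(f"BGP hold timer: {hold} ms, BFD: {bfd} ms ({hold // bfd}x faster)")
```

When link state is available (a direct LAN PHY connection), a link-down event tears the session down without waiting for either timer, which is why neither mechanism is usually the bottleneck in that case.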

Re: bfd-like mechanism for LANPHY connections between providers

2011-03-16 Thread Richard A Steenbergen

On Wed, Mar 16, 2011 at 02:55:14PM -0400, Jeff Wheeler wrote:
 
 This is often my topology as well.  I am satisfied with BGP's 
 mechanism and default timers, and have been for many years.  The 
 reason for this is quite simple: failures are relatively rare, my 
 convergence time to a good state is largely bounded by CPU, and I do 
 not consider a slightly improved convergence time to be worth an 
 atypical configuration.  Case in point, Richard says that none of his 
 customers have requested such configuration to date; and you indicate 
 that Level3 will provision BFD only if you use a certain vendor and 
 this is handled outside of their normal provisioning process.

There are still a LOT of platforms where BFD doesn't work reliably 
(without false positives), doesn't work as advertised, doesn't work 
under every configuration (e.g. on SVIs), or doesn't scale very well 
(i.e. it would fall over if you had more than a few neighbors 
configured). The list of caveats is huge, the list of vendors which 
support it well is small, and there should be giant YMMV stickers 
everywhere. But Juniper (M/T/MX series at any rate) is definitely one of 
the better options (though not without its flaws: the inability to 
configure it at the group level and selectively disable it per-peer, and 
the lack of support at the group level when any IPv6 neighbor is 
configured, come to mind).

Running BFD with a transit provider is USUALLY the least interesting use 
case, since you're typically connected either directly, or via a metro 
transport service which is capable of passing link state. One possible 
exception to this is when you need to bundle multiple links together, 
but link-agg isn't a good solution, and you need to limit the number of 
EBGP paths to reduce load on the routers. The typical solution for this 
is loopback peering, but this kills your link state detection mechanism 
for killing BGP during a failure, which is where BFD starts to make 
sense.

For IX's, where you have an active L2 switch in the middle and no link 
state, BFD makes the most sense. Unfortunately it's the area where we've 
seen the least traction among peers, with "zomg why are you sending me 
these udp packets" complaints outnumbering people interested in 
configuring BFD 10:1.

Re: Internet Edge Router replacement - IPv6 route table size considerations

2011-03-11 Thread Richard A Steenbergen

On Fri, Mar 11, 2011 at 12:55:33PM -0600, James Stahr wrote:
 link-local address.  Then I realized, why even assign a global in the 
 first place?  Traceroute replies end up using the loopback. BGP will 
 use loopbacks.  So is there any obvious harm in this approach that I'm 
 missing?

Traceroute replies most assuredly do NOT use loopbacks on most networks, 
and it would make troubleshooting massively more difficult if this was 
the only option. Imagine any kind of complex network where there is more 
than one link between a pair of routers (and don't just picture your own 
internal network, but imagine customers connecting to their ISPs as 
well), and now tell me how you plan on identifying a particular link 
with a traceroute. The two words that best sum this up would be "epic 
disaster".

Re: Internet Edge Router replacement - IPv6 route table size considerations

2011-03-10 Thread Richard A Steenbergen

On Thu, Mar 10, 2011 at 10:52:37AM -0800, George Bonser wrote:
 
 What I have done on point to points and small subnets between routers 
 is to simply make static neighbor entries.  That eliminates any 
 neighbor table exhaustion causing the desired neighbors to become 
 unreachable.  I also do the same with neighbors at public peering 
 points.  Yes, that comes at the cost of having to reconfigure the 
 entry if a MAC address changes, but that doesn't happen often.

And this is better than just not trying to implement IPv6 stateless 
auto-configuration on ptp links in the first place how exactly? Don't 
get taken in by the people waving an RFC around without actually taking 
the time to do a little critical thinking on their own first, /64s and 
auto-configuration just don't belong on router ptp links. And btw only a 
handful of routers are so poorly designed that they depend on not having 
subnets longer than /64s when doing IPv6 lookups, and there are many 
other good reasons why you should just not be using those boxes in the 
first place. :)
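
The scale problem with /64s on point-to-point links is easy to see with Python's standard ipaddress module. The sketch below assumes a /127 as the alternative (the approach RFC 6164 describes for inter-router links), and the 2001:db8:: documentation prefixes are illustrative. Every unused address in a /64 is a target someone can probe toward, triggering neighbor discovery and filling the neighbor table, while a /127 holds only the two router addresses.

```python
import ipaddress


def link_addresses(prefix: str) -> int:
    """Number of addresses on a link prefix that could be probed/resolved."""
    return ipaddress.IPv6Network(prefix).num_addresses


if __name__ == "__main__":
    print(f"/64 ptp link:  {link_addresses('2001:db8:0:1::/64')} addresses")
    print(f"/127 ptp link: {link_addresses('2001:db8:0:2::/127')} addresses")
    # The only two addresses on the /127, one per router:
    for addr in ipaddress.IPv6Network("2001:db8:0:2::/127"):
        print(addr)
```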

Re: AT&T via Tata and Level3

2011-03-03 Thread Richard A Steenbergen

On Thu, Mar 03, 2011 at 11:15:51AM -0500, Morgan Miskell wrote:
 I've noticed that we have thousands of routes for AT&T via Tata that 
 we don't have from AT&T through Level3.  I would expect Level3 to have 
 most of the routes for AT&T that Tata does since they are both 
 directly peered with AT&T.

Well, I don't know anything about this specific issue or any policy 
changes that may have been made, but at a high level I can tell you that 
BGP doesn't work like that. BGP is only capable of passing on a single 
best path for each route, and what is considered the best path is 
totally in the eye of the beholder.

First off you must understand that the vast majority of Internet routes 
are multi-homed at some level. As you get into large Tier 1 carriers, 
the amount of overlap is massive (i.e. you'll hear the same route as a 
customer from multiple networks), and the question of which path will 
be selected is completely up to the policies of the network doing the 
selecting. Not only does this vary by policy, but it varies by the 
composition of other networks they peer with (or buy from), what other 
networks buy from them, and even their network topology (due to tie 
breaking rules like EBGP > IBGP).

For example, Level 3 is a much larger network with significantly more 
customer routes than Tata. I'm too lazy to do an actual comparison 
between the two, but odds are high that, of the AT&T customer routes 
that AT&T announces to its peers, somewhere around 30-40% are also 
Level 3 customer routes. A network will ALWAYS prefer its customer 
routes over those learned from peers (or else it wouldn't be able to 
guarantee that it's actually providing full transit service), so Level 3 
will never select those routes via AT&T. Meanwhile, Tata is receiving 
those same routes from both AT&T 
and Level 3 (and potentially other peers and/or customers too), and is 
completely free to make their own best path selections based on their 
own local criteria.
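
A heavily simplified Python sketch of the decision described above may help. It assumes each network maps the business relationship to a local preference (customer > peer > transit, with hypothetical values 200/100/50), then falls back to shorter AS path, then EBGP over IBGP. Real BGP best-path selection has more steps (origin, MED, IGP cost, router ID, and vendor-specific rules); this only shows why two networks hearing the same prefix can pick different paths. AS7018 and AS3356 are AT&T and Level 3, AS64500 is a documentation ASN standing in for a shared customer, and the AS path lengths are made up.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical local-preference values keyed by business relationship.
LOCAL_PREF = {"customer": 200, "peer": 100, "transit": 50}


@dataclass
class Path:
    next_as: int
    relationship: str   # "customer", "peer" or "transit"
    as_path_len: int
    ebgp: bool          # learned directly over EBGP on this router?


def best_path(paths: List[Path]) -> Path:
    """Highest local-pref, then shortest AS path, then EBGP over IBGP.

    Only a subset of the real BGP decision process.
    """
    return min(
        paths,
        key=lambda p: (-LOCAL_PREF[p.relationship], p.as_path_len, not p.ebgp),
    )


if __name__ == "__main__":
    # Level 3-like view: heard from peer AT&T and from a shared customer.
    # The customer route wins even with a longer AS path.
    l3_view = [Path(7018, "peer", 2, True), Path(64500, "customer", 3, True)]
    # Tata-like view: both copies come from peers with equal AS path
    # length, so topology (EBGP vs IBGP on this router) breaks the tie.
    tata_view = [Path(7018, "peer", 2, False), Path(3356, "peer", 2, True)]
    print(best_path(l3_view).next_as)
    print(best_path(tata_view).next_as)
```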

The result is that you should almost never expect to see the same paths 
for the same networks being selected by two different large networks, 
unless the routes in question are single homed and there are no other 
choices (which is a small minority of the routes on the Internet).

Re: 6453 routing leaks (January and Today)

2011-02-25 Thread Richard A Steenbergen

On Fri, Feb 25, 2011 at 07:22:36AM -0500, Jared Mauch wrote:
 Update:
 
 I have had a source ask me to post the following:
 
 -- snip --
 The problem with route leaking was caused by specific routing platform 
 resulting in some peer routes not being properly tagged.
 We are deploying additional measures to prevent this from happening in 
 the future
 -- snip --

Hopefully someone learned a lesson about BGP community design, and how 
it should fail safe by NOT leaking if you accidentally fail to tag a 
route. Always require a positive match on a route to advertise to peers, 
not the absence of a negative match.
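
The fail-safe principle is easy to express in code. The sketch below is illustrative only: the community values and function names are hypothetical, and it is written in Python rather than any router's policy language. The safe policy exports to peers only routes carrying a positive "customer route" tag, so a route that somehow missed tagging is withheld. The unsafe variant, which exports anything lacking a "peer route" tag, leaks that same untagged route.

```python
from typing import Set

# Hypothetical community values, set by ingress policy.
CUSTOMER_ROUTE = "64500:100"   # tagged on routes learned from customers
PEER_ROUTE = "64500:200"       # tagged on routes learned from peers


def export_to_peer_safe(communities: Set[str]) -> bool:
    """Fail-safe: advertise only on a positive match for a customer route."""
    return CUSTOMER_ROUTE in communities


def export_to_peer_unsafe(communities: Set[str]) -> bool:
    """Fail-open: advertise anything not explicitly marked as a peer route."""
    return PEER_ROUTE not in communities


if __name__ == "__main__":
    untagged_peer_route: Set[str] = set()   # ingress tagging failed
    print("safe policy leaks it:", export_to_peer_safe(untagged_peer_route))
    print("unsafe policy leaks it:", export_to_peer_unsafe(untagged_peer_route))
```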

Re: SFP vs. SFP+

2011-02-17 Thread Richard A Steenbergen

On Thu, Feb 17, 2011 at 03:41:28PM -0800, Sam Chesluk wrote:
 Depends on the switch.  Some, like the 2960S and 4948E, have 1G/10G
 ports.  They will, however, not operate at 4Gbps (that particular speed
 was chosen to allow the core components to work for gigabit Ethernet,
 OC48, 2G FC, and 4G FC).

4G SFPs are relatively rare, and only for fibre channel. Multi-rate SFPs 
that do up to 2.5G (for OC48) are a lot more common, but they cost more 
than just a simple 1GE SFP. Since all you can do with Ethernet is 1G or 
10G anyways, most SFPs you'll encounter in the field will be the 
cheaper non-multirate kind.

For more information about SFP+, as well as some comparisons between 
different 10G optic types, take a look at:

http://www.nanog.org/meetings/nanog42/presentations/pluggables.pdf

As an update (since this presentation is from Feb 2008), SFP+ is just 
now finally starting to get into 40km/ER reach territory. Supplies are 
limited, as they just very recently started shipping, but they do exist. 
Of course since they moved the electronic dispersion compensation (EDC) 
off the optic and onto the host board, the exact distances you'll be 
able to achieve are still based on the quality of the device you're 
plugging them into. SFP+ is still mostly an enterprise box or high 
density / short reach offering, and XFP is still required for full 
functionality.

Re: SFP vs. SFP+

2011-02-17 Thread Richard A Steenbergen

On Thu, Feb 17, 2011 at 09:04:29PM -0600, Frank Bulk wrote:

 Are there any optics that plug into 10G ports but have a copper or 
 optical 1G interface?  There's some equipment that I'm specing where 
 it is $10K for a multi-port 1G card, even while I really may only 
 *occasionally* need a single 1G port and there's a free 10G port for 
 me to use.

It doesn't work that way. The closest you can get is that the device can 
support either 1G or 10G in the same port (since SFP and SFP+ are 
physically and electrically the same), but it requires support from the 
device (since both PHYs have to be implemented).

Re: SFP vs. SFP+

2011-02-17 Thread Richard A Steenbergen

On Fri, Feb 18, 2011 at 12:55:45AM -0500, Peter Nowak wrote:
 
 You can plug SFP module (copper or fiber) into any SFP+ port.
 So, on 10G port you can run either 1GE or 10GE.

Not true. Some devices support this, since SFP and SFP+ are physically 
and electrically compatible, but not all. The device must be 
specifically designed to support both PHYs, which is NOT a given.

Re: Announcing the Community FlowSpec trial

2011-01-05 Thread Richard A Steenbergen

On Wed, Jan 05, 2011 at 05:46:36PM -0600, John Kristoff wrote:
 Friends and colleagues,
 
 At NANOG 48 I talked about a community flow-spec service we were
 looking at trying to make work.  This is the idea of using IETF RFC
 5575 to pass around flow-based rules, in this case, primarily for
 dropping unwanted packets.
 
 This technology is not as widely deployed as traditional RTBH
 techniques for a number of reasons.  However, we thought perhaps it
 was widely used enough, or could be, to justify what might be a
 helpful and free 3rd party feed of flow-spec routes to keep our
 networks a little bit cleaner.
 
 A trial of this feed based on the traditional bogon routes can be had 
 by contacting me directly.  We realize the traditional IPv4 reserved, 
 special and unallocated IPv4 bogon address is dwindling.  Maybe there 
 is room for some other type of feed, but to justify that, we're 
 looking to see if even enough people would set up this presumably 
 simpler feed to help us and the community get some more experience 
 with multi-hop flow-spec.

As a word of warning to anyone who wants to deploy this on their Juniper 
routers (what other router vendors support it? :P), there are some 
pretty serious performance considerations of which you should be aware.

For example, we discovered that on MX routers (with classic I-chip DPCs, 
the performance should be somewhat better for Trio cards but we haven't 
fully tested the exact numbers yet), installing as few as a dozen 
flowspec routes can create firewall filters that use enough SRAM 
accesses that you will no longer be able to achieve line rate 
packets/sec. With a few more rules, you may find that your 10GE's will 
only be able to handle 3-5Mpps instead of the normal 14.8Mpps. When this 
happens, excess traffic above what the firewall filters can handle will 
be silently discarded, with no indication in SNMP or "show interface" 
that you're dropping packets (though you may be able to see it in "show 
pfe statistics traffic" as "Info cell drops").
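
The 14.8 Mpps figure comes from minimum-size Ethernet frames: a 64-byte frame plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap occupies 84 bytes (672 bits) on the wire. The short Python check below reproduces that number and shows what fraction of line rate the 3-5 Mpps range above represents, assuming worst-case minimum-size packets.

```python
def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Max packets/sec for back-to-back Ethernet frames of a given size.

    Adds 8 bytes of preamble/SFD and a 12-byte inter-frame gap per frame.
    """
    wire_bits = (frame_bytes + 8 + 12) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    ten_ge = line_rate_pps(10e9)
    print(f"10GE line rate: {ten_ge / 1e6:.2f} Mpps")
    for filtered_mpps in (3, 5):
        share = filtered_mpps * 1e6 / ten_ge
        print(f"{filtered_mpps} Mpps is {share:.0%} of line rate")
```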

I can't tell you what the performance numbers are for other platforms, 
but anyone thinking about turning on flowspec from a third party source 
(especially one who may be sending them a large number of rules) should 
give serious consideration to the potential impact on their network 
first.

Re: Comcast vs Level 3 - This time with video

2010-12-20 Thread Richard A Steenbergen

On Mon, Dec 20, 2010 at 11:59:31AM -0500, Randy Epstein wrote:
  A simplified explanation of the situation between Level 3 and Comcast,
 from the perspective of a Comcast customer who is asking for the same thing
 Comcast is asking for. :)
 
  http://www.xtranormal.com/watch/8124137/
 
 I have to question Richard on this interaction though. There is no way 
 in hell a Comcast customer service rep would respond like that. Not at 
 least without putting you on hold 5 times and then still, wouldn't 
 know what in the hell you're talking about. In the end, the service 
 rep would tell you they need to dispatch someone to your house.

Hah, yes they did seem to skip over the usual "bad ratios?" "have you 
tried rebooting your cable modem?" part, didn't they? I suppose I should 
have added the phrase "highly fictionalized", but Xtranormal has 
something against allowing punctuation in their descriptions, and the 
existing one was confusing enough.

FYI a bunch of people complained that the voices were hard to 
distinguish, so I did a modified version which is a little more 
intelligible. It's also linked to from the original, as part of the same 
series.

http://www.xtranormal.com/watch/8134089/

Re: Some truth about Comcast - WikiLeaks style

2010-12-19 Thread Richard A Steenbergen

On Sun, Dec 19, 2010 at 08:20:49PM -0500, Bryan Fields wrote:
 
 The government granting a monopoly is the problem, and more lame 
 government regulation is not the solution.  Let everyone compete on a 
 level playing field, not by allowing one company to buy a monopoly 
 enforced by men with guns.

Running a wire to everyone's house is a natural monopoly. It just 
doesn't make sense, financially or technically, to try and manage 50 
different companies all trying to install 50 different wires into every 
house just to have competition at the IP layer. It also wouldn't make 
sense to have 5 different competing water companies trying to service 
your house, etc. This is where government regulation of the entities who 
ARE granted the monopoly status comes into play, to protect consumers 
against abuses like we're seeing Comcast commit today.

Personally I think the right answer is to enforce a legal separation 
between the layer 1 and layer 3 infrastructure providers, and require 
that the layer 1 network provide non-discriminatory access to any 
company who wishes to provide IP to the end user. But that would take a 
lot of work to implement, and there are billions of dollars at work 
lobbying against it, so I don't expect it to happen any time soon. :)

Re: Some truth about Comcast - WikiLeaks style

2010-12-19 Thread Richard A Steenbergen

On Sun, Dec 19, 2010 at 05:58:26PM -0800, Leo Bicknell wrote:
 
 I dream of a day where we have municipal fiber to the home, leased to 
 any ISP who wants to show up at the local central office for a dollar 
 or two a month so there can be true competition in end-user services.

Take a second and think about what THAT would do to the ratio wars. 
Imagine if any hosting/content provider, with potentially hundreds or 
thousands of gigabits of unused inbound capacity on their networks, 
could easily get into providing IP service to eyeballs. Even ignoring 
the existing 95th percentile silliness like free inbound transit, 
which would no doubt rapidly evaporate under this kind of model, the 
difference in efficiencies between the highly competitive hosting world 
and the highly non-competitive last mile world is simply staggering. 
For many content networks, it would be an opportunity to start making 
money on their bits instead of paying for them, and networks without 
content expertise would be in serious trouble.
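
The "free inbound" effect comes from how 95th percentile transit billing is commonly computed: traffic is sampled (typically every 5 minutes), the top 5% of samples in each direction are discarded, and the customer is billed on the higher of the inbound and outbound results. The Python sketch below assumes that convention (actual contract terms vary) and uses made-up sample data; for an outbound-heavy hosting network, inbound traffic up to the outbound level adds nothing to the bill.

```python
import math
from typing import List


def percentile_95(samples_mbps: List[float]) -> float:
    """95th percentile by discarding the top 5% of samples (one common method)."""
    ordered = sorted(samples_mbps)
    index = math.ceil(len(ordered) * 95 / 100) - 1
    return ordered[index]


def billable_mbps(inbound: List[float], outbound: List[float]) -> float:
    """Bill on the greater of the inbound and outbound 95th percentiles."""
    return max(percentile_95(inbound), percentile_95(outbound))


if __name__ == "__main__":
    # Hypothetical 100-sample month for an outbound-heavy hosting network.
    outbound = [1000.0] * 100
    print(billable_mbps([100.0] * 100, outbound))   # light inbound
    print(billable_mbps([900.0] * 100, outbound))   # heavy inbound, same bill
```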

I personally can't think of a single thing with more potential for 
massive disruption to the business models of incumbent providers. There 
are so many billions of dollars at stake protecting the status quo that 
it's not even funny, which IMHO is why you'll never see any of this 
happen in the US, in any kind of scale at any rate. :)


Re: Some truth about Comcast - WikiLeaks style

2010-12-19 Thread Richard A Steenbergen

On Sun, Dec 19, 2010 at 06:12:02PM -0800, JC Dill wrote:
 
 And if a competing water service thought they could do better than the 
 incumbent, why not let them put in a competing water project?  If they 
 think they can make money after the cost of the infrastructure, then 
 they may be onto something.  We don't have to worry that too many 
 would join in, the laws of diminishing returns would make it 
 unprofitable for the nth company to build out the infrastructure to 
 enter the market.

The laws of diminishing returns have already set the bar for the point 
at which it's not profitable for a new company to enter the market and 
try to compete. Right now the number is roughly 2 (cable and DSL), give 
or take a few outliers. I do believe the point would be to encourage a 
little more competition than that. :)

Re: potential new and different architectural approach to solve the Comcast - L3 dispute

2010-12-17 Thread Richard A Steenbergen

On Fri, Dec 17, 2010 at 11:15:14AM -0600, Benson Schliesser wrote:
 
 I have no direct knowledge of the situation, but my guess: I suspect 
 the proposal was along the lines of longest-path / best-exit routing 
 by Level(3).  In other words, if L(3) carries the traffic (most of the 
 way) to the customer, then Comcast has no complaint--the costs can be 
 more fairly distributed.  The modest investment is probably in tools 
 to evaluate traffic and routing metrics, to make this work.  This 
 isn't really *new* to the peering community, but it isn't normal 
 either.

Nah, you're still thinking about this like it was a classic peering 
dispute over ratios, when nothing could be further from the truth. First 
off, by the very nature of a CDN, all of the Netflix/etc traffic is 
going to be delivered to the best exit on the long-haul network already. 
Second, Comcast is a FULL TRANSIT CUSTOMER of Level 3. Typically the 
customer gets to dictate the handoff point to the provider, by either 
advertising MEDs, or by sending inconsistent routes. The fact that the 
existing Level3/Comcast routing DOESN'T make Level 3 haul all of the 
bits to the best exit means it's highly likely that Comcast agreeing to 
haul the bits was part of their commercial transit agreement, probably 
in exchange for lower transit prices.
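
A rough illustration of how MEDs let the customer pick the handoff: when 
the same prefix arrives from one neighbor AS at several interconnection 
points and the earlier tie-breakers are equal, BGP prefers the lowest MED. 
This is a minimal Python sketch of a reduced decision process, not any 
vendor's implementation; the exit names and metric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    exit_point: str    # hypothetical handoff location
    neighbor_as: int   # AS that advertised this copy of the prefix
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred
    med: int           # lower is preferred (same neighbor AS only)


def best_path(paths):
    """Pick a path using a reduced BGP decision process.

    Only local-pref, AS-path length and MED are modeled; origin,
    eBGP-over-iBGP, IGP cost to the next hop and router-ID tie-breaks
    are left out, so the first remaining candidate stands in for them.
    """
    top = max(p.local_pref for p in paths)
    cands = [p for p in paths if p.local_pref == top]
    shortest = min(p.as_path_len for p in cands)
    cands = [p for p in cands if p.as_path_len == shortest]
    # By default MED is only compared between paths from the same neighbor AS.
    if len({p.neighbor_as for p in cands}) == 1:
        lowest = min(p.med for p in cands)
        cands = [p for p in cands if p.med == lowest]
    return cands[0]


# Hypothetical: the customer (AS7922) announces the same prefix at two
# handoffs and sets a lower MED where the destination is closer.
routes = [
    Path("chicago", 7922, 100, 1, 50),
    Path("new-york", 7922, 100, 1, 10),
]
print(best_path(routes).exit_point)  # new-york
```

If the provider ignores MEDs, the omitted IGP-cost step generally takes 
over and the provider hands off at whichever exit is nearest to itself, 
leaving the customer to haul the bits the rest of the way.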

Comcast vs Level 3 - This time with video

2010-12-17 Thread Richard A Steenbergen

A simplified explanation of the situation between Level 3 and Comcast, 
from the perspective of a Comcast customer who is asking for the same 
thing Comcast is asking for. :)

http://www.xtranormal.com/watch/8124137/

Re: potential new and different architectural approach to solve the Comcast - L3 dispute

2010-12-17 Thread Richard A Steenbergen

On Sat, Dec 18, 2010 at 01:07:15AM -0500, Patrick Giagnocavo wrote:
 
 Note that Comcast has never said that the Level3/Netflix issue is 
 about users exceeding their allotted bandwidth (currently at about 
 250GB/month for residential); presumably, were a Comcast user to use 
 249GB of bandwidth downloading cute pictures of cats, Comcast would 
 have no objection.

I believe they want the cat people to pay too, it's just easier to go 
after Netflix first.

Let's say for a moment that Comcast's overall ratio with its customers is 
approximately the same as their ratio in the leaked Tata graphs (yes, I 
know that this proves nothing, but let's just assume it for a moment), 
i.e. 5:1. They then ask that every network who sends them traffic, even 
their transit providers (in the case of Level 3), be under 2:1. What is 
the point of insisting on a ratio that is not supported by the traffic 
their customers actually request? Because it gives them a convenient 
excuse to demand payment from nearly everyone on the Internet for being 
out of ratio, and to restrict capacity to those who do not pay.
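
The arithmetic behind that objection, as a minimal sketch: if a network's 
customers pull roughly 5 bits in for every bit they send, its 
interconnects taken together have to carry roughly 5:1 inbound, so not 
every network sending to it can be under 2:1. The per-neighbor volumes 
below are invented for illustration and are not taken from the leaked 
graphs.

```python
def aggregate_ratio(flows):
    """Traffic-weighted inbound:outbound ratio over all interconnects."""
    total_in = sum(inbound for inbound, _ in flows.values())
    total_out = sum(outbound for _, outbound in flows.values())
    return total_in / total_out


def out_of_ratio(flows, limit):
    """Neighbors that send more than `limit` times what they receive."""
    return sorted(n for n, (i, o) in flows.items() if i > limit * o)


# Hypothetical Gbps: (inbound to the eyeball network, outbound from it).
flows = {
    "transit-a": (150.0, 20.0),
    "transit-b": (60.0, 10.0),
    "peer-c": (30.0, 10.0),
    "peer-d": (10.0, 10.0),
}

# If every neighbor satisfied inbound <= limit * outbound, summing over
# neighbors would give total_in <= limit * total_out. An aggregate of 5:1
# therefore cannot coexist with every neighbor under 2:1.
print(aggregate_ratio(flows))    # 5.0
print(out_of_ratio(flows, 2.0))  # ['peer-c', 'transit-a', 'transit-b']
```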

With so many transit ports running hot, and even peering ports running 
hot as in the recent example where they intentionally turned down Global 
Crossing capacity (which they claim is settlement free) and CAUSED 
congestion, the ISP who hosts the cute cat pictures may have little 
choice but to pay Comcast for access, or risk losing their cute cat 
hosting business to someone else who is willing to do so.

I've also seen Comcast ignore several offers to honor MEDs or accept 
more-specifics from networks who DO meet their published peering 
requirements in every way except ratios, so I don't think they're 
interested in technical solutions to a potential transport cost 
imbalance either. If it was about anything other than trying to extract 
a toll from content providers, one of these technical solutions would 
clearly have been better for them than continuing to force the traffic 
into their congested transit ports, which they not only pay for, but 
then also do the backhaul for across their own network.

BTW, they rejected my very nice comment on their blog asking if they 
would be willing to share the graphs of their transit provider 
interfaces (which are NOT peering relationships, and not under NDA) to 
back up their claims that the published graphs are false, so I'm 
positive yours isn't going to get through. :)

Re: Some truth about Comcast - WikiLeaks style

2010-12-16 Thread Richard A Steenbergen

On Thu, Dec 16, 2010 at 02:48:56PM -0500, Randy Epstein wrote:
 
 I was in the IRC channel at the time and saw it.  It's real.
 
 I don't support the posting of IRC logs, but can't control that either.

I saw it too. I don't support posting of IRC logs trying to get people 
in trouble (though lord knows it wouldn't be the first time that has 
happened :P), but I also completely disagree with Comcast's position on 
this (big shocker, I know).

As one of the people who has spoken out against Comcast's actions the 
most vocally, I suppose the original sentiment might very well be 
targeted at me. Personally I really don't think that people on the NANOG 
list posting about their network issues or actions has ANYTHING to do 
with their sponsorship of the NANOG conferences or community, and I 
suppose I should be shocked and appalled that it might come down to 
these types of threats to silence people who have something negative to 
say. I'm a Comcast customer too (50M/10M or 6M/768K DSL at home, gee, 
decisions decisions :P), what are they going to do next, shut off my 
cable modem for TOS violations? :)

Seriously guys, this is an operator forum and you're running a congested 
network, to expect that people are not going to comment on those facts 
just because you've put money into NANOG sponsorship is absurd.

Re: Some truth about Comcast - WikiLeaks style

2010-12-16 Thread Richard A Steenbergen

On Thu, Dec 16, 2010 at 02:13:47PM -0600, Richard A Steenbergen wrote:
 Seriously guys, this is an operator forum and you're running a congested 
 network, to expect that people are not going to comment on those facts 
 just because you've put money into NANOG sponsorship is absurd.

Forgot to attach a giant disclaimer on the previous post: I'm speaking 
solely for myself, and not in any way, shape, or form, for the NANOG, 
NewNOG, or any other organization.

Re: Some truth about Comcast - WikiLeaks style

2010-12-15 Thread Richard A Steenbergen

On Wed, Dec 15, 2010 at 02:25:53PM -0500, Jeffrey Lyon wrote:
 From Tata? I'd eat my own hand if they were paying more than $1-2
 across the board.

I know people who have offered them hundreds of gigs of settlement free 
transit (including myself), but clearly they aren't interested. FYI a 
large number of their wholesale transit/paid peering customer agreements 
include clauses which prohibit the resale of services to other parties 
too. They don't want one person being able to buy capacity into their 
network, then provide it to others.

Remember their goal isn't to save money on transit, it's to make the 
transit paths minimally functional so they can force content networks to 
buy from them directly (at above market rates, from what people tell me
:P), so they don't WANT to add capacity or transit paths.

Re: Some truth about Comcast - WikiLeaks style

2010-12-15 Thread Richard A Steenbergen

On Wed, Dec 15, 2010 at 07:05:26PM -0600, Jack Bates wrote:
 On 12/15/2010 4:47 PM, Adam Rothschild wrote:
  Folk in
  content/hosting should find this all more than a little bit scary.
 
 So you don't think the money content providers will pay Comcast won't 
 reflect on other eyeball networks who aren't important/large enough to 
 request financing? ie, Comcast could run lower rates and offer better 
 service by charging the content provider, while competitive eyeball 
 networks won't get the option to receive compensation from content 
 providers and have to charge appropriate rates to their customers.

And if you saw someone getting mugged on the street, you could argue 
that you're now less likely to be robbed because the guy already has 
someone else's money...

If Comcast wanted to grow its revenue by offering a better, faster, 
cheaper, etc, wholesale transit service to content networks, I don't 
think anyone here would object in the slightest. The problem is that 
rather than compete on any kind of financial or technical merit, they've 
decided to hold their cable customers hostage and FORCE content networks 
to buy from them. Rest assured nobody WANTS to buy transit from a 
network with a 109ms rtt between New York and San Jose (it boggles the 
mind how one could even manage to assemble that fiber path, let alone 
try to charge money for it :P), congestion on every port, etc.
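
For a sense of scale, a speed-of-light sanity check on that 109ms 
figure. This minimal sketch assumes approximate city-center coordinates 
and the common rule of thumb of about 200,000 km/s for light in fiber; 
real fiber routes are longer than the great-circle path, so this is a 
floor, not a prediction.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Light in silica fiber travels at roughly c / 1.47; 200,000 km/s
# (about 5 microseconds per km) is a common rule-of-thumb assumption.
FIBER_KM_PER_S = 200_000.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Propagation-only round trip over a perfectly straight fiber path."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


# Approximate city-center coordinates (an assumption, not circuit endpoints).
nyc = (40.7128, -74.0060)
san_jose = (37.3382, -121.8863)

d = great_circle_km(*nyc, *san_jose)
floor_ms = min_rtt_ms(d)
print(f"{d:.0f} km, floor {floor_ms:.1f} ms, 109 ms is {109 / floor_ms:.1f}x the floor")
```

With those assumptions the distance comes out around 4,100 km and the 
floor around 41ms, so 109ms is more than two and a half times the 
physical minimum.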

If Comcast gets away with this, what's to stop every other 
monopoly/duopoly eyeball network from doing the same thing? And yes 
maybe if Comcast forces Netflix to pay them to reach you (either 
directly or indirectly via Level 3), your cable modem bill might go 
down, but all that means is that your Netflix bill is going to go up. At 
the end of the day you're probably better off betting on lower costs 
from the technical innovation of the networks who DON'T pay $50k for a 
10GE port. :)

Re: Some truth about Comcast - WikiLeaks style

2010-12-14 Thread Richard A Steenbergen

On Tue, Dec 14, 2010 at 02:54:13AM -0500, Jeffrey Lyon wrote:
 gin-nto-icore1 is a Tata router at Equinix in NY. Whether or not that
 port belongs to Comcast is anyone's guess.

From Tata's looking glass:

  3 Vlan550.icore1.NTO-NewYork.as6453.net (209.58.26.78) 4 msec
    Vlan551.icore1.NTO-NewYork.as6453.net (209.58.26.82) 4 msec 0 msec
  4 pos-1-9-0-0-cr01.newyork.ny.ibone.comcast.net (68.86.86.41) [AS 7922] 4 msec 4 msec 4 msec

As far as I can tell their DNS doesn't expose Tata's router port names 
at all:

77.26.58.209.in-addr.arpa domain name pointer Vlan550.icore1.NTO-NewYork.as6453.net.
78.26.58.209.in-addr.arpa domain name pointer Vlan550.icore1.NTO-NewYork.as6453.net.
81.26.58.209.in-addr.arpa domain name pointer Vlan551.icore1.NTO-NewYork.as6453.net.
82.26.58.209.in-addr.arpa domain name pointer Vlan551.icore1.NTO-NewYork.as6453.net.
41.86.86.68.in-addr.arpa domain name pointer pos-1-9-0-0-cr01.newyork.ny.ibone.comcast.net.
42.86.86.68.in-addr.arpa domain name pointer pos-1-0-0-0-pe01.111eighthave.ny.ibone.comcast.net.
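
The in-addr.arpa names above are just the IPv4 octets in reverse order, 
and the addresses pair up in a way that is consistent with /30 
point-to-point subnets. A minimal sketch using Python's standard 
ipaddress module; the /30 boundaries are an inference from the 
numbering, not something the PTR records confirm.

```python
import ipaddress

# Addresses taken from the PTR lookups above.
addrs = ["209.58.26.77", "209.58.26.78", "209.58.26.81", "209.58.26.82",
         "68.86.86.41", "68.86.86.42"]

for a in addrs:
    # reverse_pointer is the in-addr.arpa name a PTR query is sent for.
    print(ipaddress.ip_address(a).reverse_pointer)

# Assumption: each pair sits on a /30 point-to-point link.
for net in ("209.58.26.76/30", "209.58.26.80/30", "68.86.86.40/30"):
    hosts = [str(h) for h in ipaddress.ip_network(net).hosts()]
    print(net, hosts)  # the two usable addresses in each /30
```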

Though I suppose if someone was photoshopping it, it would be pretty 
obvious for them to stick something that does show up in DNS into the 
graphs, so that doesn't exactly prove much. I'm also assuming Comcast 
wouldn't be very happy to have these out in public, so there is pretty 
much no way you're going to see a leaked graph that ISN'T from an 
anonymous source.

FWIW these graphs pretty much reflect the massive congestion that I've 
been observing between Tata and Comcast. I've also seen some third party 
Smokeping graphs which visually show the rate of loss, and the pattern 
looks very very similar, but I'll let someone who actually maintains 
them be the one to post them.

Re: Some truth about Comcast - WikiLeaks style

2010-12-14 Thread Richard A Steenbergen

On Tue, Dec 14, 2010 at 11:24:45AM -0500, Craig L Uebringer wrote:
 
 Yeah, the 30 day looks like a classic uptick in traffic toward the 
 holidays. Some bellhead beancounter maybe took out capacity in the 
 summer lull and ignored the engineers. Or they just have stupidly-slow 
 install intervals. Same crap I've seen on loads of provider networks.

Except that they seem to be busy actively turning down other capacity, 
and forcing extra traffic through their Tata ports by blocking other 
paths with BGP no-export communities.
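
A minimal sketch of why a suppression tag closes off a path: on export, 
the provider checks each route's communities, and a "do not announce to 
this peer" tag keeps the route away from that peer, so the peer's traffic 
has to enter through some other path. NO_EXPORT is the RFC 1997 
well-known community; the provider tag value and the peer list are 
hypothetical placeholders, since each provider publishes its own scheme.

```python
# Well-known NO_EXPORT community from RFC 1997 (0xFFFFFF01).
NO_EXPORT = (65535, 65281)

# Hypothetical provider scheme: (PROVIDER_TAG, peer_asn) means
# "do not announce this route to peer_asn". Placeholder value only.
PROVIDER_TAG = 64999


def may_export(communities, neighbor_asn, neighbor_is_ebgp=True):
    """Decide whether a route carrying `communities` may go to a neighbor."""
    if neighbor_is_ebgp and NO_EXPORT in communities:
        return False  # NO_EXPORT: keep the route inside the local AS
    if (PROVIDER_TAG, neighbor_asn) in communities:
        return False  # customer asked for suppression toward this peer
    return True


# Hypothetical customer route tagged to stay away from two peers.
route_tags = {(PROVIDER_TAG, 6453), (PROVIDER_TAG, 3549)}
peers = [174, 1299, 3549, 6453, 7018]
print([asn for asn in peers if may_export(route_tags, asn)])  # [174, 1299, 7018]
```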

For example, we've been observing Comcast turning down some of their 
Global Crossing capacity in recent days, causing new congestion during 
peak traffic times. I've even seen people contact the various NOCs 
involved, and they've been told explicitly and by multiple parties that 
Comcast is intentionally turning down extra capacity and running their 
existing ports hot.

Everybody who deals with interconnection capacity in this industry knows 
what's going on, but the graphs and interconnection details are all 
under NDA, so it takes an inside source secretly leaking graphs to the 
public to expose this kind of activity. Even then you'll still have 
people who claim that it proves nothing because the graphs can't be 
positively associated to a specific customer port, but realistically 
these kinds of leaks are probably the best public info you'll ever see.

Re: Some truth about Comcast - WikiLeaks style

2010-12-14 Thread Richard A Steenbergen

On Tue, Dec 14, 2010 at 03:39:07PM -0600, Aaron Wendel wrote:
 To what end?  And who's calling the shots there these days?  Comcast 
 has been nothing but shady for the last couple years.  Spoofing 
 resets, The L3 issue, etc.  What's the speculation on the end game?

I believe Comcast has made clear their position that they feel content 
providers should be paying them for access to their customers. I've seen 
them repeatedly state that they feel networks who send them too much 
traffic are abusing their network. It isn't a ratios argument in the 
classic sense, between two peers trying to maintain a fair balance of 
costs and benefits, it's that they object to ANY content provider being 
able to deliver to their customers without paying them for access. They 
do this by trying to enforce ratios which are well beyond what their 
actual end users are routing, and as in the case of Level 3, they 
leverage that position to claim that other networks should be paying 
them under threat of blocking uncongested access to their customers.

I would say their short term goal is to make people who currently won't 
peer with them do so, so they can become transit free. This has been 
seen time and time again, as they move networks who they want to peer 
with but who will not peer with them into a congested transit bucket. A 
while back it was SAVVIS, now it is Tata, but the pattern is clear and 
repetitive. Note that this only extends to a certain point though, as in 
the case of Global Crossing, who they claim is a settlement free peer, 
but who they have recently started pressuring and intentionally 
congesting because of ratio imbalances.

Their long term goal seems to be to force content networks to pay them 
for direct transit or on-net connectivity, by removing the available 
capacity from other paths. If you are a content network, and you can't 
reach them in a reliable fashion via The Internet, your only choice 
may be to buy from Comcast directly.

This is obviously not the first time that networks have used this 
strategy, there are several prominent examples in recent history of 
others using this exact same technique. But this is definitely one of 
the worst examples in the US of a major eyeball network using access to 
their customers (who may have little or no choice in their broadband 
access) to force other networks to pay them, and IMHO it needs to be 
called out publicly whenever possible.

Re: TWT - Comcast congestion

2010-12-01 Thread Richard A Steenbergen

On Wed, Dec 01, 2010 at 06:31:39AM -0800, Leo Bicknell wrote:
 In a message written on Tue, Nov 30, 2010 at 10:59:25PM -0600, Richard A 
 Steenbergen wrote:
  I believe that's what I said. To be perfectly clear, what I'm saying is:
  
  * Comcast acted first by demanding fees
  * Level 3 went public first by whining about it after they agreed to pay
  * Comcast was well prepared to win the PR war, and had a large pile of 
content that sounds good to the uninformed layperson ready to go.
 
 I think I can make this very simple.  What I am saying is that
 you're missing a step before your 3 bullet points.  Before any of
 the three things you describe, Level 3 demanded fees from Comcast.
 Level 3 is doing a great job of getting folks to ignore that fact.

Do you have any basis for this claim, or are you just making it up 
as a possible scenario that would explain Comcast's actions? I have 
it on good authority that Level 3 did not attempt to raise their 
prices or ask for additional fees beyond their existing contract, 
nor was their contract reaching the end of its term, when they could renegotiate 
for more favorable terms. Comcast simply said, we've decided we don't 
want to pay you, you should pay us instead, and you're going to bend 
over and like it if you want to be able to reach our customers.

Obviously the version I've heard and the version you're pitching 
can't co-exist, so either you have some REALLY interesting inside 
info that I don't (which I honestly find hard to believe given 
your knowledge of the facts so far), or you're stating as fact a 
theory with no basis that I can find. If it's just a theory, please 
say so, so that we don't keep having to argue positions that can 
clearly never converge.

 Comcast is a customer of L3, and pays them for service.  Bringing
 on Netflix will cause Comcast to pay L3 more.  More interestingly,
 in this case it's likely Level 3 went to Comcast and said we don't
 think your existing customer ports will handle the additional
 traffic so...um...you should buy more customer ports.

Comcast is the customer; they have complete and total control of the 
traffic being exchanged over their transit ports. If they wanted 
less traffic, they could announce fewer routes, or add more 
no-export communities. They also have complete control of traffic 
being sent outbound, and since Level3 is more than capable of 
handling 300Gbps (the capacity Comcast claims they have), if 
Comcast actually had 300Gbps of outbound traffic to send they 
could easily have had a 1:1 ratio.

Framing this as a peering ratio debate is absurd, because these 
two networks were NEVER peers. Any customer could have sent 
additional bits to Level3 at any time, and Comcast should be 
prepared to deal with the TE as a result. That's life on the 
Internet.

 Does network neutrality work both ways?  If it is bad for Comcast
 to hold the users hostage to extort more money from Level 3, is it
 also bad for Level 3 to hold the content hostage to extort more
 money from Comcast?

You know, most people manage to buy sufficient transit capacity to 
support the volume of traffic that their customers pay them to 
deliver. Only Comcast seems to feel that it is proper to hold their 
captive customer base hostage to extort content networks into paying 
for uncongested access. Level 3 is free to sell full transit or CDN 
to whomever they like, just as Comcast is free to not buy transit 
from Level 3 when their contract is up. The net neutrality part 
starts when Level 3 is NOT free to turn off their customer for 
non-payment, just like what would happen to anyone else who suddenly 
decided they didn't think they should keep paying their bills, 
because Comcast maintains so little transit capacity that to shut 
them off would cause massive disruptions to large portions of the 
Internet.

Re: TWT - Comcast congestion

2010-11-30 Thread Richard A Steenbergen

On Tue, Nov 30, 2010 at 11:45:53AM -0800, Kevin Oberman wrote:
 We have seen the same thing with other carriers. As far as I can see, 
 Comcast is congested, at least at Equinix in San Jose. Since this is 
 all over private connections (at least in our case), the fabric is not 
 an issue.
 
 Maybe they will be using the money from Level(3) to increase capacity 
 on the peerings with the transit providers. (Or maybe not.)

I don't know about their connection to TWT, but Comcast has definitely 
been running their transits congested. The most obvious one from recent 
months is Tata, which appears to be massively congested for upwards of 
12 hours a day in some locations. Comcast has been forcing traffic from 
large networks who refuse to peer with them (e.g. Abovenet, NTT, Telia, 
XO, etc.) to route via their congested Tata transit for a few months now; 
their Level3 transit is actually one of the last uncongested providers 
that they have.

The part that I find most interesting about this current debacle is how 
Comcast has managed to convince people that this is a peering dispute, 
when in reality Comcast and Level3 have never been peers of any kind. 
Comcast is a FULL TRANSIT CUSTOMER of Level3, not even a paid peer. This 
is no different than a Comcast customer refusing to pay their cable 
modem bill because Comcast sent them too much traffic (i.e. the 
traffic that they requested), and then demanding that Comcast pay them 
instead. Comcast is essentially abusing its (in many cases captive) 
customers to extort other networks into paying them if they want 
uncongested access. This is the kind of action that virtually BEGS for 
government involvement, which will probably end badly for all networks.

If there is any doubt about any of this, you can pop on over to 
lg.level3.net and look at the BGP communities Comcast is tagging on 
their Level3 transit service, preventing the routes from being exported 
to certain peers. For example, to my home cable modem:

Community: North_America Lclprf_100 Level3_Customer United_States 
Chicago2 EU_Suppress_to_Peers Suppress_to_AS174 Suppress_to_AS1239 
Suppress_to_AS1280 Suppress_to_AS1299 Suppress_to_AS1668 
Suppress_to_AS2828 Suppress_to_AS2914 Suppress_to_AS3257 
Suppress_to_AS3320 Suppress_to_AS3549 Suppress_to_AS3561 
Suppress_to_AS3786 Suppress_to_AS4637 Suppress_to_AS5511 
Suppress_to_AS6453 Suppress_to_AS6461 Suppress_to_AS6762 
Suppress_to_AS7018 Suppress_to_AS7132
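
Checking that list by eye is tedious, so here is a minimal sketch that 
pulls the suppressed ASNs out of the looking-glass text above with a 
regular expression. The label format is taken from that output; the 
regex is an assumption about how consistently it is rendered.

```python
import re

# Community text as shown by the Level 3 looking glass above.
LG_OUTPUT = """\
Community: North_America Lclprf_100 Level3_Customer United_States
Chicago2 EU_Suppress_to_Peers Suppress_to_AS174 Suppress_to_AS1239
Suppress_to_AS1280 Suppress_to_AS1299 Suppress_to_AS1668
Suppress_to_AS2828 Suppress_to_AS2914 Suppress_to_AS3257
Suppress_to_AS3320 Suppress_to_AS3549 Suppress_to_AS3561
Suppress_to_AS3786 Suppress_to_AS4637 Suppress_to_AS5511
Suppress_to_AS6453 Suppress_to_AS6461 Suppress_to_AS6762
Suppress_to_AS7018 Suppress_to_AS7132
"""


def suppressed_asns(text):
    """Return the ASNs named in Suppress_to_AS<n> labels, in order."""
    # EU_Suppress_to_Peers has no AS number, so it is not matched.
    return [int(n) for n in re.findall(r"\bSuppress_to_AS(\d+)\b", text)]


asns = suppressed_asns(LG_OUTPUT)
print(len(asns))
print(6453 in asns, 3549 in asns)  # Tata, Global Crossing
```

On this output it finds 19 suppressed ASNs, including AS6453 (Tata) and 
AS3549 (Global Crossing), as well as the Abovenet, NTT, Telia and XO 
networks named earlier (AS6461, AS2914, AS1299, AS2828).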

Re: TWT - Comcast congestion

2010-11-30 Thread Richard A Steenbergen

 together and deal with each other, cutting out the middle man

Netflix is a Comcast customer too (again well established publicly and 
easily provable via the global routing table), but they don't run their 
own server infrastructure, and Comcast doesn't offer a CDN service...

The reality is that Level 3 offered Netflix a cut-throat price on CDN 
service to steal the business from Akamai, probably only made possible 
by the double dipping mentioned above. They were already in for a world 
of hurt based on their CDN infrastructure investment and the revenue 
they were able to extract from it, this certainly isn't going to help 
things. :)

Re: TWT - Comcast congestion

2010-11-30 Thread Richard A Steenbergen

On Tue, Nov 30, 2010 at 07:53:25PM -0800, Leo Bicknell wrote:
 
 I'm not privy to the deal, but I will point out as reported it makes no
 sense, so there is something else going on here.  This is where both
 sides are hiding the real truth.  I suspect it's one of two scenarios:
 
 - Comcast demanded a lower price from Level 3, which Level 3 has spun
   as paying Comcast a monthly fee.
 
 - Comcast said they would do settlement-free peering with Level 3, in
   addition to, or in place of transit.  Level 3 is spinning the cost
   of turning this up as paying Comcast a fee.
 
 I suspect we'll not know what terms were offered for many years.

While obviously nobody is going to come out and officially acknowledge 
the exact terms on the NANOG mailing list, I'd say this is far too 
massive a leap of logic to make any kind of sense. Both Level 3 and 
Comcast seem to acknowledge that Comcast is asking for Level 3 to pay, 
is it really so hard to believe that this is the case? :)

 Yes and no.  First off, network neutrality is a vaguely defined term, 
 so I'm not going to use it.  Rather I'm going to say I think many 
 people agree there is a concept that when it comes to traffic between 
 providers there should be roughly similar terms for all players.  
 Comcast shouldn't give Netflix a sweetheart deal while making Youtube 
 pay through the nose.

Why shouldn't they? Charging different people different rates based on 
their willingness to pay is perfectly legal last I looked, and goes on 
in every industry. 

Personally I thought net neutrality was about not charging Netflix a 
special fee or else risk having their services degraded (in the same 
way that the mob makes sure nothing bad happens to your store :P), so 
they don't compete with an internal VOD service which doesn't get such 
fees applied. But obviously net neutrality is like tier 1, you can 
apply any definition you'd like. :)

  The funny part is that Level 3 was clearly ill prepared for the PR war, 
  whereas Comcast, being the first mover (if not the first PR issuer), was 
  well prepared.
 
 Really?  I just checked google news again, and the first statement I can
 find by either side was a Level 3 submission to business wire:

I believe that's what I said. To be perfectly clear, what I'm saying is:

* Comcast acted first by demanding fees
* Level 3 went public first by whining about it after they agreed to pay
* Comcast was well prepared to win the PR war, and had a large pile of 
  content that sounds good to the uninformed layperson ready to go.

  The reality is that Level 3 offered Netflix a cut-throat price on CDN 
  service to steal the business from Akamai, probably only made possible 
  by the double dipping mentioned above. They were already in for a world 
  of hurt based on their CDN infrastructure investment and the revenue 
  they were able to extract from it, this certainly isn't going to help 
  things. :)
 
 I feel you undercut your network neutrality argument right here, because
 you make an argument that this is just two competitive businesses trying
 to get a leg up on each other.  You can't have the fairness part of
 network neutrality and try and stab each other in the back at every
 step.

The net neutrality part comes from the fact that Level 3 can't just turn 
Comcast off for non-payment without risking massive impact to their 
customers. I'm pretty sure Level 3 is still allowed to charge people for 
transit services. If Comcast didn't want to buy from Level 3 they could 
have easily gone elsewhere, the part where the gov't steps in is when 
someone is abusing a monopoly/duopoly position.

 Neither Level 3 nor Comcast here are interested in the fairness of 
 network neutrality, or even interested in helping their customers. 
 They are interested in hurting their competitors and boosting their 
 own bottom line.

Probably true, but I'm sure someone somewhere (i.e. the consumers who 
have little to no choice in their home broadband) cares about the 
fairness just a little.

 I bet the cash spent on lawyers and lobbyists taking this to the FCC 
 on both sides could pay for enough backbone bandwidth and router ports 
 to make this problem go away on both sides many times over.  If they 
 really cared about the customers experience and good network 
 performance they would put away the press release swords, the various 
 VP and CxO's egos, and come up with a solution.

Do you really think Comcast cares about the $50k router ports (by their 
own accounts, though personally I'd suggest they get off the CRS-1 tippe 
if they actually wanted to save some money :P), or might they actually 
be more interested in establishing themselves as a new Tier 1? :)

At the end of the day both companies have made their share of mistakes, 
but I have a lot more respect for the ones who compete fairly and 
honestly, rather than by forcing people to use their services or else.

Re: experience with equinix exchange

2010-11-29 Thread Richard A Steenbergen

On Sun, Nov 28, 2010 at 04:09:55PM -0600, Aaron Wendel wrote:
 According to pch they don't run most of them.  I would say they run 
 very few compared to how many there actually are.

Uhh... Reality check: with the Switch and Data (SD) acquisition, Equinix controls the VAST 
majority of the IX traffic in the US. The only other IX's doing anything 
even approaching interesting traffic are NOTA (in Miami), NYIIX (in New 
York), SIX (in Seattle), and the former AtlantaIX (now Telx TIE) in 
Atlanta. All are regional players, with very incomplete coverage of the 
important regions in the US, so if you're peering in the US you're 
almost guaranteed to be dealing with Equinix. Nobody else is even 
noteworthy; you can probably do more traffic than the other IX's by 
leaving a BitTorrent client running overnight.

Anyone can throw a Linksys switch in their basement and call themselves 
an exchange point, but that doesn't mean anyone is going to show up and 
peer there.

Re: experience with equinix exchange

2010-11-29 Thread Richard A Steenbergen

On Mon, Nov 29, 2010 at 04:03:21PM -0500, Patrick W. Gilmore wrote:
 The only thing I would change is that Any2 has at least one exchange 
 with traffic (Los Angeles) and is distributed throughout the country.
 
 But the vast majority of traffic exchange over IXes in the US is over 
 Equinix/PAIX switches.  And a very large amount of traffic over 
 private interconnects is also done in their buildings.

Whoops, yes, I forgot Any2 (how'd that happen? :P). Like Telx, they've 
recently deployed a bunch of new exchanges all over, but there is 
really only the one that does any traffic. :)

For comparison purposes:

http://www.seattleix.net/agg.htm
http://www.nyiix.net/index.php?core=statistics.php
http://tie.telx.com/usage.pl
http://www.coresite.com/peering-any2charts.php

I don't think the combined Equinix / SD numbers are published publicly 
anywhere, but I'm sure it's north of a terabit. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Outage between GBLX and HE?

2010-11-17 Thread Richard A Steenbergen

On Wed, Nov 17, 2010 at 10:36:09AM -0500, Christopher J. Pilkington wrote:
 On Wed, Nov 17, 2010 at 09:55:10AM +, Paul Kelly :: Blacknight wrote:
  I may have spoken too soon... issues are on going.
 
 We were seeing routing irregularities with GBLX as well.  It seems 
 they're sending out our prefix to their peers, but blackholing the 
 traffic coming back.  We've shutdown our session with AS3549 until 
 someone there answers our ticket.

Probably another LSP blackholing issue; look at the archives from a few 
weeks back and you'll see the same issue on GX in Seattle. As for the 
issue this morning, they have a router that has been blackholing traffic 
in Ashburn for a good long while now.

I almost put on my Global Double Crossing t-shirt this morning too. :)

http://www.printfection.com/ras/Global-Double-Crossing-2-T-Shirt/_p_4935066

Re: flow analysis for juniper devices

2010-11-15 Thread Richard A Steenbergen

On Tue, Nov 16, 2010 at 12:33:37AM +0100, bas wrote:
 
 Shouldn't there be a (**)
 
 (**) Also Except for MX'es with trio chipsets. These can do
 inline-jflow that export to IPFIX (modified netflow v9)
 
 All of the open source collector solutions I've tried that can handle
 v9 cannot handle IPFIX from the trio cards.
 
 Richard; Do you have something that handles IPFIX?

Yes there's that too. I haven't actually gotten around to testing the 
Trio-specific NetFlow capabilities yet, but supposedly they only support
IPFIX when using the built-in sampling capabilities. If you want v9
you'll still need a Multiservice DPC, or you can always stick to classic 
RE-sampled v5/v8.

IPFIX is effectively NetFlow v10: it's largely based on v9, but it's
just different enough to be incompatible. Of course it's close
enough that it shouldn't be THAT much work if you already have an 
existing v9 parser, but I don't know what software actually supports it 
today. The only flow collector implementation which I've spent any 
amount of time looking at besides the stuff I've written myself is 
pmacct, which IMHO shows great promise, but I don't believe it supports 
IPFIX yet. For my purposes I'd have been just as happy if everyone had 
standardized on sFlow (especially since I already wrote a parser for it
:P), but alas it isn't meant to be.

Some differences between v9 and IPFIX that googling turned up:

http://www.plixer.com/blog/netflow/what-is-ipfix-vs-netflow-v9/
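For anyone extending an existing v9 parser, a minimal sketch of the header-level differences, assuming the field layouts in RFC 3954 (v9) and RFC 7011 (IPFIX): different header sizes, a record count in v9 versus a byte length in IPFIX, no sysUptime in IPFIX, and different template set IDs. The function names are hypothetical, and this only decodes the fixed header, not templates or data records.

```python
import struct

# NetFlow v9 header (RFC 3954), 20 bytes:
#   version, count, sysUptime, unix_secs, sequence, source_id
V9_HEADER = struct.Struct("!HHIIII")
# IPFIX message header (RFC 7011), 16 bytes:
#   version, length, export time, sequence, observation domain id
IPFIX_HEADER = struct.Struct("!HHIII")

# FlowSet/Set IDs that carry templates differ between the two formats.
TEMPLATE_SET_IDS = {
    9: {0: "template", 1: "options template"},
    10: {2: "template", 3: "options template"},
}


def parse_export_header(data: bytes) -> dict:
    """Decode just the fixed header of a v9 or IPFIX export packet."""
    (version,) = struct.unpack_from("!H", data, 0)
    if version == 9:
        _, count, uptime_ms, secs, seq, source_id = V9_HEADER.unpack_from(data)
        # v9's 'count' is a number of records, not bytes, so the end of the
        # packet is only found by walking the FlowSets. Its sequence number
        # counts export packets.
        return {"version": 9, "record_count": count, "sys_uptime_ms": uptime_ms,
                "export_time": secs, "sequence": seq, "domain_id": source_id,
                "sets_offset": V9_HEADER.size}
    if version == 10:
        _, length, secs, seq, domain_id = IPFIX_HEADER.unpack_from(data)
        # IPFIX's 'length' is the whole message in bytes, sysUptime is gone,
        # and the sequence number counts data records instead of packets.
        return {"version": 10, "message_length": length, "export_time": secs,
                "sequence": seq, "domain_id": domain_id,
                "sets_offset": IPFIX_HEADER.size}
    raise ValueError(f"not NetFlow v9 or IPFIX (version {version})")


def classify_set(version: int, set_id: int) -> str:
    """Name the kind of FlowSet (v9) or Set (IPFIX) an ID denotes."""
    if set_id >= 256:
        return "data"  # same rule in both formats
    return TEMPLATE_SET_IDS[version].get(set_id, "reserved")
```

A v9 parser that assumes set ID 0 means "template" will misread every IPFIX template set, which is one reason the two are "just different enough" to break naive reuse.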

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: flow analysis for juniper devices

2010-11-14 Thread Richard A Steenbergen

On Sun, Nov 14, 2010 at 08:59:33AM +, Paolo Lucente wrote:
 On Sat, Nov 13, 2010 at 09:17:55PM -0600, Richard A Steenbergen wrote:
 
  Oh and the sFlow on EX is actually pretty crippled when used for routing. 
  It's missing support for a bunch of important extended message types, and 
  doesn't fully populate all of the fields of the message types it does 
  send. For example you won't get any data on ASNs, nexthops, dest 
  ifindexes, or even netmasks of the src/dst route the flow matched, 
  making it pretty darn useless for a lot of tasks. It's functional if 
  you're just analyzing L2 networks at any rate.
 
 Agree people spend some money and hence tend to expect something in
 return. But it's also true those good souls developing free collectors
 (to stay in topic with the OP) sometimes come to the rescue: ASNs, BGP
 next-hop, routes, netmasks can be all looked up at the collector at
 pretty no major effort. Variety of methods available depending on the
 collector, in place or a posteriori, file or BGP lookup - it's matter
 of selecting what fits better the specific job.

Yes you can do an offline routing lookup to try and reconstruct some 
missing data (or do some even more interesting analysis, as described in 
http://www.nanog.org/meetings/nanog35/presentations/steenbergen.pdf), 
but it isn't always a practical solution to missing netmask, nexthop, 
and dest ifindex data.

Remember that every RIB in your network can and will have a unique best 
path selection (thanks to the EBGP > IBGP rule if nothing else), and if 
you have a network of any size at all you'll probably have to deal with 
multiple exits to the same network. Even if you were only concerned with 
analyzing external traffic, you'd still need to collect a RIB per edge 
router using an IBGP feed. In my network this would put you well over 10 
million paths, and consume several gigs of ram, not to mention the load 
of doing the routing lookups themselves. If you wanted to do traffic 
analysis inside your network you'd need a feed from every router, and 
maybe even active participation in your IGP. It CAN be done, but it's 
not pretty, and I don't think any existing free software has been tested 
under these kinds of conditions.
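To make the collector-side reconstruction concrete, here is a minimal sketch of a longest-prefix-match lookup against per-router RIBs, assuming each edge router's table has already been dumped into (prefix, nexthop, origin AS) tuples from its own IBGP feed. The router names, prefixes, and ASNs are hypothetical documentation values. Note that the two edge routers disagree about the same destination, which is exactly why one shared RIB isn't enough.

```python
import ipaddress
from typing import Dict, List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    nexthop: str
    origin_as: int


def build_rib(entries) -> List[Route]:
    """entries: iterable of (prefix, nexthop, origin_as) tuples."""
    return [Route(ipaddress.ip_network(p), nh, asn) for p, nh, asn in entries]


def lookup(rib: List[Route], addr: str) -> Optional[Route]:
    """Longest-prefix match by linear scan. Fine for a sketch, hopeless at
    10M+ paths, where you'd want a radix trie per RIB."""
    ip = ipaddress.ip_address(addr)
    best = None
    for route in rib:
        if ip in route.prefix and (best is None or
                                   route.prefix.prefixlen > best.prefix.prefixlen):
            best = route
    return best


# Hypothetical per-router RIBs (one IBGP feed each); note they disagree.
RIBS: Dict[str, List[Route]] = {
    "edge1": build_rib([("203.0.113.0/24", "192.0.2.1", 64500),
                        ("203.0.113.128/25", "192.0.2.5", 64501)]),
    "edge2": build_rib([("203.0.113.0/24", "192.0.2.9", 64500)]),
}


def enrich(exporter: str, dst: str) -> dict:
    """Fill in the dst netmask, nexthop and origin ASN an exporter left out,
    using the RIB of the router that actually exported the flow."""
    route = lookup(RIBS[exporter], dst)
    if route is None:
        return {"dst": dst, "dst_mask": None, "nexthop": None, "dst_as": None}
    return {"dst": dst, "dst_mask": route.prefix.prefixlen,
            "nexthop": route.nexthop, "dst_as": route.origin_as}
```

The same flow destination gets a /25 via 192.0.2.5 if edge1 exported it and a /24 via 192.0.2.9 if edge2 did. This sketch still does nothing for dest ifindex, which needs the router's own forwarding state, not just BGP.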

So when a vendor says "we support sFlow," make sure they actually 
support the message types and fields you need. :)
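One way to do that check is to decode a sample of what the box actually exports and compare it against what you need. The sketch below assumes the sFlow v5 flow record formats (enterprise 0) 1001 extended_switch, 1002 extended_router, and 1003 extended_gateway, and lists only a subset of their fields. Dest ifindex lives in the flow sample's output interface field rather than in these records, so it isn't covered. The helper names are hypothetical.

```python
# Subset of sFlow v5 flow record formats (enterprise 0) and the routing
# fields each carries, per the sFlow v5 spec. Not exhaustive.
SFLOW_FLOW_RECORDS = {
    1: ("sampled_header", {"raw_packet_header"}),
    1001: ("extended_switch", {"src_vlan", "dst_vlan"}),
    1002: ("extended_router", {"nexthop", "src_mask_len", "dst_mask_len"}),
    1003: ("extended_gateway", {"nexthop", "src_as", "src_peer_as",
                                "dst_as_path", "communities", "localpref"}),
}


def missing_fields(formats_seen, fields_needed):
    """Fields you need that no record type the exporter sends can carry."""
    available = set()
    for fmt in formats_seen:
        available |= SFLOW_FLOW_RECORDS.get(fmt, ("unknown", set()))[1]
    return set(fields_needed) - available


def unpopulated(record: dict, fields_needed):
    """Fields present in a decoded record but left at zero/empty, i.e. the
    'sent but not filled in' case."""
    return {f for f in fields_needed if f in record and not record[f]}
```

An exporter that only ever sends formats 1 and 1001 fails `missing_fields` for every routing field; one that sends 1002 but leaves the mask lengths at zero is caught by `unpopulated`.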

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: flow analysis for juniper devices

2010-11-13 Thread Richard A Steenbergen

On Sun, Nov 14, 2010 at 12:07:40PM +1000, Mehmet Akcin wrote:
 hey there
 
 any recommendations on freeware flow analysis tool which can show the 
 flow not only per prefix basis but also show asn and/or country/region 
 as well? Juniper only.
 
 feel free to contact on/off list.

Juniper's flow export is just like everyone else's (*), so any tool will 
do the same thing. Country/region analysis would depend on third party 
geolocation services, which have nothing to do with netflow. :)

(*) Well, except M/T/MX only support NetFlow v5/v8 in the free,
software-based sampling mode; you need an expensive services card and software 
license to do v9 for some reason.

Oh and the sFlow on EX is actually pretty crippled when used for routing. 
It's missing support for a bunch of important extended message types, and 
doesn't fully populate all of the fields of the message types it does 
send. For example you won't get any data on ASNs, nexthops, dest 
ifindexes, or even netmasks of the src/dst route the flow matched, 
making it pretty darn useless for a lot of tasks. It's functional if 
you're just analyzing L2 networks at any rate.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Extra latency at ATT exchange for UVerse

2010-11-11 Thread Richard A Steenbergen

, 
SBC/AS7132, and Bellsouth/AS6389, each with their own unique routing 
policies. The latency jump would be a near-perfect fit for there still 
being some direct AS7132 peering sessions up, but only in Ashburn and 
not Atlanta.

If nothing else, this illustrates one key point of troubleshooting with 
traceroute. The actual output of the traceroute is often worthless 
without knowing the source and destination IPs that were being tested, 
so *ALWAYS* provide those along with your traceroutes if you want to 
ever have any hope of having your problem solved. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Richard A Steenbergen

On Sun, Nov 07, 2010 at 08:02:28AM +0100, Mans Nilsson wrote:
 
 The only reason to use (10)GE for transmission in WAN is the 
 completely baroque price difference in interface pricing. With todays 
 line rates, the components and complexity of a line card are pretty 
 much equal between SDH and GE. There is no reason to overcharge for 
 the better interface except because they (all vendors do this) can.

To be fair, there are SOME legitimate reasons for a cost difference. For 
example, ethernet has very high overhead on small packets and tops out 
at 14.8Mpps over 10GE, whereas SONET can do 7 bytes of overhead for your 
PPP/HDLC and FCS etc and easily end up doing well over 40Mpps of IP 
packets. The cost of the lookup ASIC that only has to support the 
Ethernet link is going to be a lot cheaper, or let you handle a lot more 
links on the same chip.
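The packet-rate gap can be checked with some quick arithmetic. This sketch assumes minimum-size Ethernet frames with preamble and inter-frame gap on one side. On the other side it assumes POS at the OC-192c payload rate with about 7 bytes of PPP/HDLC framing and the smallest possible (20-byte) IP packets. HDLC byte-stuffing is ignored, and the function names are hypothetical.

```python
# Ethernet's smallest frame costs 64 bytes + 8 bytes preamble/SFD
# + 12 bytes minimum inter-frame gap on the wire.
ETH_MIN_WIRE_BYTES = 64 + 8 + 12

# OC-192c payload: the STS-192c SPE minus path overhead and fixed-stuff
# columns, i.e. 16640 columns x 9 rows x 8 bits x 8000 frames/s.
OC192C_PAYLOAD_BPS = 16640 * 9 * 8 * 8000  # 9,584,640,000 bit/s


def max_pps(line_bps: float, wire_bytes_per_packet: int) -> float:
    """Packets per second if every packet costs the given bytes on the wire."""
    return line_bps / (wire_bytes_per_packet * 8)


def ethernet_10ge_pps() -> float:
    """Worst-case packet rate of 10GE with minimum-size frames."""
    return max_pps(10_000_000_000, ETH_MIN_WIRE_BYTES)


def pos_oc192_pps(ip_packet_bytes: int, framing_bytes: int = 7) -> float:
    """POS with PPP/HDLC framing. framing_bytes=7 assumes flag, address,
    control, 2-byte protocol and a 16-bit FCS (use 9 for a 32-bit FCS)."""
    return max_pps(OC192C_PAYLOAD_BPS, ip_packet_bytes + framing_bytes)
```

That works out to about 14.88 Mpps for 10GE versus about 44.4 Mpps for bare 20-byte IP packets over OC-192 POS (about 25.5 Mpps with 40-byte packets). The lookup engine for the Ethernet port has to be sized for roughly a third of the worst case.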

At this point it's only half price gouging of the silly telco customers 
with money to blow. There really are significant cost savings for the 
vendors in using the more popular and commoditized technology, even 
though it may be technically inferior. Think of it like the old IDE vs.
SCSI wars: when enough people get on board with the cheaper, inferior
technology, eventually they start shoehorning on all the features and
functionality that you wanted from the other one in the first place. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Richard A Steenbergen

On Sun, Nov 07, 2010 at 12:34:56AM -0700, George Bonser wrote:
 
 Yes, I really don't understand that either.  You would think that the 
 investment in developing and deploying all that SONET infrastructure 
 has been paid back by now and they can lower the prices dramatically.  
 One would think the vendors would be practically giving it away, 
 particularly if people understood the potential improvement in 
 performance, though the difference between 1500 and 4000 is probably 
 not all that much except on long distance (> 2000km) paths.

Careful, you're rapidly working your way up to nanog kook status with 
these absurd claims based on no logic whatsoever.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen

 the 
mechanisms currently at our disposal. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen

On Sat, Nov 06, 2010 at 02:21:51PM -0700, George Bonser wrote:
 
 That is not a new problem.  That is also true to today with last 
 mile links (e.g. dialup) that support 1500 byte MTU.  What is 
 different today is RFC 4821 PMTU discovery which deals with the black 
 holes.
 
 RFC 4821 PMTUD is that negotiation that is lacking.  It is there. 
 It is deployed.  It actually works.  No more relying on someone 
 sending the ICMP packets through in order for PMTUD to work!

The only thing this adds is a trial-and-error probing mechanism per flow,
to try and recover from the infinite blackholing that would occur if
your ICMP is blocked in classic PMTUD. If this actually happened at any
scale, it would create a performance and overhead penalty that is far
worse than the original problem you're trying to solve.
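A rough sketch of why per-flow probing is costly: a simplified binary search in the spirit of RFC 4821 (not its exact algorithm), where every oversized probe simply vanishes and the sender only learns about it when a timer fires. The starting bounds (1280 and 9000), the stop granularity, and the one-timeout-per-lost-probe cost model are assumptions for illustration.

```python
def probe_search(path_mtu: int, low: int = 1280, high: int = 9000,
                 min_step: int = 16):
    """Binary-search for the path MTU, assuming low <= path_mtu <= high.
    Returns (mtu_settled_on, probes_sent, probes_silently_lost)."""
    probes = lost = 0
    while high - low > min_step:
        size = (low + high) // 2
        probes += 1
        if size <= path_mtu:    # probe delivered and acknowledged
            low = size
        else:                   # probe silently dropped; only a timeout says so
            lost += 1
            high = size - 1
    return low, probes, lost


def probing_delay(lost: int, delivered: int, rtt_s: float, timeout_s: float) -> float:
    """Crude time spent probing: a timeout per lost probe, an RTT per
    delivered one (real stacks overlap these with data transfer)."""
    return lost * timeout_s + delivered * rtt_s
```

For a path that really tops out at 8000 bytes, the search takes 9 probes, 2 of which are silently lost. Multiply that by every flow crossing the mismatched link and the overhead adds up quickly.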

Say you have two routers talking to each other over a L2 switched 
infrastructure (i.e. an exchange point). In order for PMTUD to function 
quickly and effectively, the two routers on each end MUST agree on the 
MTU value of the link between them. If router A thinks it is 9000, and 
router B thinks it is 8000, when router A comes along and tries to send
an 8001-byte packet it will be silently discarded, and the only way to 
recover from this is with trial-and-error probing by the endpoints after 
they detect what they believe to be MTU blackholing. This is little more 
than a desperate ghetto hack designed to save the connection from 
complete disaster.

The point where a protocol is needed is between router A and router B, 
so they can determine the MTU of the link, without needing to involve 
the humans in a manual negotiation process. Ideally this would support 
multi-point LANs over ethernet as well, so .1 could have an MTU of 9000, 
.2 could have an MTU of 8000, etc. And of course you have to make sure 
that you can actually PASS the MTU across the wire (if the switch in the 
middle can't handle it, the packet will also be silently dropped), so 
you can't just rely on the other side to tell you what size it THINKS it 
can support. You don't have a shot in hell of having MTUs negotiated 
correctly or PMTUD work well until this is done.
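The per-neighbor agreement described above can be modeled in a few lines, assuming each member's configured MTU and the largest frame the switch fabric has actually been verified to pass are known. That last number is the one you can't just take the other side's word for. The helper names and addresses are hypothetical.

```python
from itertools import combinations


def usable_mtu(a_mtu: int, b_mtu: int, fabric_max: int) -> int:
    """Largest packet that neither router nor the switch fabric will drop."""
    return min(a_mtu, b_mtu, fabric_max)


def blackhole_window(sender_mtu: int, receiver_mtu: int, fabric_max: int):
    """Packet sizes the sender believes it may send but that will be
    silently lost, as an inclusive (low, high) range, or None."""
    safe = usable_mtu(sender_mtu, receiver_mtu, fabric_max)
    return (safe + 1, sender_mtu) if sender_mtu > safe else None


def pairwise_table(members: dict, fabric_max: int) -> dict:
    """members: {address: configured MTU}. Per-pair MTU on a shared LAN,
    which is what a multi-point negotiation protocol would have to produce."""
    return {(a, b): usable_mtu(members[a], members[b], fabric_max)
            for a, b in combinations(sorted(members), 2)}
```

With router A at 9000, router B at 8000, and a fabric that passes 9216, `blackhole_window(9000, 8000, 9216)` gives (8001, 9000), exactly the range that gets silently discarded in the example above.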

 Is there any gear connected to a major IX that does NOT support large 
 frames?  I am not aware of any manufactured today.  Even cheap D-Link 
 gear supports them.  I believe you would be hard-pressed to locate 
 gear that doesn't support it at any major IX.  Granted, it might 
 require the change of a global config value and a reboot for it to 
 take effect in some vendors.
 
 http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html

If that doesn't prove my point about every vendor having their own 
definition of what # is and isn't supported, I don't know what does. 
Also, I don't know what exchanges YOU connect to, but I very clearly see 
a giant pile of gear on that list that is still in use today. :)

 As for the configuration differences between units, how does that 
 change from the way things are now?  A person configuring a Juniper 
 for 1500 byte packets already must know the difference as that quirk 
 of including the headers is just as true at 1500 bytes as it is at 
 9000 bytes.  Does the operator suddenly become less competent with 
 their gear when they use a different value?  Also, a 9000 byte MTU 
 would be a happy value that practically everyone supports these days, 
 including ethernet adaptors on host machines.

Everything defaults to 1500 today, so nobody has to do anything. Again, 
I'm actually doing this with people today on a very large network with 
lots of peers all over the world, so I have a little bit of experience 
with exactly what goes wrong. Nearly everyone who tries to figure out 
the correct MTU between vendors and with a third party network gets it 
wrong, at least some significant percentage of the time.
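A tiny sketch of the kind of translation that goes wrong. It assumes one box counts the 14-byte Ethernet header (and, on some platforms, 4 bytes per 802.1Q tag) in its configured MTU, as Junos does for physical interfaces, while the other counts only the IP packet. The helper names are hypothetical; check each platform's documentation for what its number actually means.

```python
ETH_HEADER_BYTES = 14  # dst MAC + src MAC + EtherType (FCS not counted)
VLAN_TAG_BYTES = 4     # per 802.1Q tag


def ip_mtu(configured: int, counts_l2_header: bool, vlan_tags: int = 0) -> int:
    """IP MTU implied by the number typed into an interface config."""
    if not counts_l2_header:
        return configured
    return configured - ETH_HEADER_BYTES - VLAN_TAG_BYTES * vlan_tags


def config_value_for(target_ip_mtu: int, counts_l2_header: bool,
                     vlan_tags: int = 0) -> int:
    """Inverse of ip_mtu(): what to configure to get a given IP MTU."""
    if not counts_l2_header:
        return target_ip_mtu
    return target_ip_mtu + ETH_HEADER_BYTES + VLAN_TAG_BYTES * vlan_tags
```

Two sides that "agree on 9000" end up at 8986 and 9000 if only one of them counts the header, leaving a 14-byte window of silently dropped packets.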

And honestly I can't even find an interesting number of people willing 
to turn on BFD, something with VERY clear benefits for improving failure 
detection time over an IX (for the next time Equinix decides to do one 
of their 10PM maintenances that causes hours of unreachability until 
hold timers expire :P). If the IX operators saw any significant demand 
they would have turned it on already.
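For a sense of the difference BFD makes, here is a minimal sketch of the asynchronous-mode detection time from RFC 5880. The 300 ms / multiplier 3 timers in the usage note are assumed example values, not a recommendation.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time per RFC 5880: the remote Detect
    Mult times the agreed interval, where the agreed interval is the larger
    of our RequiredMinRxInterval and the peer's DesiredMinTxInterval."""
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval
```

With 300 ms timers and a multiplier of 3, a dead neighbor is detected in 900 ms. Without BFD, a silent peer behind an IX switch is only noticed when the BGP hold timer expires, which is 180 seconds with the IOS default, about 200 times longer.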

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen

, and every piece of gear supports it. It also doesn't 
accomplish anything, as almost no packets flowing through your SONET 
links are > 1500 bytes, and if you actually tried to show up to the 
Internet with a PC and a 4474 byte MTU you'd have a bad time. 

At any rate, I'm going to stop arguing this one, as I think we've beaten 
this dead horse enough for one day. Please read what I said carefully, I 
promise you this isn't as easy as you think it is. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Richard A Steenbergen

On Fri, Nov 05, 2010 at 03:32:30PM -0700, Scott Weeks wrote:
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)

Arguments about locator/identifier splits aside (which I happen to agree 
with), this thing goes off the deep end on page 7 when it starts talking 
about peering infrastructure. In fact, pretty much every sentence on that 
page is blatantly wrong. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Equinix of Candia?

2010-11-01 Thread Richard A Steenbergen

On Mon, Nov 01, 2010 at 06:31:34PM -0700, Ryan Finnesey wrote:
 Equinix only has one center within Toronto.  Is there someone with a 
 larger number of centers across the country?

I'm assuming when you say "like Equinix" you mean a carrier-neutral colo 
where you can buy from, sell to, and interconnect with other networks in 
an interesting fashion. If you're just looking for a place to stuff some 
servers, the answer will be very different.

Canada is an odd market, with relatively little competition between 
carriers (outside of a few locations), and most of the bandwidth 
controlled by a few large incumbents. The biggest and most interesting 
facility for carrier neutral services is 151 Front in Toronto, where 
nearly every bit in the region goes. Switch and Data (now Equinix) is 
one major colo and IX operator in the building, but there are many more, 
and a building MMR. Technically this makes it more like a 111 8th than 
an Equinix. :) In Montreal there is Canix (www.canix.ca), which operates 
multiple facilities throughout the city, and is the defacto standard for 
carrier neutral colo there. This is probably the closest thing you'll 
find to an Equinix. If there is anything interesting going on in 
Vancouver I haven't heard of it, but I don't know the market well enough 
to say for certain. Everywhere else is either too small to care about on 
a national scale, or is serviced by non-neutral colos (e.g. Peer1, etc).

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: ATT/L3 interconnect?

2010-10-11 Thread Richard A Steenbergen

On Mon, Oct 11, 2010 at 02:48:00PM -0400, Deepak Jain wrote:
 
  http://www.nanog.org/meetings/nanog45/presentations/Sunday/RAS_traceroute_N45.pdf
 
 I'd have thought I didn't need to provide credentials in NANOG, but 
 apparently one stays quiet too long and you're a noob.
 
 First, to those who have given me basic mpls, traceroute and ip primers 
 by off list email, thank you. It's not necessary. I appreciate your 
 willingness to help out the community.
 
 Second, I *know* that the traceroute I pasted a bit of has to do with 
 mpls magic (or similar). That's why I used the word tunnel. I wasn't 
 asking *how* it was done. I'm quiet capable of performing the same 
 magic. I just wanted to know if anyone off the top of their head knew 
 *where* the packets were magically popping back into the ether... LA, 
 Nevada, Denver. That's all. A physical location or a router IP would 
 have been a perfectly wonderful answer.

Hey Deepak,

Sorry, but they're actually right. Read the section on icmp tunneling, 
it explains exactly how and why you're seeing this behavior. :)

The return packets pop out at the end of the LSP, which is clearly 
in LA (or thereabouts, whatever lsrca is probably).

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: reachability problems Europe-US?

2010-10-07 Thread Richard A Steenbergen

On Thu, Oct 07, 2010 at 07:12:33PM +0200, Thomas Schmid wrote:
 yes, I can confirm that situation is back to normal now after we 
 re-enabled the GBLX session. I heared from others that it was again a 
 broken LSP problem in GBLX (unconfirmed :) )

Global Crossing recently started deploying Foundry/Brocade XMR's in 
their MPLS core, as a lower cost alternative to their old T640/OC192 
MPLS core model. Unfortunately these boxes are buggy as all hell, and 
seem to blackhole LSPs somewhere in their network on at least a weekly 
basis. I think we've seen at least a dozen issues similar to this over 
the last couple months, though most of them were out of LA, so I didn't 
know they had actually done a Seattle deployment.

Honestly GX deserves what they get on this one. I'm not aware of any 
other large network who has ever done a serious MPLS deployment using 
these boxes (and if you're thinking of replying to this and saying "hey,
we do some VLLs between 2 routers and it seems to work," stop and think
about what I might mean when I say a SERIOUS MPLS deployment first :P), 
so this was pretty much to be expected. I'll also say that I'm 
remarkably underwhelmed by their response to this issue, and suggest 
that anyone who doesn't want their packets blackholed by the Floundrys 
be prepared to vote with their wallet.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: BGP next-hop

2010-09-30 Thread Richard A Steenbergen

On Thu, Sep 30, 2010 at 07:01:19AM -0700, Leo Bicknell wrote:
 I have suggested more than a few times to vendors that the command:
 
 show bgp ipv4 unicast 100.10.0.0/16 why-chosen
 
 Would be insanely useful.

Been in JUNOS show route since day one, and IMHO is easily in the top 
10 list of why I still buy Juniper instead of Cisco despite all the 
$%^*ing bugs these days.
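To illustrate what a "why-chosen" answer looks like, here is a simplified BGP decision process that records the tie-breaker each losing path lost on, roughly the idea behind an inactive reason in JUNOS show route output. It is only a sketch: it skips weight, locally originated routes, multipath, route age, cluster list, and deterministic-MED handling, and the class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    local_pref: int = 100
    as_path: Tuple[int, ...] = ()
    origin: str = "igp"
    med: int = 0
    neighbor_as: int = 0
    ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def compare(a: Path, b: Path) -> Tuple[Path, str]:
    """Simplified BGP decision process; returns (winner, reason)."""
    if a.local_pref != b.local_pref:
        return (a if a.local_pref > b.local_pref else b), "higher local-pref"
    if len(a.as_path) != len(b.as_path):
        return (a if len(a.as_path) < len(b.as_path) else b), "shorter AS path"
    if a.origin != b.origin:
        better = a if ORIGIN_RANK[a.origin] < ORIGIN_RANK[b.origin] else b
        return better, "lower origin code"
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return (a if a.med < b.med else b), "lower MED (same neighbor AS)"
    if a.ebgp != b.ebgp:
        return (a if a.ebgp else b), "EBGP preferred over IBGP"
    if a.igp_metric != b.igp_metric:
        return (a if a.igp_metric < b.igp_metric else b), "lower IGP metric to nexthop"
    rid_a, rid_b = ipaddress.ip_address(a.router_id), ipaddress.ip_address(b.router_id)
    if rid_a != rid_b:  # compare numerically, not as strings
        return (a if rid_a < rid_b else b), "lower router ID"
    return a, "tie, kept existing path"


def why_chosen(paths: List[Path]) -> Tuple[Path, List[str]]:
    """Pairwise fold over the candidates, logging why each loser lost.
    Real implementations differ; MED makes pairwise order matter."""
    best, log = paths[0], []
    for challenger in paths[1:]:
        winner, reason = compare(best, challenger)
        loser = challenger if winner is best else best
        log.append(f"{winner.name} beat {loser.name}: {reason}")
        best = winner
    return best, log
```

The log is the useful part: instead of guessing which attribute tipped the decision, you get one line per comparison naming the deciding step.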

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: BGP next-hop

2010-09-30 Thread Richard A Steenbergen

On Thu, Sep 30, 2010 at 11:56:06PM +0100, Heath Jones wrote:
 
 Its interesting, I was heavy into cisco years back and then juniper 
 for a while. Going back to cisco now is great (always good for me to 
 keep my exposure up), but there is just so much unclear in it's CLI. 
 It wasn't until going back that I realised.
 
 I guess they would have to balance keeping the old timers  scripts 
 etc happy VS bringing in new features that make the output look 
 different.. Do you keep something that isn't perfect but people know 
 how to use, or change it and cause more issues than good?

Personally I still can't believe that it's the year 2010, and IOS still 
shows routes in classful notation (i.e. if it's in 192.0.0.0/3 and is a 
/24, the /24 part isn't displayed because it's assumed to be Class C). 
Of course I say that every year, and so far the only thing that has 
changed is the year I say it about.
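The classful display rule is easy to state in code. This is an illustration of the behavior described above, assuming the prefix length is hidden whenever it matches the old class A/B/C default. It is not IOS's actual formatting code, and the function names are hypothetical.

```python
import ipaddress


def classful_length(addr: ipaddress.IPv4Address) -> int:
    """Default mask length implied by the first octet (pre-CIDR classes)."""
    first = int(addr) >> 24
    if first < 128:
        return 8    # Class A: 0.0.0.0/1
    if first < 192:
        return 16   # Class B: 128.0.0.0/2
    if first < 224:
        return 24   # Class C: 192.0.0.0/3
    raise ValueError("class D/E addresses have no classful mask")


def classful_style(prefix: str) -> str:
    """Render a prefix the classful way: drop the length when it equals
    the classful default, show it otherwise."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen == classful_length(net.network_address):
        return str(net.network_address)
    return str(net)
```

So 203.0.113.0/24 displays as a bare "203.0.113.0" while 198.51.100.0/23 keeps its length. The reader has to remember the class rules to know which one they're looking at.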

 ps. Juniper has really gone to $h!t lately. There's a website called 
 glassdoor.com that I found - go look up what employees have to say 
 about it.. reflects exactly the support we were getting, even as as an 
 'elite' partner..

Don't get me started, I could complain for days and still not run out of 
material, but alas it doesn't accomplish anything. Sadly, many of the 
best Juniper people I know are incredibly disaffected, and are leaving 
(or have already left) in droves. I think the way I heard it put best
was: "I'm convinced that $somenewexecfromcisco is actually on a secret
5-year mission to come over to Juniper, completely $%^* the company, and
then go back to Cisco and get a big bonus for it." :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Routers in Data Centers

2010-09-26 Thread Richard A Steenbergen

On Sun, Sep 26, 2010 at 09:24:54PM -0400, Alex Rubenstein wrote:
 
 And, not to mention that some vendors do it sometimes.
 
 The 9-slot Cisco Catalyst 6509 Enhanced Vertical Switch (6509-V-E) 
 provides [stuff]. It also provides front-to-back airflow that is 
 optimized for hot and cold aisle designs in colocated data center and 
 service provider deployments and is compliant with Network Equipment 
 Building Standards (NEBS) deployments.

A classic 6509 is under 15U; a 6509-V-E is 21U. Anyone can do front-to-back
airflow if they're willing to bloat the size of the chassis (in 
this case by 40%) to do all the fans and baffling, but then you'd have 
people whining about the size of the box. :)

 It only took 298 years from the inception of the 6509 to get a 
 front-to-back version. If you can do it with that oversized thing, it 
 certainly can be done on a 7200, XMR, juniper whatever, or whatever 
 else you fancy.

Well, a lot of people who buy 7200s, baby XMRs, etc, are doing it for 
the size. Lord knows I certainly bought enough 7606s instead of 6509s 
over the years for that very reason. I'm sure the vendors prefer to 
optimize the size footprint on the smaller boxes, and only do front-to-back
airflow on the boxes with large thermal loads (like all the modern 
16+ slot chassis that are rapidly approaching 800W/card). Also, remember 
the 6509 has been around since its 9 slots were lucky to see 100W/card, 
which is a far cry from a box loaded with 6716s at 400W/card or other 
power hungry configs.

Remember the original XMR 32 chassis, which had side to side airflow? 
They quickly disappeared that sucker and replaced it with the much 
larger version they have today, I can only imagine how bad that was. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Did your BGP crash today?

2010-08-27 Thread Richard A Steenbergen

On Fri, Aug 27, 2010 at 01:29:15PM -0400, Jared Mauch wrote:
 
 Unknown BGP attribute 99 (flags: 240)
 Unknown BGP attribute 99 (flags: 240)
 Unknown BGP attribute 99 (flags: 240)
 Unknown BGP attribute 99 (flags: 240)
 Unknown BGP attribute 99 (flags: 240)

Just out of curiosity, at what point will we as operators rise up 
against the ivory tower protocol designers at the IETF and demand that 
they add a mechanism to not bring down the entire BGP session because of 
a single malformed attribute? Did I miss the memo about the meeting? 
I'll bring the punch and pie.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Did your BGP crash today?

2010-08-27 Thread Richard A Steenbergen

On Fri, Aug 27, 2010 at 01:43:39PM -0700, Clay Fiske wrote:
 
 If -everyone- dropped the session on a bad attribute, it likely 
 wouldn't make it far enough into the wild to cause these problems in 
 the first place.

And if everyone filtered their BGP customers there would be no routing 
leaks, but we've seen how well that works. :)

The "if anything bad happens, drop the session" method of protection is 
only effective if EVERY BGP implementation catches EVERY malformed 
update EVERY time, which just doesn't match up with reality. Not only 
that, but a healthy number of the bgp update issues over the years have 
actually been the result of implementations detecting perfectly valid 
things as invalid, which means by definition the implementations which 
get it right and don't drop the session act as carriers and spread the 
problem route globally. How long are we going to continue to act like 
this method of protection is actually working?

Let's be reasonable: if your basic BGP message format is malformed you're 
going to need to drop the session. If the packet is corrupted or the 
size of the message doesn't match what's in the TLV, you're not going to 
be able to continue and you'll have to drop the session. But there are 
still a huge number of potential issues where it would be perfectly safe 
to drop the update you didn't like, and support for this could easily be 
negotiated and the sending side informed of the issue by a soft 
notification extension. I have yet to see a single argument against this 
which isn't political or philosophical in nature.
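The distinction between "can't find the message boundaries" and "didn't like one attribute" can be sketched directly against the path-attribute encoding: 1-byte flags, a 1-byte type, and a 1- or 2-byte length depending on the extended-length flag (0x10). The "attribute 99 (flags: 240)" above is 0xF0: optional, transitive, partial, extended length. The drop-the-update policy and the helper names are hypothetical, and only ORIGIN and NEXT_HOP are content-checked.

```python
import struct

EXTENDED_LENGTH = 0x10          # attribute flag: 2-byte length field follows
ORIGIN, AS_PATH, NEXT_HOP = 1, 2, 3


class SessionReset(Exception):
    """Framing is broken; we can no longer trust where anything ends."""


def walk_attributes(blob: bytes):
    """Split a BGP path-attribute block into (flags, type, value) tuples.
    Any length inconsistency is a framing error, which still means a reset."""
    attrs, off = [], 0
    while off < len(blob):
        if off + 2 > len(blob):
            raise SessionReset("truncated attribute header")
        flags, code = blob[off], blob[off + 1]
        off += 2
        width = 2 if flags & EXTENDED_LENGTH else 1
        if off + width > len(blob):
            raise SessionReset("truncated attribute length")
        length = struct.unpack_from("!H" if width == 2 else "!B", blob, off)[0]
        off += width
        if off + length > len(blob):
            raise SessionReset(f"attribute {code} overruns the update")
        attrs.append((flags, code, blob[off:off + length]))
        off += length
    return attrs


def handle_update(blob: bytes) -> str:
    """Return 'accept' or 'drop-update'; framing errors raise SessionReset.
    Hypothetical policy: a content error in a well-framed attribute only
    costs you this update. (In practice the prefixes it carried would also
    need to be withdrawn so a stale path doesn't linger.)"""
    for _flags, code, value in walk_attributes(blob):
        if code == ORIGIN and (len(value) != 1 or value[0] > 2):
            return "drop-update"
        if code == NEXT_HOP and len(value) != 4:
            return "drop-update"
        # Unknown attributes (e.g. type 99 with flags 0xF0) are skipped here;
        # their length was consistent, so framing is intact.
    return "accept"
```

Only the length checks in `walk_attributes` justify tearing down the session; everything in `handle_update` could be handled per update, and reported back with something like the soft notification idea above.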

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Inquiries to Acquire IPs

2010-07-03 Thread Richard A Steenbergen

On Sat, Jul 03, 2010 at 10:42:55PM +0200, Mans Nilsson wrote:
 aut-num:AS31337
 as-name:ELEET-AS
 descr:  ELEET Network
 descr:  Location: Sweden
 
 (Story is, IIRC, that adjacent number was assigned initially, but the 
 confirmation mail was answered with Can I have 31337 instead? which 
 in turn was granted. )

I tried to time it to get 6.9 from ARIN, ended up with 6.8 instead, and 
they kept 6.9 for themselves. Bastards! :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: BGP convergence problem

2010-06-08 Thread Richard A Steenbergen

On Tue, Jun 08, 2010 at 12:22:04PM -0400, Jared Mauch wrote:
 
 The Cisco 7600 and 6500 platforms are getting fairly old and have
 underpowered cpus these days.
 
 Starting in SXH the control plane did not scale quite as well as in
 SXF.  This got better in SXI, but is not back on par with SXF
 performance yet.
 
 I mostly attribute this to a combination of bloat in software and
 routing tables.  I would start to look for a replacement sooner rather
 than later.

Place blame where blame is due: the CPU may be slow, but the crappy IOS
scheduler is the real problem here. We saw a huge reduction in the
number of self-sustaining protocol timeout cycles on these boxes
(where the process of trying to bring up a new neighbor and converge
routing uses so much cpu that it causes other neighbors to time out,
resulting in a never-ending cycle of fail until you shut down everything
and bring them up one neighbor at a time) with the move from SXF to the 
SR branches. We never really went down the SXH/SXI road, but I'd have 
assumed they would have introduced the same improvements there too. I 
guess you know what they say about assuming. :)

Try the usual suspects:

* Configure "process-max-time 20" at the top level; this improves 
interactivity by making the scheduler switch processes more often.

* Make sure you don't have an overly aggressive control-plane policer. 
In my experience the COPP rate-limits are quite harsh, and if you end up 
bumping against them you don't get a graceful slowing of the exchange of 
routes, you get protocol timeouts.

* Make sure you don't have any stupid mls rate-limits, such as "cef
receive". I don't know why anyone would ever want to configure this; all 
it does is make your box fall over faster (as if these things need any 
help) by rate-limiting all traffic to the MSFC.

* You might want to try something like "scheduler allocate 400 4000",
which gives the vast majority of the cpu time to the control plane
rather than process switching on the data plane (which in theory
shouldn't happen on an entirely hw forwarded box like 6500/7600, though 
of course we all know that isn't true :P).

Oh and also the OP should take this to the cisco-nsp mailing list, where 
all the good bitching about broken Crisco routers takes place. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Junos Asymmetric Routing

2010-05-31 Thread Richard A Steenbergen

On Sun, May 30, 2010 at 10:16:14AM -0700, Kevin Oberman wrote:
 
 I remember a posting to this list back in the late 90s from Tony Li,
 who knows a bit about BGP. He urged that multi-hop BGP never be used
 and pointed out that it had not been intended for use except as a test
 tool, not a production one and should have been stripped from IOS
 before it was shipped.
 
 While there are a few good cases for using it, it is generally a bad,
 bad idea. And this thread demonstrates that he had reason for the
 warning

I think you guys may be getting a tad carried away with the crusade
against multihop BGP. The only thing you're really giving up when you
use it is liveness detection, which as we all know BGP is actually
pretty terrible at implementing anyways (how's that 180 sec IOS default
working out for you?). There are much better mechanisms out there, like
BFD, which could be used to provide better liveness detection to BGP
through nexthop invalidation. 

I'm not saying everyone should run out and do all their peering over
multihop EBGP without carefully considering a replacement for the
liveness detection component, I just hate it when people get religious
about such a simple concept for no good reason (well, other than Randy
Bush getting to do his best Andy Rooney impersonation :P). Multihop BGP
is no more evil than anything else we do with the Internet, and the fact
that we've all managed to use it successfully for IBGP proves that it
can work out just fine. There are some pretty interesting things you can
accomplish as far as large scale traffic engineering if you can free
yourself from the requirement of speaking EBGP with a directly connected
neighbor, processed by whatever slow overpriced router CPU could be
stuffed into that box. Again, I just hate to see the concepts dismissed
out of hand because of some old BGP ideology about a problem that can be
addressed any number of other ways.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: BGP and convergence time

2010-05-12 Thread Richard A Steenbergen

On Wed, May 12, 2010 at 09:52:48AM -0600, Danny McPherson wrote:
 
 The holdtime isn't technically negotiated, both sides convey their
 value in the open message and the lower of the two is used by both BGP
 speakers.  IIRC, neither J or C reset the session with the timer
 change, but the new holdtimer expiry value doesn't take effect until
 then.

Rest assured, J will always reset the session if given half a
chance, and changing your holdtime is more than half. :)

One thing I find interesting is that most other protocols will err on
the side of caution and use the higher of two values like this when
negotiating between two parties, but BGP does the opposite. I still run
into bad bgp implementations which can't keep up with my 30 sec hold
timers all the time *coughghettoequinixrouteservercough*.
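
For reference, the rule Danny describes is spelled out in RFC 4271: each side advertises a Hold Time in its OPEN, the smaller value is used, and the result must be either 0 or at least 3 seconds. Below is a minimal sketch of that selection. The function and exception names are made up for illustration.

```python
class UnacceptableHoldTime(ValueError):
    """Stands in for sending an OPEN Message Error / Unacceptable Hold Time
    NOTIFICATION (RFC 4271, section 6.2)."""


def negotiate_hold_time(local_s: int, received_s: int) -> int:
    """Hold time both speakers end up using (RFC 4271, section 4.2)."""
    if received_s in (1, 2):
        # Hold Time values of one or two seconds MUST be rejected.
        raise UnacceptableHoldTime(f"received hold time {received_s}s")
    # The smaller of the configured and received values is used.
    return min(local_s, received_s)


def keepalive_interval_s(hold_s: int) -> int:
    """RFC 4271 suggests one third of the hold time; 0 disables keepalives."""
    return hold_s // 3


# A 30 sec hold time facing a peer configured for the 180 sec IOS default:
hold = negotiate_hold_time(30, 180)
print(hold, keepalive_interval_s(hold))   # 30 10
```

Short of rejecting the OPEN outright, the peer with the longer configured timer gets no say. That is how a slow implementation ends up bound to 30 sec timers it can't keep up with.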

Re: SFP+ ER and ZR

2010-05-11 Thread Richard A Steenbergen

On Tue, May 11, 2010 at 04:24:42AM -0400, bas wrote:
 Hi Guys,
 
 I thought ER and ZR SFP+ optics were not available (yet) due to power
 and cooling challenges.
 
 However on this site: http://www.excelight.com/products/datalink/sfpplus.asp
 They offer both ER and ZR SFP+ optics.
 
 Has anyone used or tested with these? If so with which equipment?
 Or have you found other vendors of these optics?

They aren't available (yet); these are vaporware. Many manufacturers are close to 
having reliable 40km optics, and several are making 20km+ overpowered 
LRs, but ZR and DWDM are still a ways out. There are also some CWDM 
units in the works, but because SFP+ doesn't support onboard EDC you 
are limited by dispersion to 10km in the traditional 8ch 1470-1610nm 
CWDM space over SMF28.

Re: BGP and convergence time

2010-05-11 Thread Richard A Steenbergen

On Tue, May 11, 2010 at 09:31:51PM -0400, Jay Nakamura wrote:
 Yes, I understand BFD.  The question is, do carriers usually do BFD
 with customers?  And if they say no, are there other remedies?  ATT
 doesn't seem to be even willing to change BGP timers.  If anyone have
 been able to talk ATT or Qwest in doing so, it would really help to
 find out how they convinced them.  They are such a big bureaucracies
 that it's frustrating to do anything that makes sense.  Although Qwest
 seems a lot more responsive than ATT.

The slow, titanic carriers won't do anything innovative for anyone,
regardless of the benefit. Try a clueful carrier and they'll be happy to
run BFD with you. Of course after promoting it for more than a year now
we have something like 5 peers and 0 customers using it (mostly because
of broken vendor implementations), but hey it's never too late to start.
:)

Re: Juniper firewalls - SSG or SRX

2010-04-20 Thread Richard A Steenbergen

On Tue, Apr 20, 2010 at 04:18:11AM -0700, Owen DeLong wrote:
 
 Interesting. My SRXes have been rock solid since upgrading to
 10.0R1.8.

Not so much here. My basement SRX210 starts dropping bgp sessions over
an IPSEC tunnel every 30 secs or so after around 1-1.5 days of uptime,
and won't stop until you restart rpd (which buys you another day or so
of functioning bgp). And about 1 out of every 4 times you do restart
rpd, dhcpd will spin at 100% cpu until you restart that too. Even
10.1S1.3 doesn't help these issues. It's a nice box in theory, and it
has lots of potential, but lots and lots of unresolved bugs too. I knew
things were off to a bad start when I tried to downgrade from the 10.0R1
that shipped with the box to 9.6 after my first round of issues, and it
crashed in the middle of the installer, wiping the config in the process
and requiring a tftp boot of new code to recover. :)

Re: what about 48 bits?

2010-04-04 Thread Richard A Steenbergen

On Sun, Apr 04, 2010 at 11:53:54AM -0300, A.B. Jr. wrote:
 Hi,
 
 Lots of traffic recently about 64 bits being too short or too long.
 
 What about mac addresses? Aren't they close to exhaustion? Should be.
 Or it is assumed that mac addresses are being widely reused throughout
 the world? All those low cost switches and wifi adapters DO use unique
 mac addresses?

http://en.wikipedia.org/wiki/MAC_address

The IEEE expects the MAC-48 space to be exhausted no sooner than the 
year 2100[3]; EUI-64s are not expected to run out in the foreseeable 
future.
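
Here are some back-of-the-envelope numbers behind that estimate. This is a minimal sketch that assumes the classic 24-bit OUI (MA-L) split and ignores the smaller MA-M/MA-S block sizes the IEEE also assigns. Two bits of the first octet are the I/G (group/multicast) and U/L (universal/local) flags, so only a quarter of the space is globally unique unicast.

```python
# Size of the MAC-48 space, assuming 24-bit OUI (MA-L) assignments only.
TOTAL_BITS = 48
FLAG_BITS = 2     # I/G (group/multicast) and U/L (universal/local) bits
OUI_BITS = 24     # IEEE-assigned prefix, which includes the two flag bits

total_addresses = 2 ** TOTAL_BITS
universal_unicast = 2 ** (TOTAL_BITS - FLAG_BITS)
possible_ouis = 2 ** (OUI_BITS - FLAG_BITS)
addresses_per_oui = 2 ** (TOTAL_BITS - OUI_BITS)

print(f"{total_addresses:,} MAC-48 addresses in total")
print(f"{universal_unicast:,} globally unique unicast addresses")
print(f"{possible_ouis:,} possible OUIs x {addresses_per_oui:,} addresses each")
```

In practice the pool being consumed is the roughly 4.2 million possible OUIs, each good for about 16.7 million interfaces.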

Re: what about 48 bits?

2010-04-04 Thread Richard A Steenbergen

On Mon, Apr 05, 2010 at 10:57:46AM +0930, Mark Smith wrote:
 
 Has anybody considered lobbying the IEEE to do a point to point version
 of Ethernet to gets rid of addressing fields? Assuming an average 1024
 byte packet size, on a 10Gbps link they're wasting 100+ Mbps. 100GE /
 1TE starts to make it even more worth doing.

If you're lobbying to have the IEEE do something intelligent to Ethernet
why don't you start with a freaking standardization of jumbo frames. The
lack of a real standard and of any type of negotiation protocol for two
devices under different administrative control is all but guaranteeing that
end to end jumbo frame support will never be practical.
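
The sketch below puts numbers on both the quoted 100+ Mbps figure and the payoff from jumbo frames. It makes three assumptions: frames are untagged, the standard 12-byte minimum inter-frame gap applies, and the quoted "1024 byte packet" is the payload carried in each frame. It also ignores padding for tiny payloads.

```python
# Ethernet wire overhead per frame (untagged): preamble/SFD, MAC addresses,
# EtherType, FCS and the minimum inter-frame gap. "payload" is the L3 packet.
PREAMBLE_SFD = 8
MAC_ADDRS = 12        # destination + source MAC
ETHERTYPE = 2
FCS = 4
IFG = 12
OVERHEAD = PREAMBLE_SFD + MAC_ADDRS + ETHERTYPE + FCS + IFG  # 38 bytes


def frames_per_sec(link_bps: float, payload: int) -> float:
    return link_bps / ((payload + OVERHEAD) * 8)


def address_field_bps(link_bps: float, payload: int) -> float:
    """Bandwidth spent carrying the source and destination MAC addresses."""
    return frames_per_sec(link_bps, payload) * MAC_ADDRS * 8


def efficiency(payload: int) -> float:
    """Fraction of wire time spent carrying payload."""
    return payload / (payload + OVERHEAD)


print(f"{address_field_bps(10e9, 1024) / 1e6:.0f} Mbps of MAC addresses at 10G")
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {efficiency(mtu):.2%} of the wire is payload")
```

Going from 1500 to 9000 byte payloads moves wire efficiency from roughly 97.5% to 99.6%. It also cuts the frame (and per-packet processing) rate by about 6x, which is the bigger win.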

Re: CSIRT - Backbone Security : Runtime Monitoring and DynamicReconfiguration for Intrusion Detection Systems

2010-03-17 Thread Richard A Steenbergen

On Thu, Mar 18, 2010 at 12:18:40AM +, char...@www.knownelement.com wrote:
 Mods,
 
 Can we get the spam off the list? Its getting old. 

FYI this guy has been spamming individuals and PeeringDB contacts for a
couple months now too.

Re: YouTube AS36561 began announcing 1.0.0.0/8

2010-03-12 Thread Richard A Steenbergen

On Fri, Mar 12, 2010 at 07:34:10AM -0500, Patrick W. Gilmore wrote:
 Oh, I understand what's going on exactly.  YouTube is trying to
 balance their ratios. :)

That might explain why they're only announcing it behind Cogent. :)

Re: Linux Router distro's with dual stack capability

2010-02-11 Thread Richard A Steenbergen

On Thu, Feb 11, 2010 at 03:46:13PM -0800, Kevin Oberman wrote:
 Polling is excellent for low speed lines, but for Gig and faster, most
 newer interfaces support interrupt coalescing. This easily resolves the
 issue in hardware as interrupts are only issued when needed but limited
 to a reasonable rate, Polling does not use interrupts, but consumes
 system resources regardless of traffic.
 
 FreeBSD has supported polling for a long time (V6?) and interrupt
 coalescing since some release of V7. (Latest release is V8.)

I'm pretty sure it's been around for a lot longer than that. I seem to
recall playing with both back in 4.x. Of course interrupt coalescing is
mostly a function of the NIC (though some driver involvement is required
to take advantage of it), so the quality of the implementations has
varied significantly over the years. The first generation GE NICs which
offered it didn't do a particularly good job with it though, so for
example it was still possible to cripple a box with high interrupt
rates while the same box would be perfectly fine with polling.

That said, I think your use case for polling is backwards. As you say, 
normally the NIC fires off an interrupt every time a packet is 
received, and the kernel stops what it is doing to process the new 
packet. On a low speed (or at least low traffic) interface this isn't a 
problem, but as the packet/sec rate increases the amount of time wasted 
as interrupt processing overhead becomes significant. For example, 
even a GE interface is capable of doing 1.488 million packets/sec.
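
The 1.488 million figure works out as follows. A minimum-size 64 byte frame also costs an 8 byte preamble/SFD and a 12 byte inter-frame gap on the wire, for a total of 84 bytes (672 bits) per packet. A quick sketch:

```python
# Maximum Ethernet packet rate at minimum frame size.
MIN_FRAME = 64      # bytes, including MAC addresses, EtherType and FCS
PREAMBLE_SFD = 8    # bytes
IFG = 12            # bytes of minimum inter-frame gap


def max_pps(link_bps: float) -> float:
    bits_per_packet = (MIN_FRAME + PREAMBLE_SFD + IFG) * 8   # 672 bits
    return link_bps / bits_per_packet


for name, bps in (("GE", 1e9), ("10GE", 10e9)):
    print(f"{name}: {max_pps(bps):,.0f} packets/sec")
```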

By switching to a polling based model, you switch off the interrupt 
generation completely and simply check the NIC for new packets at a set 
rate (for example, 1000 times/sec). This gives you a predictable and 
consistent CPU use, so even if you had 1.488M/s interrupts coming in you 
would still only be checking 1000 times/sec. If you did less than 
1000pps it would be a net increase in CPU, but if you do more (or ever 
risk doing more, such as during a DoS attack) it could be a net benefit. 
This makes the most sense for people doing a lot of traffic 
regardless.

Of course the downside is higher latency, since you're delaying the 
processing of the packet by some amount of time after it comes in. In 
the 1000 times/sec example above, you could be delaying processing of 
your packet by up to 1ms. For most applications this isn't enough to 
cause any harm, but it's something to keep in mind. Interrupt coalescing 
works around the problem of large interrupt rates by simply having the 
NIC limit the number of interrupts it generates under load, giving you 
the benefits of low-latency processing and low-interrupt rate under high 
load. I haven't played with this stuff in many, many years, so I'm sure 
modern interrupt coalescing is much better than it used to be, and the 
extra work of configuring polling and dealing with the potential 
latency/jitter implications isn't worth the benefits for most people. :)
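
Below is a toy model of the three receive strategies described above. For each one it computes CPU wakeups per second and the worst-case extra latency a packet can see. The 1000 polls/sec comes from the example in the text. The 8000 interrupts/sec coalescing cap is an assumed example value. Real NICs and drivers batch work in more complicated ways than this.

```python
from dataclasses import dataclass


@dataclass
class RxCost:
    wakeups_per_sec: float
    worst_added_latency_ms: float


def interrupt_per_packet(pps: float) -> RxCost:
    # Every packet raises an interrupt and is handled right away.
    return RxCost(pps, 0.0)


def polling(pps: float, poll_hz: float = 1000) -> RxCost:
    # The CPU checks the NIC poll_hz times a second no matter what pps is
    # (pps is accepted only so all three models share a signature); a packet
    # can wait up to one full poll period.
    return RxCost(poll_hz, 1000 / poll_hz)


def coalescing(pps: float, max_irq_rate: float = 8000) -> RxCost:
    # Below the cap this behaves like interrupt-per-packet; above it the NIC
    # holds packets so that at most max_irq_rate interrupts fire per second.
    if pps <= max_irq_rate:
        return RxCost(pps, 0.0)
    return RxCost(max_irq_rate, 1000 / max_irq_rate)


for pps in (500, 1_488_095):
    for model in (interrupt_per_packet, polling, coalescing):
        cost = model(pps)
        print(f"{pps:>9,} pps {model.__name__:>20}: "
              f"{cost.wakeups_per_sec:>11,.0f} wakeups/s, "
              f"up to {cost.worst_added_latency_ms:.3f} ms added")
```

The model makes the same point as the text. Polling only pays off once the packet rate exceeds the poll rate. Coalescing gets most of the same protection under load while, in this simplified model, adding no delay at low load.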

Re: [NANOG] Contacts @ China Unicom and China Telecom

2010-02-03 Thread Richard A Steenbergen

On Wed, Feb 03, 2010 at 11:40:38AM -0800, Justin Ream wrote:
 Hi All -
 
 Does anyone have peering contacts for China Unicom and China Telecom?
 Finding that the ones for Any2 in peeringdb.com are no good.  Will
 take replies offlist, thanks!

Last I checked the China Telecom e-mails listed worked fine, but the
China Unicom/China Netcom addresses have all bounced for at least a
couple of years now. I've personally tried every possible combination
and permutation of every address listed, including the e-mail address
that was used to register the PeeringDB account, and none of them work.

Re: Using /126 for IPv6 router links

2010-01-25 Thread Richard A Steenbergen

On Mon, Jan 25, 2010 at 09:12:49AM +, Andy Davidson wrote:
 There are 4,294,967,296 /64s in my own /32 allocation.  If we only ever
 use 2000::/3 on the internet, I make that 2,305,843,009,213,693,952
 /64s.  This is enough to fill over seven Lake Eries.  The total amount
 of ipv6 address space is exponentially larger still - I have just looked
 at 2000::/3 in these maths.
 
 THE IPv6 ADDRESS SPACE IS VERY, VERY, VERY BIG.

Don't get carried away with all of that IPv6 is huge math, it quickly
deteriorates when you start digging into it. Auto-configuration reduces
it from 340282366920938463463374607431768211456 to 18446744073709551616
(that's 2^-64, or roughly 5.4 x 10^-18 percent, of the original 128 bit space). Now as an
end user you might get anything ranging from a /56 to a /64, that's only
between 1 - 256 IPs, barely a significant increase at all if you were to
actually use a /64 for each routed IP rather than as each routed subnet.
As a small network you might get a /48, so that even if you gave out
/64s to everyone it would be only 16 bits of space for you (the
equivalent of getting a class B back in IPv4 land), something like an
8-16 bit improvement over what a similar sized network would have gotten
in IPv4.  As a bigger ISP you might get a /32, but it's the same thing,
only 16 bits of space when you have to give out /48s. All we've really
done is buy ourselves an 8 to 16 bit improvement at every level of
allocation space (and a lot of prefix bloat for when we start using more
than 2000::/3), which is a FAR cry from the "2^128, omg big number, we
can give every molecule an IPv6 address" math of the popular
imagination. :)
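
Here is the same arithmetic using Python's standard ipaddress module. Counts are in /64s, following the framing above: one /64 per LAN, or per host if every routed host gets a /64. 2001:db8::/32 is the documentation prefix, used here as a stand-in for a real allocation.

```python
import ipaddress

# Fraction of the 128-bit space left once SLAAC consumes the low 64 bits.
total = 2 ** 128
after_autoconf = 2 ** 64
print(f"{after_autoconf:,} of {total:,} ({after_autoconf / total:.2e} of the space)")


def slash64s_in(prefix: str) -> int:
    """Number of /64 subnets inside a prefix of length <= 64."""
    net = ipaddress.ip_network(prefix)
    return 2 ** (64 - net.prefixlen)


for prefix in ("2001:db8::/64", "2001:db8::/56", "2001:db8::/48", "2000::/3"):
    print(f"{prefix:>14}: {slash64s_in(prefix):,} /64s")

# An ISP with a /32 handing out /48s to customers:
isp = ipaddress.ip_network("2001:db8::/32")
customer_48s = sum(1 for _ in isp.subnets(new_prefix=48))
print(f"/48s in a /32: {customer_48s:,}")
```

Both the /48 and the /32 come out to 16 bits of room at their level of the hierarchy.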

Re: Using /126 for IPv6 router links

2010-01-25 Thread Richard A Steenbergen

On Mon, Jan 25, 2010 at 09:10:11AM -0500, TJ wrote:
 While I agree with parts of what you are saying - that using the simple
 2^128 math can be misleading, let's be clear on a few things:
 *) 2^61 is still very, very big.  That is the number of IPv6 network
 segments available within 2000::/3.  
 *) An end-user should get something between a /48 and a /56, _maybe_ as low
 as a /60 ... hopefully never a /64.  Really.
 **) Let's call the /48s enterprise assignments, and the /56s home
 assignments ... ?
 **) And your /56 to /64 is NOT 1-256 IPs, it is 1-256 segments.

It is if we are to follow the "always use a /64 as a single IP" 
guidelines. Not that I'm encouraging this, I'm just saying this is what 
we're told to do with the space. I for one have this little protocol 
called DHCP that does IP assignments along with a bunch of other things 
that I need anyways, so I'm more than happy to take a single /64 for my
house as a single LAN segment (well, never mind the fact that my 
house has a /48).

 **) And, using the expected /48-/56, the numbers are really 256-64k subnets.
...
 Note: All we've really done is buy ourselves an 8 to 16 bit improvement at
 every level of allocation space
 *) And you don't think 8-16 bits _AT EVERY LEVEL_ is a bit deal??

I'm not saying that 8-16 bits isn't an improvement, but it's a far cry
from the bazillions of numbers everyone makes IPv6 out to be. By the
time you figure in the overhead of autoconfiguration, restrictive
initial deployments, and the "now that the space is much bigger, we
should be reallocating bigger blocks" logic at every layer of
redistribution, that is what you're left with. So far all we've really
done with v6 is created a flashback to the days when every end user
could get a /24 just by asking, every enterprise could get a /16 just by
asking, and every big network could get a /8 just by asking, just bit 
shifted a little bit. That's all well and good, but it isn't a 
bazillion. :)

Re: Foundry CLI manual?

2010-01-23 Thread Richard A Steenbergen

On Sat, Jan 23, 2010 at 10:51:57AM -0500, David Hubbard wrote:
 Anyone have the Foundry/Brocade CLI reference PDF
 they could send me?  Brocade feels you should have a
 support contract to have a list of commands the 
 hardware you purchased offers and I'm having difficulty
 with a oc12 pos module.

Ironically enough the manuals themselves are accessible without a login,
but the list of manuals is not. You fail to mention which product you're
interested in, so I'm going to take a stab and hope that it's something
current with a pos card like an MLX/XMR. If you're still rocking an old
B2P622, I'd say you're in need of far more help than any manual can
provide. :)

http://www.foundrynet.com/services/documentation/xmr_user/current/NetIron_04100_ConfigGuide.pdf
http://www.foundrynet.com/services/documentation/xmr_diag/current/NetIronXMR-MLX_04100_DiagnosticRef.pdf

Re: Enhancing automation with network growth

2010-01-20 Thread Richard A Steenbergen

On Wed, Jan 20, 2010 at 09:54:50PM -0500, Steve Bertrand wrote:
 Hi all,
 
 I'm reaching the point where adding in a new piece of infrastructure
 hardware, connecting up a new cable, and/or assigning address space to
 a client is nearly 50% documentation and 50% technical.
 
 One thing that would take a major load off would be if my MRTG system
 could simply update its config/index files for itself, instead of me
 having to do it on each and every port change.

It is really quite trivial to auto-discover ifindex-ifdescr mappings on
every poll cycle then track your interfaces by their names, pretty much
every modern poller system can manage this. MRTG is absurdly old, slow,
and generally nasty, and should not be used by anyone in this day and 
age.
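
A minimal sketch of that approach follows. It re-walks ifDescr on every poll cycle and keys the counters by interface name instead of ifIndex, so an ifIndex renumbering after a reboot or card swap doesn't corrupt your graphs. The walk function is a hypothetical stand-in for whatever SNMP library you use, and here it is faked with dictionaries. The OIDs are the standard IF-MIB ifDescr and ifHCInOctets.

```python
from typing import Callable, Dict, Union

IF_DESCR = "1.3.6.1.2.1.2.2.1.2"            # IF-MIB::ifDescr
IF_HC_IN_OCTETS = "1.3.6.1.2.1.31.1.1.1.6"  # IF-MIB::ifHCInOctets

# Hypothetical SNMP walk: OID -> {ifIndex: value}
Walk = Callable[[str], Dict[int, Union[str, int]]]


def poll(walk: Walk) -> Dict[str, int]:
    """One poll cycle: rebuild ifIndex -> name, then key counters by name."""
    names = walk(IF_DESCR)              # re-discovered every cycle
    in_octets = walk(IF_HC_IN_OCTETS)
    return {str(names[idx]): int(in_octets[idx])
            for idx in names if idx in in_octets}


# Fake device state before and after a reboot that renumbered ifIndexes.
before = {IF_DESCR: {1: "xe-0/0/0", 2: "xe-0/0/1"},
          IF_HC_IN_OCTETS: {1: 1000, 2: 2000}}
after = {IF_DESCR: {7: "xe-0/0/0", 3: "xe-0/0/1"},
         IF_HC_IN_OCTETS: {7: 1500, 3: 2600}}

print(poll(lambda oid: before[oid]))
print(poll(lambda oid: after[oid]))
```

Counters are cumulative, so a real poller still has to compute per-name deltas and treat a counter that goes backwards as a reset.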

Re: dark fiber and sfp distance limitations

2010-01-01 Thread Richard A Steenbergen

On Fri, Jan 01, 2010 at 02:52:33PM -0800, Mike wrote:
 I am looking at the possibility of leasing a ~70 mile run of fiber. I 
 don't have access to any mid point section for regeneration purposes, 
 and so I am wondering what the chances that a 120km rated SFP would be 
 able to light the path and provide stable connectivity. There are a lot 
 of unknowns including # of splices, condition of the cable, or the 
 actual dispersion index or other properties (until we actually get 
 closer to leasing it). Its spare telco fibers in the same cable binder 
 they are using interoffice transport, but there are regen huts along the 
 way so it works for them but may not for us, and 'finding out' is 
 potentially expensive. How would someone experienced go about 
 determining the feasibillity of this concept and what options might 
 there be? Replies online or off would be appreciated.

That shouldn't be too difficult, especially at only 1G (though personally
I can't imagine why you would bother leasing dark fiber for that :P). 
There are several ways you could do it, including 120km+ rated SFPs
(iirc there have been 200km SFPs out for a while too), an external
optical amplifier (ideally you'd want to amp in the middle, but with a
single channel you should be fine w/pre-amp), and a digital FEC wrapper
to extend the receive sensitivity. Remember that the distance spec on
optics is mostly a rough guideline, so depending on the fiber conditions
and number of splices/panels along the way you could potentially expect
to get the entire distance out of a standard 100km optic.
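
A rough link budget is the usual first pass at this. Every loss figure in the sketch below is an assumed planning value: 0.25 dB/km, the splice and connector losses, the splice count, and the optic's launch power and receive sensitivity. None of them are specs for any particular SFP or measurements of this fiber. An OTDR trace and a measured end-to-end loss from the carrier would replace them. The sketch also ignores dispersion and aging margin.

```python
MILES_TO_KM = 1.609344


def span_loss_db(km: float, db_per_km: float, splices: int, splice_db: float,
                 connectors: int, connector_db: float) -> float:
    """Fiber attenuation plus splice and connector/patch panel losses."""
    return km * db_per_km + splices * splice_db + connectors * connector_db


def margin_db(tx_dbm: float, rx_sensitivity_dbm: float, loss_db: float) -> float:
    """Power budget (launch minus receiver sensitivity) left after losses."""
    return (tx_dbm - rx_sensitivity_dbm) - loss_db


km = 70 * MILES_TO_KM                       # ~112.7 km
loss = span_loss_db(km, db_per_km=0.25,     # assumed attenuation at 1550nm
                    splices=30, splice_db=0.1,
                    connectors=4, connector_db=0.5)
# Hypothetical optic: +3 dBm launch power, -32 dBm receive sensitivity.
margin = margin_db(3.0, -32.0, loss)
print(f"{km:.1f} km, {loss:.1f} dB loss, {margin:.1f} dB margin")
```

Under these assumptions the span lands at roughly 33 dB of loss, which leaves under 2 dB of margin on the hypothetical 35 dB budget. That thin margin is why it's worth getting real loss numbers before signing.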

Re: UltraDNS Failure?

2009-12-23 Thread Richard A Steenbergen

On Wed, Dec 23, 2009 at 05:38:21PM -0800, Shrdlu wrote:
 I'm still seeing the DNS servers at udns down, hard. Amazon's cloud will 
 need a reboot when this is over. Dang, what the heck happened to all 
 that anycast stuff?

We have some DNS providing type customers (not UltraDNS) receiving a few
million packets/sec of UDP/53 DoS traffic, starting at about the same
time as the UltraDNS problems. No clue if it's related, but it certainly
sounds suspicious. :)

Re: fight club :) richard bennett vs various nanogers, on paid peering

2009-11-25 Thread Richard A Steenbergen

 on the
other side for the traffic, thus allowing you to double dip for the
same bit and potentially make more money. 

Of course in practice it doesn't work this way at all. The vast majority 
of the cost of operating a network is transporting the bits from one 
place to another, and when you sell paid peering you are guaranteed that 
the traffic is going to stay on your network and be hauled. This makes 
it some of the most expensive traffic to deliver, and typically results 
in prices which are higher than those of another network who is hot 
potatoing those bits off their network in one location, and who is 
sending the traffic to a settlement-free peer. There is nothing wrong 
with paid peering, it often has a time and a place (such as when two 
networks are close to being settlement-free peers, but not quite, and 
someone needs to sweeten the deal a little bit), but it is not the 
panacea you think it is. Of course nobody else seems to think the FCC 
Question 106 is talking about regulating paid peering (which would be 
absurd), so fortunately I don't think we have anything to worry about.

Of course all of these points (and more) were already quite elegantly
expressed by fine folks like Vijay Gill, Dan Golding, Patrick Gilmore,
Joe Provo, and others. They tried to help correct your misinformation
with free advice, and you repaid them with delusional rants. Now you
simply look like a fool to everyone.

Re: fight club :) richard bennett vs various nanogers, on paid peering

2009-11-25 Thread Richard A Steenbergen

On Wed, Nov 25, 2009 at 02:29:33PM -0800, Richard Bennett wrote:
 (pardon me if this message is not formatted correctly, T-bird doesn't 
 like this list)
 
 I agree that this is not the proper venue for discussion of the
 politics of Internet regulation; the post I wrote for GigaOm has
 comments enabled, and many people with an anti-capitalist bone to pick
 have already availed themselves of that forum to advocate for the
 people's revolution. There are some technical issues that might be of
 more interest and relevance to operators, however.

So now anyone who points out the massive flaws in your statements are
part of an anti-capitalist movement? Any more conspiracy theories you'd
like to put forward? I can't speak for anyone else, but personally I
consider myself very pro-capitalism and it has absolutely no impact on
how I feel about the blatantly wrong and baseless crap you are spewing.

 * One claim I made in my blog post is that traffic increases on the
 Internet aren't measured by MINTS very well. MINTS uses data from
 Meet-me switches, but IX's and colos are pulling x-connects like mad
 so more and more traffic is passing directly through the x-connects
 and therefore not being captured by MINTS. Rate of traffic increase is
 important for regulators as it relates to the cost of running an ISP
 and the need for traffic shaping. Seems to me that MINTS understates
 traffic growth, and people are dealing with it by lighting more dark
 fiber, pulling more fiber, and the x-connects are the tip of the
 iceberg that says this is going on.

This is all completely irrelevant to everything else that has been
discussed so far, but what the hell I'll bite. Traffic on the Internet
is indeed growing rapidly, while the predominant technology for cost
effectively interconnecting the vast majority of the bits (10 Gigabit
Ethernet) has remained relatively static in recent years. Without a cost
effective technology for interconnecting devices in >10Gbps increments
(40Gbps OC-768 has existed for a while, but is far more expensive than
simply doing 4x10GbE), the only reasonable way to scale a network is to
build your links out of Nx10G bundles. In places with reasonable
crossconnect pricing, it is far cheaper to simply order multiple
crossconnects than it is to pay for DWDM gear, and thus you see a rapid
increase in fiber crossconnects.

 * A number of people said I have no basis for the claim that paid
 peering is on the increase, and it's true that the empirical data is
 slim due to the secretive nature of peering and transit agreements.
 This claim is based on hearsay and on the observation that Comcast now
 has a nationwide network and a very open policy regarding peering and
 paid peering. So if paid peering is only increasing at Comcast, now a
 top 10 network, it's increasing overall.

So in other words, you're admitting that you have absolutely no basis
for your claim, and you're simply making it up based on indirect hearsay 
modified with your own ill-informed conclusions? First intelligent thing 
you've said so far.

If you actually bothered to ask anyone in the industry with experience 
dealing with Comcast, they would tell you that while Comcast initially 
entered the market primarily trying to sell paid peering, they have 
since switched their efforts to primarily selling full transit. There 
are only a certain number of networks who even know what to DO with a 
paid peering product, and a vastly larger number who know what to do 
with a transit product, so it makes perfect sense really.

 * Some other people said I'm not entitled to have an opinion; so much
 for democracy and free speech.

You are not entitled to offer an opinion on a subject matter which you
do not understand without being called out for it. Sane and rational
people understand when they are talking out their ass and are being
corrected by knowledgeable experts, and will shut the hell up and listen.
Sadly this seems to be a skill you lack.

 I'd be glad to hear from anyone who has data or informed opinions on
 these subjects, on-list of off-. The reason you should share is that
 people in Washington and Brussels listen to me, so it's in everybody's
 interest for me to be well-informed; I don't really have an ax to
 grind one way or another, but I do want law and regulation to be based
 on fact, not speculation and ideology.

So far none of the above statements seem to be true.

Re: fight club :) richard bennett vs various nanogers, on paid peering

2009-11-25 Thread Richard A Steenbergen

On Wed, Nov 25, 2009 at 02:29:33PM -0800, Richard Bennett wrote:
 * One claim I made in my blog post is that traffic increases on the 
 Internet aren't measured by MINTS very well. MINTS uses data from 
 Meet-me switches, but IX's and colos are pulling x-connects like mad so 
 more and more traffic is passing directly through the x-connects and 
 therefore not being captured by MINTS. Rate of traffic increase is 
 important for regulators as it relates to the cost of running an ISP and 
 the need for traffic shaping. Seems to me that MINTS understates traffic 
 growth, and people are dealing with it by lighting more dark fiber, 
 pulling more fiber, and the x-connects are the tip of the iceberg that 
 says this is going on.

Oh also I forgot to mention that trying to map a direct relationship
between IX traffic growth and total IP traffic growth is completely
bogus. There is a significant modifier you're missing, and it's called
price. Two years ago the price for an IX port at the large commercial
exchange points in the US (which account for the vast majority of the
traffic, no offense to the small non-commercial exchanges out there) was
between 4-7x higher than the price for the same ports today. The reason
for the price drop had nothing to do with changing economics of
providing the service, but rather it was because of a wide-spread price
war between the two largest IX operators in the US. Such a massive
change in the economics for the IP network operators will obviously
result in major changes to the amount of traffic delivered over IX
fabrics vs private interconnection. Again, something you could have
actually asked operators about rather than making up conclusions in your
head.

Re: Juniper M120 Alternatives

2009-11-17 Thread Richard A Steenbergen

On Tue, Nov 17, 2009 at 09:24:24AM -0600, Jack Bates wrote:
 Richard A Steenbergen wrote:
 They've definitely been improving it over the years though, so much that
 I almost never trigger a session reset on me unintentionally any more. 
 
 They must have. This was new to me and came as a shock. I don't think 
 I've ever seen my m120 behave any different than my cisco when it comes 
 to flapping BGP. Things have just worked as I expected them to. Not that 
 I go screwing with underlying interface configs or changing a peer from 
 one group to another or changing the asn; at least not on a live 
 session. These things would seem to indicate that the session might be 
 subject to reset.
 
 Perhaps it just behaves for normal users and not power users. :)

But those things won't trigger session resets on Cisco, so it often comes
as a shock. Also, one might very well expect that changing the peer-as on
a neighbor is going to cause a reset, but if you didn't know from
experience you might not expect that renaming a group or changing an
underlying interface MTU would do it too.

The issue is that there is a fundamental design difference between Cisco
and Juniper. Cisco lets you configure anything you want in a line by line
basis, but it doesn't immediately apply those changes until you command
it to do so. Juniper's philosophy is that you make a bunch of changes to
a candidate configuration, commit to apply those changes, and then you
can expect those changes to take effect (or at least begin trying to take
effect) immediately.

Personally I think the Juniper design philosophy is better. Besides the
obvious stuff like being able to rollback your config, think about how
non-deterministic it is when you update a route-map but forget to soft
clear the BGP session. The routes that have been exchanged so far will
retain their old policy, while any new updates you receive after the
route-map change will receive the new policy, leaving the session in an
inconsistent state that will slowly and unpredictably change over time as
routing updates come in. The trade-off is that you lose the ability to do
non-impacting changes, where you make a change but know that it hasn't
actually taken effect yet, and won't until the next time the session
bounces. What Juniper is trying to do really is a good thing, I just wish 
it could tell me before I commit what is and isn't going to flap. :)
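
Here is a toy model of that inconsistency. It assumes import policy is evaluated once, when a route is received, as with an inbound route-map that is changed without a soft clear or route refresh. The class and prefixes are illustrative and don't reflect any router's actual implementation.

```python
from typing import Callable, Dict

Policy = Callable[[str], int]   # prefix -> local-preference


class Session:
    """Inbound policy is applied at the moment each route is received."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.rib: Dict[str, int] = {}

    def receive(self, prefix: str) -> None:
        self.rib[prefix] = self.policy(prefix)

    def soft_clear_in(self) -> None:
        # Re-run the current policy over everything already received,
        # roughly what a soft inbound clear / route refresh achieves.
        for prefix in list(self.rib):
            self.rib[prefix] = self.policy(prefix)


s = Session(policy=lambda prefix: 100)
s.receive("192.0.2.0/24")
s.policy = lambda prefix: 200           # route-map edited, no soft clear
s.receive("198.51.100.0/24")
print(s.rib)   # mixed: the old route keeps 100, the new update gets 200
s.soft_clear_in()
print(s.rib)   # consistent again: everything is 200
```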

Re: Juniper M120 Alternatives

2009-11-16 Thread Richard A Steenbergen

On Tue, Nov 17, 2009 at 01:28:06AM +0100, Daniel Roesen wrote:
 PS: and of course JUNOS still undeterministically resetting unrelated
 BGP sessions for no good reason when modifying BGP configuration - so
 one is well-advised to do ANY configuration changes in the area of BGP
 within a maint window as it might happen that you configure a peering
 session and whoops there goes your IBGP mesh... or all your other
 peerings, or, ...

Well to be fair, the session resetting on config change behavior is
actually quite deterministic (being EASY to determine is not part of the
requirements, technically speaking :P), and most of the resets really do
have perfectly good reasons. I'll certainly go with really annoying and
often a giant pain in the @#$%^* though.

They've definitely been improving it over the years though, so much that
I almost never trigger a session reset on me unintentionally any more. 
The things to watch out for are:

a) any time you change the update replication by moving a neighbor
between groups, renaming groups, or significantly changing the export
policy chain.

b) any time you change a major part of the underlying interface
configuration for an eBGP session, such as mtu or vlan tagging config.

c) any time you change something about the bgp session which really does
require a session reset to take effect, such as a new ASN, new endpoint
address, new mbgp family configuration, new md5 password, new tcp mss,
etc.

You can actually safeguard yourself from a lot of the accidental reset
behaviors while implementing other features at the same time by using
commit scripts (i.e. as a side-effect of my scripts which exist for
other reasons, I automatically protect myself against changes to the
policy chain or family configuration which might cause unintended
session flaps), though I'll certainly admit this is well into the
category of power user and not appropriate for most people. They are
making some progress though: you can actually turn NSR on and off now
without flapping your sessions, which is certainly an improvement over
the serious logic flaws in earlier versions (where you couldn't turn off
NSR without flapping every session, but you also couldn't commit w/NSR
enabled and the backup RE offline, effectively locking you out of config
changes without a total box flap if you didn't have both REs running).

It would certainly be a lot more user friendly if they could tell you 
what sessions would be reset as part of a commit check process though.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Resilience - How many BGP providers

2009-11-12 Thread Richard A Steenbergen

On Wed, Nov 11, 2009 at 11:18:20AM -0800, Steve Gibbard wrote:
 If you have three components, the chances of all three being broken at
 once are even less than the chances of two of them being broken at
 once.  With four, you're even safer, and so on and so forth.  But once
 you get beyond two, you hit a point of diminishing returns pretty
 quickly.

Not only that, but you have to ask yourself what are the chances that
all these extra components will become extra points of failure and
actually increase the likelihood of something going wrong. I know a lot
of folks who have gotten themselves into a lot of trouble buying transit
from everyone they can possibly buy from, thinking it will make their
network more reliable. In most cases all it does is make their network
more unstable. The more transit paths you have out there, the more
likely you are to have something flap and wipe you out w/flap dampening,
and the more likely you are to see any single event cause a massive
amount of churn. I've seen people with 8 transit providers appear to
others on the internet as though they flapped 100+ times over a single
session flap, because of all the churn as the network reconverges. More
transit providers also means more 95th percentile calculations, and thus a higher
bill, but that is another story for another day. :)
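Here is a toy Python model that puts rough numbers on both effects. It assumes each path fails or flaps independently with the same made-up probability p per interval. Real failures are correlated, so treat the output as illustrative only. The billing part uses synthetic 100-sample traffic series, not measured data. It applies one common 95th percentile convention: sort the samples, discard the top 5%, and take the highest remaining sample. The function names are hypothetical.

```python
from __future__ import annotations


def p_all_down(p: float, n: int) -> float:
    """Chance that all n independent paths are down at the same moment."""
    return p ** n


def p_any_event(p: float, n: int) -> float:
    """Chance that at least one of n independent paths has an event (a flap)."""
    return 1 - (1 - p) ** n


def percentile_95(samples: list[int]) -> int:
    """Sort the samples, discard the top 5%, return the highest remaining one."""
    ordered = sorted(samples)
    index = (95 * len(ordered) + 99) // 100 - 1  # ceil(0.95 * n) - 1, in integers
    return ordered[index]


if __name__ == "__main__":
    p = 0.01  # assumed per-path probability for some interval; illustrative only
    for n in range(1, 9):
        print(f"{n} paths: all down {p_all_down(p, n):.2e}, any event {p_any_event(p, n):.3f}")

    # Synthetic 100-sample month: total demand is flat at 100 Mbps, but it moves
    # from provider A to provider B halfway through (e.g. after a reroute).
    a = [100] * 50 + [0] * 50
    b = [0] * 50 + [100] * 50
    total = [x + y for x, y in zip(a, b)]
    print("billed separately:", percentile_95(a) + percentile_95(b))  # 200
    print("billed as one:", percentile_95(total))  # 100
```

Adding paths shrinks the all-down term as p^n, while the chance that something, somewhere, flaps keeps climbing toward 1. The billing lines show one way per-provider 95th percentiles can add up to more than the 95th percentile of the combined traffic when load shifts between providers.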

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Upstream BGP community support

2009-11-05 Thread Richard A Steenbergen

On Fri, Nov 06, 2009 at 12:04:18AM +0100, Daniel Roesen wrote:
 On Mon, Nov 02, 2009 at 02:13:38PM -0600, Richard A Steenbergen wrote:
  Rather than simply double the size and break it
  up into 32:32, the designers reserved the top 16 bits for type and
  subtype attributes, leaving you only 48 bits to work with. Clearly the
  only suitable mapping for support of 32-bit ASNs on the Internet is
  32:16, leaving us with exactly the same range of data values that we
  have today.
 
 ... which breaks schemes such as
 
 65123:45678
 
 where 45678 is the neighboring AS to apply the action defined by
 65123 to. Seen those multiple times.
 
 Of course using anything else then your own ASN in the AS part of
 TE communities is certainly debatable.

Definitely a problem. The point of using 65123:45678 in the first place
(with a private ASN field in the AS part) is to avoid stepping on
anyone else's ASN with your internal use community. Clearly we won't be
able to continue implementing this pattern AND fully support 32 bit
ASNs, so the type field is going to have to come to the rescue here. 

Fortunately there is a transitive bit on the extended community type
that could be used to signal a behavior to your upstream network without
allowing that community to leak any further, so in theory one could use
something like that to do a localtarget:45678:actiondata tag without
polluting the namespace. You would lose the ability to send a community
to your upstream's upstream, but that is probably of questionable
legitimacy anyhow. But the way I read RFC5668 and the IANA extended
community registry, it doesn't look like there is an explicit definition
of a non-transitive target type, and the way I read RFC4360, it doesn't
look like the non-transitive value gets automatically reserved. So I
guess someone will need to request 0x4002 and 0x4202 non-transitive
target types for this purpose. Unless someone has a better idea of how
to handle the problem stated above?
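Here is a rough Python sketch of the localtarget idea. It assumes the 0x42/0x02 code point the post suggests requesting (non-transitive, 4-octet AS specific, route target). The type and subtype values follow RFC4360 and RFC5668. The action codes, the helper names, and the upstream's act-then-strip behavior are hypothetical. They describe the proposal, not any shipping implementation.

```python
import struct

NON_TRANSITIVE_BIT = 0x40   # RFC 4360: T bit set means non-transitive across ASes
FOUR_OCTET_AS_TYPE = 0x02   # RFC 5668: 4-octet AS specific extended community
ROUTE_TARGET_SUBTYPE = 0x02
LOCAL_TARGET_TYPE = FOUR_OCTET_AS_TYPE | NON_TRANSITIVE_BIT  # 0x42


def encode_local_target(target_asn: int, action: int) -> bytes:
    """Build a hypothetical localtarget:ASN:action tag (type 0x42, subtype 0x02)."""
    return struct.pack("!BBIH", LOCAL_TARGET_TYPE, ROUTE_TARGET_SUBTYPE, target_asn, action)


def is_transitive(ext_comm: bytes) -> bool:
    """True unless the T bit in the high type octet is set."""
    return not (ext_comm[0] & NON_TRANSITIVE_BIT)


def actions_for(ext_comms: list) -> dict:
    """Upstream side: collect action codes keyed by the neighbor ASN they target."""
    actions = {}
    for ec in ext_comms:
        type_high, subtype, asn, value = struct.unpack("!BBIH", ec)
        if type_high == LOCAL_TARGET_TYPE and subtype == ROUTE_TARGET_SUBTYPE:
            actions.setdefault(asn, []).append(value)
    return actions


def strip_non_transitive(ext_comms: list) -> list:
    """Upstream side: drop non-transitive tags before re-advertising the route."""
    return [ec for ec in ext_comms if is_transitive(ec)]


if __name__ == "__main__":
    tag = encode_local_target(45678, 1)  # hypothetical action 1 toward AS 45678
    origin = struct.pack("!BBIH", 0x02, 0x03, 64500, 100)  # ordinary transitive tag
    print(tag[:2].hex())  # 4202
    print(actions_for([tag, origin]))  # {45678: [1]}
    print(strip_non_transitive([tag, origin]) == [origin])  # True
```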

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Upstream BGP community support

2009-11-02 Thread Richard A Steenbergen

On Mon, Nov 02, 2009 at 05:19:32AM -0500, Randy Bush wrote:
 i try to use as few tricks, knobs, and clever things as possible and
 still get my job done.  i try to be extremely conscious of, and minimal,
 when what i am doing effects or is visible to my neighbors and/or the
 global net.  
 
 i try to complicate the internals of my network as little as possible,
 after all, complexity == opex and i value my time, it is a non-renewable
 resource.
 
 i prefer to be seen as an old and lazy minimalist, not a clever person.
 clever was a pejorative where/when i was brought up.

Translation:

<randybush> You damn kids! Get off my lawn!

But seriously now, the reason we have these squishy things taking up
space between our ears in the first place is so we can come up with new
ideas and better ways to solve our problems. Obviously you can take it
too far, I'm sure we've all seen examples of absurdly over-complicated
designs which existed only for the edification of someone's ego, but I
have never seen an intelligent and well-thought-out BGP community system
do anything other than make everyone's life easier. I don't know who
these people are who you claim are busy breaking things with
communities, but I've never seen them. Being clever is a good thing when
it helps you find new ways to do more with less, and those who can't
innovate in this industry are dooming themselves to obsolescence.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

Re: Upstream BGP community support

2009-11-02 Thread Richard A Steenbergen

On Mon, Nov 02, 2009 at 01:38:00PM -0600, Jack Bates wrote:
 Communities (except the standardized well known ones) are extremely 
 diverse. For those that support even more granular traffic engineering 
 by limiting which of their peers your routes might be transiting, I 
 believe there are 2 distinct methods of using communities.

Even the standardized ones aren't guaranteed to be useful. For example
RFC3765 defines a NOPEER community, i.e. a standardized way to say do
not export this route to peers (in the settlement free bilateral sense
of the word). But there is no possible way for the router to know what
is or isn't a peer, so it's up to the operator to implement the matching
for this supposedly standard community. But guess what: most people
don't, and those that do implement the functionality end up writing
their own network-specific version anyway (either because they want to
keep it secret, or because they need to implement a far more powerful
version).
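The router can't tell who is a peer, so honoring NOPEER comes down to an operator-supplied relationship table. Here is a minimal Python sketch. The NOPEER value (0xFFFFFF04, i.e. 65535:65284) is the one RFC3765 defines. The relationship table, its documentation-range ASNs, and should_export are hypothetical.

```python
from __future__ import annotations

NOPEER = 0xFFFFFF04  # RFC 3765 well-known NOPEER community, i.e. 65535:65284


def community(asn: int, value: int) -> int:
    """Pack a standard ASN:value community into its 32-bit form."""
    return (asn << 16) | value


# Hypothetical operator-maintained table: the router itself has no idea which
# neighbors are settlement-free peers, so someone has to tell it.
NEIGHBOR_RELATIONSHIP = {
    64500: "customer",
    64501: "peer",
    64502: "transit",
}


def should_export(route_communities: set[int], neighbor_asn: int) -> bool:
    """Honor NOPEER: suppress the route only toward neighbors classified as peers."""
    if NOPEER in route_communities and NEIGHBOR_RELATIONSHIP.get(neighbor_asn) == "peer":
        return False
    # Unknown neighbors fall through to a normal export; a stricter policy
    # could choose to suppress instead.
    return True


if __name__ == "__main__":
    print(community(65535, 65284) == NOPEER)  # True
    print(should_export({NOPEER}, 64501))  # False
    print(should_export({NOPEER}, 64500))  # True
```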

If we want to turn this into a discussion on useful things to do with
communities (to try and recover some value from this otherwise brain
rotting thread), how about this one. We as network operators are
currently facing a problem with BGP communities, and that problem is
called 32 bit ASNs. Quite simply, 32 bit ASNs don't fit into the
existing 16:16 scheme. There are new 64 bit communities (extended
communities) out there, but they aren't a 1:1 mapping of the way
communities work today. Rather than simply double the size and break it
up into 32:32, the designers reserved the top 16 bits for type and
subtype attributes, leaving you only 48 bits to work with. Clearly the
only suitable mapping for support of 32-bit ASNs on the Internet is
32:16, leaving us with exactly the same range of data values that we
have today.
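The bit budget can be checked directly with Python's struct module. A standard community is 16:16 in 4 bytes. The 4-octet AS specific extended community from RFC5668 spends 16 of its 64 bits on type and subtype, which leaves 32 bits for the ASN and 16 for data. The function names are hypothetical, and 4200000000 is just an example private-use 32-bit ASN.

```python
import struct


def pack_standard(asn: int, value: int) -> bytes:
    """RFC 1997 community: 16-bit ASN and 16-bit value, 4 bytes total."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("a standard community only holds 16:16")
    return struct.pack("!HH", asn, value)


def pack_four_octet_as(type_high: int, subtype: int, asn: int, value: int) -> bytes:
    """RFC 5668 extended community: 8-bit type, 8-bit subtype, 32-bit ASN, 16-bit value."""
    # 64 bits total - 16 bits of type/subtype = 48 bits = 32-bit ASN + 16-bit data.
    return struct.pack("!BBIH", type_high, subtype, asn, value)


if __name__ == "__main__":
    print(len(pack_standard(64500, 100)))  # 4
    try:
        pack_standard(4200000000, 100)  # a 32-bit ASN does not fit in 16:16
    except ValueError as err:
        print("standard:", err)
    ec = pack_four_octet_as(0x02, 0x02, 4200000000, 100)
    print(len(ec))  # 8
```

The data field is still 16 bits wide, so 32-bit ASN holders get exactly the 0 to 65535 value range that 16:16 offers today.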

So why do I bring this up? Because of those top 16 bits for type and
subtype. Two of the subtypes that are newly introduced in extended 
communities are target and origin, which specifically mean this 
tag is trying to tell $ASN something, vs this $ASN is trying to tell 
you something. This actually has the benefit of addressing one of the 
most common problems with communities today, namespace collision between 
folks trying to both send instructions and receive data within the same 
ASN:x tag. Since we're all going to need to start updating our 
routing policies to support 32 bit ASNs soon anyways (unless you want to 
tell people getting them that they aren't allowed to use communities 
:P), now is a good time to start thinking about taking advantage of 
these new features to resolve age-old problems in your new community 
design.
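Here is a Python sketch of how the target/origin split removes the ambiguity. It assumes 4-octet AS specific extended communities (type 0x02) with the RFC4360 route target (0x02) and route origin (0x03) subtypes. Reading target as "an instruction to this ASN" and origin as "information from this ASN" is the convention proposed above, not something the RFCs mandate. The helper names are hypothetical.

```python
import struct

FOUR_OCTET_AS = 0x02  # RFC 5668 transitive 4-octet AS specific type
ROUTE_TARGET = 0x02   # RFC 4360 route target subtype
ROUTE_ORIGIN = 0x03   # RFC 4360 route origin subtype


def target(asn: int, value: int) -> bytes:
    """Tag telling asn to do something (e.g. a customer's TE request)."""
    return struct.pack("!BBIH", FOUR_OCTET_AS, ROUTE_TARGET, asn, value)


def origin(asn: int, value: int) -> bytes:
    """Tag in which asn tells others something (e.g. where a route was learned)."""
    return struct.pack("!BBIH", FOUR_OCTET_AS, ROUTE_ORIGIN, asn, value)


def classify(ext_comm: bytes, my_asn: int) -> str:
    """Decide whether a tag carrying my ASN is an instruction or information."""
    type_high, subtype, asn, value = struct.unpack("!BBIH", ext_comm)
    if type_high != FOUR_OCTET_AS or asn != my_asn:
        return "not ours"
    if subtype == ROUTE_TARGET:
        return f"instruction {value}"
    if subtype == ROUTE_ORIGIN:
        return f"information {value}"
    return "unknown subtype"


if __name__ == "__main__":
    # The same ASN:value pair no longer collides once the subtype differs.
    print(target(64500, 100) != origin(64500, 100))  # True
    print(classify(target(64500, 100), 64500))  # instruction 100
    print(classify(origin(64500, 100), 64500))  # information 100
```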

Another feature I think would be beneficial for router vendors to
consider implementing is a way to map between regular and extended
communities. For example, I might be able to apply a policy at the edge
of my network which imports regular communities from my neighbor, and
turns them into origin: tags of extended communities. I might then be
able to update my internal network to work on extended communities, and
translate them back again to regular for backwards compatibility at the
edge. Also, now is a good time to find out if your router vendor 
ACTUALLY supports extended communities in all of their features (for 
example, regexp support), or if they only exist for l3vpn support and 
are not actually prepared to use them to work with 32-bit ASNs. Hint: 
Some vendors still fall into this category last I looked.
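The proposed edge translation might look something like the following minimal Python sketch. It assumes the internal network uses 4-octet AS specific origin tags (type 0x02, subtype 0x03). import_standard and export_standard are hypothetical names, not a vendor feature. Well-known communities (the RFC1997 reserved 65535:* range) are left untouched.

```python
from __future__ import annotations

import struct

FOUR_OCTET_AS = 0x02  # RFC 5668 transitive 4-octet AS specific type
ROUTE_ORIGIN = 0x03   # RFC 4360 route origin subtype


def import_standard(community: int) -> bytes | None:
    """Edge import: rewrite a standard ASN:value community as origin:ASN:value."""
    asn, value = community >> 16, community & 0xFFFF
    if asn == 0xFFFF:
        return None  # well-known community (RFC 1997 reserved range); leave as-is
    return struct.pack("!BBIH", FOUR_OCTET_AS, ROUTE_ORIGIN, asn, value)


def export_standard(ext_comm: bytes) -> int | None:
    """Edge export: translate back to ASN:value when it still fits in 16:16."""
    type_high, subtype, asn, value = struct.unpack("!BBIH", ext_comm)
    if type_high != FOUR_OCTET_AS or subtype != ROUTE_ORIGIN or asn > 0xFFFF:
        return None  # no standard-community equivalent
    return (asn << 16) | value


if __name__ == "__main__":
    received = (64500 << 16) | 100  # 64500:100 learned from a neighbor
    tag = import_standard(received)
    print(tag.hex())
    print(export_standard(tag) == received)  # True
```

On export, tags carrying a 32-bit ASN come back as None because they have no 16:16 equivalent. That is the backwards-compatibility gap any such mapping has to live with.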

Apologies if this post contained too much clever and made Randy's head 
explode.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
