I am looking for ideas to improve this setup, or, if you run something like it, to hear about your experience with it.

Here is what the setup has to account for so far:

Four main transit connections in different locations, and 249 peering sessions at major data centers, for public and private peering.

Currently ~945,000 IPv4 routes, ~196,000 IPv6 routes.

Using Arista switches to route in hardware is good, but obviously limited: a full table won't fit in them.

IPv4 and IPv6 are on separate boxes to get as much TCAM capacity as possible for each address family.

The idea is to put the most-used routes into the Arista boxes and have the rest processed by the OpenBSD boxes.

Using route reflectors is kind of obvious to keep things manageable, and it helps track what's best to dump into the layer 3 boxes.

Sure, you can use sFlow and NetFlow to track usage, but they are resource intensive.

I don't think this exists, but it would be nice if there were a simple counter in the BGP table that incremented each time a route was selected for forwarding. Sorting by that counter periodically and pushing the top routes into the Arista switch would keep the process as fast as possible: the hot routes in hardware, the rest in software.

But I am not aware of anything that can do that easily and cheaply, resource-wise. Is there such a thing?
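For what it's worth, here is roughly what I imagine standing in for that counter, as a minimal Python sketch: tally sampled destination addresses (from whatever sFlow/NetFlow collector you already run) against a RIB dump, then keep the busiest prefixes up to the TCAM budget. The function name, the input format, and the linear longest-match scan are my own simplifications; a real version would want a radix trie (PyTricia or similar).

    import ipaddress
    from collections import Counter

    def busiest_prefixes(sampled_dests, rib_prefixes, budget):
        """Rank prefixes by how often sampled traffic hits them and
        return the top 'budget' of them -- the candidates to program
        into the switch's TCAM."""
        # Sort most-specific first so the first containing network
        # is the longest match.
        nets = sorted((ipaddress.ip_network(p) for p in rib_prefixes),
                      key=lambda n: n.prefixlen, reverse=True)
        hits = Counter()
        for dest in sampled_dests:
            addr = ipaddress.ip_address(dest)
            for net in nets:       # linear scan: fine for a sketch only
                if addr in net:
                    hits[net] += 1
                    break
        return [str(net) for net, _ in hits.most_common(budget)]

    # e.g. busiest_prefixes(samples, rib, 200_000) -> hot list for hardware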

Having two BGP transit sessions on a /29 per location isn't always welcome by transit providers. None of them really want to peer with a route reflector on your side, or to add a static route toward your main layer 3 switch to accommodate your traffic priorities. And if they do, it's not a standard setup: sooner or later it will be removed, you're stuck, and then you have to find someone willing to listen to you and set it up again, until someone else changes it back. Best not to have to do this, obviously!

One way to make it work might be to have one transit feed go to the Arista box, where you limit what you accept (no choice, since TCAM is limited), and the second feed go to the OpenBSD one.

On the Arista you can accept only aggregates of /18 and larger (prefix length 18 or shorter) from transit, plus all the routes from your public/private direct peering, as long as you keep the total under the hardware limit of the box.
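The selection logic behind that filter fits in a few lines. A Python sketch of the idea; the (prefix, via_peering) input format and the 200K budget are my assumptions, not Arista numbers:

    import ipaddress

    TCAM_BUDGET = 200_000   # assumed per-box limit; check your platform

    def hardware_routes(routes):
        """Keep aggregates of /18 and shorter from transit, plus
        everything learned over direct peering, capped at the budget."""
        selected = []
        for prefix, via_peering in routes:   # e.g. ("203.0.113.0/24", True)
            net = ipaddress.ip_network(prefix)
            if via_peering or net.prefixlen <= 18:
                selected.append(str(net))
                if len(selected) == TCAM_BUDGET:
                    break                    # hard stop at the budget
        return selected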

If you specialize one box to IPv4 only and the other to IPv6, layer 3 only, giving up layer 2, then you could go to almost 350,000 routes in hardware. Very respectable.

Sure, it's not the full Internet table, but unless you are really big, maybe your customers don't use more than 100,000 routes. That's speculation on my side and would need to be proven; I just picked 100K, and maybe 200K, or maybe 50K, is the more realistic number.

As for the setup toward your transit: both your switch and your server announce your full IP space identically, except that the server's announcements may set a MED, if your transit will honor it, or prepend your AS if not.

And then you have the default route from your switch point to your server instead of your transit. I explain why below.

Not ideal, obviously; the best would be two switches providing a 100% redundant setup, but they can't hold the full table in hardware.

Why have the default route from the switch point to your server? Well, the server has the full table, so it may send the traffic out a better exit; otherwise the switch would just use its own line to transit, which may not be the best path anyway. Remember that your switch can't hold the full table in hardware...

Now the issue is finding the best way to update the routes in your switches without burning too many resources the way sFlow (switch) and NetFlow (server) would.
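One cheap way to do the update itself: compute the hot list off-box (on the OpenBSD side) and push it to the switch as a prefix-list over Arista's eAPI, which is plain JSON-RPC over HTTPS ("management api http-commands" has to be enabled on the switch). A hedged sketch; the list name HOT and everything around it are my own invention:

    import requests   # third-party; pip install requests

    def push_hot_prefixes(host, auth, prefixes):
        """Rewrite the HOT prefix-list on an Arista switch via eAPI.
        A route-map matching HOT is assumed to already be applied
        inbound on the transit session.  For a list this size you
        would batch the commands rather than send one giant call."""
        cmds = ["enable", "configure", "no ip prefix-list HOT"]
        cmds += [f"ip prefix-list HOT seq {i * 10} permit {p}"
                 for i, p in enumerate(prefixes, start=1)]
        body = {"jsonrpc": "2.0", "method": "runCmds", "id": "hot-push",
                "params": {"version": 1, "cmds": cmds, "format": "json"}}
        resp = requests.post(f"https://{host}/command-api", json=body,
                             auth=auth, verify=False, timeout=60)
        resp.raise_for_status()   # verify=False only for self-signed lab certs
        return resp.json()

You would then soft-reset the transit session ("clear ip bgp <neighbor> soft in") so the new filter takes effect, and in production you would diff against the current list instead of rebuilding it wholesale.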

The goal is to dedicate as many resources as possible to routing. Splitting the setup between IPv4 and IPv6 is already a good thing, as long as your peering point doesn't also limit your connection by MAC address. Two different boxes means two different MACs, and if you split IPv4 and IPv6 as well, that's 4 MAC addresses. :( Equinix will ONLY allow you 1 MAC address per dedicated fiber connection on your side.

Does anyone with more experience with this type of unconventional setup have input, suggestions, experiences, good/bad stories, gotchas, etc.?

That's why I thought having a simple counter in BGP would be nice and simple, but it's obviously not in the RFCs, so definitely not built in.

It would be so easy to use, though, I guess.

Any feedback on these ideas would be greatly appreciated.

Thanks for your time and for reading this.

Daniel
