I am looking for ideas to improve this setup, or, if you run something like it, to hear about your experience with it.

Here is what the setup has to account for so far:

Four main transit connections in different locations, and 249 peering sessions at major data centers, for public and private peering.

Currently ~945,000 IPv4 routes, ~196,000 IPv6 routes.

Using Arista switches to route in hardware is good, but obviously limited: a full table won't fit in them.

IPv4 and IPv6 are on separate boxes to get as much TCAM capacity as possible for each address family.

The idea is to put the most-used routes into the Arista boxes and have the rest processed by the OpenBSD boxes.

Using route reflectors is kind of obvious to keep things manageable, and it helps track what's best to dump into the layer 3 boxes.

Sure, you can use sFlow and NetFlow to track usage, but they are resource intensive.

I don't think this exists, but it would be nice if there were a simple counter in the BGP table that incremented each time a route was selected for forwarding. Sorting by that counter periodically and pushing the top routes into the Arista switch would keep the process as fast as possible: the hot routes in hardware, the rest in software.

But I am not aware of anything that can do that easily and cheaply, resource-wise. Is there such a thing?
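For what it's worth, here is roughly what I imagine standing in for that counter, as a minimal Python sketch: tally sampled destination addresses (from whatever sFlow/NetFlow collector you already run) against a RIB dump, then keep the busiest prefixes up to the TCAM budget. The function name, the input format, and the linear longest-match scan are my own simplifications; a real version would want a radix trie (PyTricia or similar).

    import ipaddress
    from collections import Counter

    def busiest_prefixes(sampled_dests, rib_prefixes, budget):
        """Rank prefixes by how often sampled traffic hits them and
        return the top 'budget' of them -- the candidates to program
        into the switch's TCAM."""
        # Sort most-specific first so the first containing network
        # is the longest match.
        nets = sorted((ipaddress.ip_network(p) for p in rib_prefixes),
                      key=lambda n: n.prefixlen, reverse=True)
        hits = Counter()
        for dest in sampled_dests:
            addr = ipaddress.ip_address(dest)
            for net in nets:       # linear scan: fine for a sketch only
                if addr in net:
                    hits[net] += 1
                    break
        return [str(net) for net, _ in hits.most_common(budget)]

    # e.g. busiest_prefixes(samples, rib, 200_000) -> hot list for hardware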

Having two BGP transit sessions on a /29 per location isn't always welcome by transit providers. None of them really want to peer with a route reflector on your side, or to add a static route toward your main layer 3 switch to accommodate your traffic priorities. And if they do, it's not a standard setup: sooner or later it will be removed, you're stuck, and then you have to find someone willing to listen to you and set it up again, until someone else changes it back. Best not to have to do this, obviously!

One way to make it work might be to have one transit feed go to the Arista box, where you limit what you accept (no choice, since TCAM is limited), and the second feed go to the OpenBSD one.

On the Arista you can accept only aggregates of /18 and larger (prefix length 18 or shorter) from transit, plus all the routes from your public/private direct peering, as long as you keep the total under the hardware limit of the box.
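The selection logic behind that filter fits in a few lines. A Python sketch of the idea; the (prefix, via_peering) input format and the 200K budget are my assumptions, not Arista numbers:

    import ipaddress

    TCAM_BUDGET = 200_000   # assumed per-box limit; check your platform

    def hardware_routes(routes):
        """Keep aggregates of /18 and shorter from transit, plus
        everything learned over direct peering, capped at the budget."""
        selected = []
        for prefix, via_peering in routes:   # e.g. ("203.0.113.0/24", True)
            net = ipaddress.ip_network(prefix)
            if via_peering or net.prefixlen <= 18:
                selected.append(str(net))
                if len(selected) == TCAM_BUDGET:
                    break                    # hard stop at the budget
        return selected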

If you specialize one box to IPv4 only and the other to IPv6, layer 3 only, giving up layer 2, then you could go to almost 350,000 routes in hardware. Very respectable.

Sure, it's not the full Internet table, but unless you are really big, maybe your customers don't use more than 100,000 routes. That's speculation on my side and would need to be proven; I just picked 100K, and maybe 200K, or maybe 50K, is the more realistic number.

As for the setup toward your transit: both your switch and your server announce your full IP space identically, except that the server's announcements may set a MED, if your transit will honor it, or prepend your AS if not.

And then you have the default route from your switch point to your server instead of your transit. I explain why below.

Not ideal, obviously; the best would be two switches providing a 100% redundant setup, but they can't hold the full table in hardware.

Why have the default route from the switch point to your server? Well, the server has the full table, so it may send the traffic out a better exit; otherwise the switch would just use its own line to transit, which may not be the best path anyway. Remember that your switch can't hold the full table in hardware...

Now the issue is finding the best way to update the routes in your switches without burning too many resources the way sFlow (switch) and NetFlow (server) would.
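One cheap way to do the update itself: compute the hot list off-box (on the OpenBSD side) and push it to the switch as a prefix-list over Arista's eAPI, which is plain JSON-RPC over HTTPS ("management api http-commands" has to be enabled on the switch). A hedged sketch; the list name HOT and everything around it are my own invention:

    import requests   # third-party; pip install requests

    def push_hot_prefixes(host, auth, prefixes):
        """Rewrite the HOT prefix-list on an Arista switch via eAPI.
        A route-map matching HOT is assumed to already be applied
        inbound on the transit session.  For a list this size you
        would batch the commands rather than send one giant call."""
        cmds = ["enable", "configure", "no ip prefix-list HOT"]
        cmds += [f"ip prefix-list HOT seq {i * 10} permit {p}"
                 for i, p in enumerate(prefixes, start=1)]
        body = {"jsonrpc": "2.0", "method": "runCmds", "id": "hot-push",
                "params": {"version": 1, "cmds": cmds, "format": "json"}}
        resp = requests.post(f"https://{host}/command-api", json=body,
                             auth=auth, verify=False, timeout=60)
        resp.raise_for_status()   # verify=False only for self-signed lab certs
        return resp.json()

You would then soft-reset the transit session ("clear ip bgp <neighbor> soft in") so the new filter takes effect, and in production you would diff against the current list instead of rebuilding it wholesale.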

The goal is to dedicate as many resources as possible to routing. Splitting the setup between IPv4 and IPv6 is already a good thing, as long as your peering point doesn't also limit your connection by MAC address. Two different boxes means two different MACs, and if you split IPv4 and IPv6 as well, that's 4 MAC addresses. :( Equinix will ONLY allow you 1 MAC address per dedicated fiber connection on your side.

Does anyone with more experience with this type of unconventional setup have input, suggestions, experiences, good/bad stories, gotchas, etc.?

That's why I thought having a simple counter in BGP would be nice and simple, but it's obviously not in the RFCs, so definitely not built in.

It would be so easy to use, though, I guess.

Any feedback on these ideas would be greatly appreciated.

Thanks for your time and for reading this.

Daniel
