Re: Whitebox Routers Beyond the Datasheet

2024-05-12 Thread Mike Hammett
Some of you have pointed out (on-list and off-list) the importance of the OS to 
these concerns. Yes, that makes sense. It's a Venn diagram of hardware that 
can\can't and OSes that can\can't. 

I'd appreciate some feedback as well on the OS side of things. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Mike Hammett"  
To: nanog@nanog.org 
Sent: Friday, April 12, 2024 8:03:49 AM 
Subject: Whitebox Routers Beyond the Datasheet 


I'm looking at the suitability of whitebox routers for high-throughput, 
low-port-count, fast-BGP-performance applications. Power efficiency is 
important as well. 


What I've kind of come down to (based on little more than spec sheets) are the 
EdgeCore AGR400 and the UfiSpace S9600-30DX. They can both accommodate at least 
three directions of 400G for linking to other parts of my network and then have 
enough 100G or slower ports to connect to transit, peers, and customers as 
appropriate. Any other suggestions for platforms similar to those would be 
appreciated. 


They both appear to carry buffers large enough to absorb the speed mismatch 
between ports of different capacities, which is an issue I've seen with boxes 
targeted more at cloud\datacenter switching. 


What isn't in the spec sheets is BGP-related information. They don't mention 
how many routes they can hold, just that they have additional TCAM to handle 
more routes and features. That's wonderful and all, but does it take it from 
64k routes to 512k routes, or does it take it from 256k routes up to the 
millions of routes? Also, BGP convergence isn't listed (and I rarely see it 
talked about in such sheets). I know that software-based routers can now 
load a full table in 30 seconds or less. I know that getting the FIB updated 
takes a little bit longer. I know that withdrawing a route takes a little bit 
longer. However, often, that performance is CPU-bound. An underpowered CPU may 
take a minute or more to load that table and may take minutes to handle route 
churn. Can anyone speak to the ability of these routers (or routers like 
them) to handle modern route table activity? 


My deployment locations and philosophies simply won't have me in an environment 
where I need the density of dozens of 400G\100G ports. That is what the routers 
that seem to be marketed more to this use case are designed for. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 




Re: Whitebox Routers Beyond the Datasheet

2024-04-19 Thread heasley
Fri, Apr 12, 2024 at 08:03:49AM -0500, Mike Hammett:
> I'm looking at the suitability of whitebox routers for high-throughput, 
> low-port-count, fast-BGP-performance applications. Power efficiency is 
> important as well. 
> 
> 
> What I've kind of come down to (based on little more than spec sheets) are 
> the EdgeCore AGR400 and the UfiSpace S9600-30DX. They can both accommodate at 
> least three directions of 400G for linking to other parts of my network and 
> then have enough 100G or slower ports to connect to transit, peers, and 
> customers as appropriate. Any other suggestions for platforms similar to 
> those would be appreciated. 

Most of the white boxes are the same, in my point of view, with small
variations.  And that is the whole idea.

I would choose the NOS you want first.  There are several, but few I would
want in production.  If it is a PoS or unmanageable, it does not matter
what the h/w capabilities are.  Was it created by seasoned engineers in
Internet-scale routing?  And, because each box will require some software
specific to it, though limited in scope, the NOS will dictate which boxes
are available to choose among.

Beyond the hardware capabilities, also consider which h/w vendor your NOS mfg
has the best working relationship with.  That will dictate their ability to
quickly resolve h/w-specific issues in their s/w or even answer
h/w-specific questions for you.

Also consider what the h/w maintenance program is globally.  Is it
important for you to have 4hr replacements in Hong Kong?  That will
affect your decision greatly.

~1.5yr ago, it seemed like everyone was moving toward UfiSpace h/w,
away from EdgeCore.

Ask others about the reliability of the specific h/w you are considering.


Re: Whitebox Routers Beyond the Datasheet

2024-04-14 Thread Tom Beecher
Agreed.

I think as a practical matter, the large majority of operators probably
only care about the time from the last update / End-of-RIB until the FIB is
actually forwarding.
As long as that time delta is acceptable for their environment and
circumstances, that's 'good enough'. Definitely some edge cases and
circumstances that can factor in.
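
As a rough illustration of that metric, here is a minimal Python sketch that
computes the delta between End-of-RIB and the last FIB install. The function
name, the timestamps, and the 15-second threshold are all hypothetical; how
you actually collect those timestamps depends on the NOS and is not covered
here.

```python
from datetime import datetime


def fib_convergence_seconds(end_of_rib, fib_install_times):
    """Seconds from End-of-RIB until the last route landed in the FIB."""
    if not fib_install_times:
        raise ValueError("no FIB install events recorded")
    return (max(fib_install_times) - end_of_rib).total_seconds()


# Hypothetical timestamps, e.g. parsed from daemon logs.
eor = datetime(2024, 4, 14, 12, 0, 30)
installs = [
    datetime(2024, 4, 14, 12, 0, 10),
    datetime(2024, 4, 14, 12, 0, 41),
    datetime(2024, 4, 14, 12, 0, 38),
]

ACCEPTABLE_SECONDS = 15  # hypothetical "good enough" threshold for one network

delta = fib_convergence_seconds(eor, installs)
print(f"EoR -> FIB complete: {delta:.0f}s, acceptable: {delta <= ACCEPTABLE_SECONDS}")
```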

Those of us in the massive scale part of the world certainly have our own
unique problems with this stuff. :)

On Sat, Apr 13, 2024 at 11:58 AM Jared Mauch  wrote:

>
>
> > On Apr 13, 2024, at 12:15 AM, 7ri...@gmail.com wrote:
> >
> >
> >> I feel like this shouldn't be listed on a data sheet for just the
> >> whitebox hardware. The software running on it would be the gating factor.
> > There would be two things ... BGP convergence, and then the time
> > required to get routes from the RIB into the hardware forwarding tables.
> > These are completely separate things. Both are gated on software for the
> > most part, and it would be hard to measure them unless you know a lot more
> > about the environment. Even then it would be a bit of a guess.
> >
> > Contact me off list if you're interested in prior experience in this
> > area.
> >
> > :-) /r
>
>
> Yeah, I think the question is coming from the wrong direction. The better
> approach is to work out what route scale you need, then match it to the
> hardware.  You can load a variety of software on these devices, including
> putting something like cRPD on top of it so you have the Juniper software
> and policy language, or roll your own with FRR, BIRD or something else.
>
> The kernel -> FIB (hardware) download performance will vary as will the
> way the TCAM is carved up into the various routes and profiles.
>
> It also depends on what you download to the FIB vs what you have in your
> RIB. For example, a fib-filter in Juniper parlance may give you the ability
> to carry a full routing table in the RIB but install just a default and your
> local stub routes in the FIB, depending on the device role.
> (Connected/static + local iBGP+eBGP learned)
>
> - Jared


Re: Whitebox Routers Beyond the Datasheet

2024-04-13 Thread Jared Mauch



> On Apr 13, 2024, at 12:15 AM, 7ri...@gmail.com wrote:
> 
> 
>> I feel like this shouldn't be listed on a data sheet for just the whitebox 
>> hardware. The software running on it would be the gating factor.
> There would be two things ... BGP convergence, and then the time required to 
> get routes from the RIB into the hardware forwarding tables. These are 
> completely separate things. Both are gated on software for the most part, and 
> it would be hard to measure them unless you know a lot more about the 
> environment. Even then it would be a bit of a guess.
> 
> Contact me off list if you're interested in prior experience in this area.
> 
> :-) /r


Yeah, I think the question is coming from the wrong direction. The better 
approach is to work out what route scale you need, then match it to the 
hardware.  You can load a variety of software on these devices, including 
putting something like cRPD on top of it so you have the Juniper software and 
policy language, or roll your own with FRR, BIRD or something else.

The kernel -> FIB (hardware) download performance will vary as will the way the 
TCAM is carved up into the various routes and profiles.

It also depends on what you download to the FIB vs what you have in your RIB. 
For example, a fib-filter in Juniper parlance may give you the ability to carry 
a full routing table in the RIB but install just a default and your local stub 
routes in the FIB, depending on the device role.  (Connected/static + local 
iBGP+eBGP learned)
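
To make the RIB-vs-FIB split concrete, here is a minimal sketch in Python. It
is not Juniper syntax or any NOS's API: the route list, the source labels, and
the fib_filter function are all hypothetical, and it only illustrates the idea
of holding every route in the RIB while installing a default plus local routes.

```python
from ipaddress import ip_network

# Hypothetical RIB entries as (prefix, source). The source labels are
# illustrative only and do not come from any particular NOS.
RIB = [
    ("0.0.0.0/0", "ebgp-transit"),
    ("::/0", "ebgp-transit"),
    ("192.0.2.0/24", "connected"),
    ("198.51.100.0/24", "static"),
    ("203.0.113.0/24", "ibgp-local"),
    ("2001:db8::/32", "ebgp-customer"),
    ("8.8.8.0/24", "ebgp-transit"),
    ("1.1.1.0/24", "ebgp-transit"),
]

# Sources treated as "local stub routes" in this sketch (an assumption).
LOCAL_SOURCES = {"connected", "static", "ibgp-local", "ebgp-customer"}


def fib_filter(rib):
    """Return the RIB entries that would be installed in the FIB:
    default routes plus anything learned from a local source."""
    fib = []
    for prefix, source in rib:
        is_default = ip_network(prefix).prefixlen == 0
        if is_default or source in LOCAL_SOURCES:
            fib.append((prefix, source))
    return fib


fib = fib_filter(RIB)
print(f"RIB holds {len(RIB)} routes, FIB installs {len(fib)}")
```

With a real full table the RIB side would be on the order of a million
prefixes while the FIB side stays small, which is why a box with a modest
hardware table can still take full feeds in some roles.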

- Jared

Re: Whitebox Routers Beyond the Datasheet

2024-04-12 Thread William Herrin
On Fri, Apr 12, 2024 at 6:03 AM Mike Hammett  wrote:
> What I've kind of come down to (based on little more than spec sheets)
> are the EdgeCore AGR400 and the UfiSpace S9600-30DX.

> That's wonderful and all, but does it take it from 64k routes
>  to 512k routes, or does it take it from 256k routes up to
>  the millions of routes?

Hi Mike,

You're combining two questions here. Break them apart: how many
distinct routes can its line-speed forwarding engine handle (the FIB),
and what processing/DRAM is available to handle the routing database
(the RIB)?

To take the EdgeCore AGR400, it has an 8-core Xeon and 32 gigs of ram.
It's going to handle a BGP RIB in the many millions of routes and a
BGP convergence time as fast as anything else you can buy.
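
A back-of-envelope version of the RIB side, as a Python sketch. Only the 32
GiB figure comes from the paragraph above. The per-path memory cost and the
share of RAM reserved for the OS and other daemons are hypothetical
placeholders, not measurements of any BGP implementation, so treat the result
as an order-of-magnitude check only.

```python
GIB = 1024 ** 3


def rib_paths_that_fit(ram_bytes, bytes_per_path, reserved_fraction=0.5):
    """Number of BGP paths that fit in the RAM left over after reserving
    a share for the OS and other daemons."""
    usable = ram_bytes * (1 - reserved_fraction)
    return int(usable // bytes_per_path)


ram = 32 * GIB                  # "32 gigs of ram" from the message above
ASSUMED_BYTES_PER_PATH = 500    # hypothetical, not a measured figure

paths = rib_paths_that_fit(ram, ASSUMED_BYTES_PER_PATH)
print(f"roughly {paths / 1e6:.0f} million paths under these assumptions")
```

Even if the per-path cost were several times higher than assumed here, the
result would still land in the millions, which is consistent with the claim
that the control plane is not the constraint on this box.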

For switching (FIB), it uses a Broadcom BCM88823. The data sheets on
the broadcom 88800 series chips are suspiciously light on information
about their internal capacity for forwarding decisions. Just a black
box in the diagram described as an "external lookup engine" and some
notes in the "packet processing" section that the ingress receive
packet processor "handles the main packet processing stage" and
"optionally" uses "expandable databases" from an external lookup
interface.

Large TCAMs are power hungry and the chip is physically small, so I'd
bet against it having a DFZ-size embedded TCAM. Smells like a switch
chip that typically has something in the single-digit thousands of
slots but that's strictly a guess. And with only 8 cores, any
software-switched "expandable databases" are going to be wimpy.

I'm not familiar with the specific hardware you mentioned, so none of
the above is certain. Just based on what I glean from the specs I
could find via google and my experience with white-box routing in
general. As a rule though, if the marketing materials don't call out a
component of the product as impressive, it probably isn't.

In general, white box routing is done with a framework like DPDK on
machines with large numbers of CPU cores. While they can handle
100gbps, they do it by running the cores in single-thread busywait
loops that eliminate the need for interrupts from the network devices.
This generates lots of heat and consumes lots of electricity.

Regards,
Bill Herrin


-- 
William Herrin
b...@herrin.us
https://bill.herrin.us/


Re: Whitebox Routers Beyond the Datasheet

2024-04-12 Thread Mike Hammett
That makes sense, but it's also why I'm going beyond the datasheet here to 
solicit people's feedback. 




- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

- Original Message -

From: "Tom Beecher"  
To: "Mike Hammett"  
Cc: nanog@nanog.org 
Sent: Friday, April 12, 2024 1:30:04 PM 
Subject: Re: Whitebox Routers Beyond the Datasheet 




Also, BGP convergence isn't listed (and I rarely see it talked about in 
such sheets). 




I feel like this shouldn't be listed on a data sheet for just the whitebox 
hardware. The software running on it would be the gating factor. 



Re: Whitebox Routers Beyond the Datasheet

2024-04-12 Thread Tom Beecher
>
> Also, BGP convergence isn't listed (and I rarely see it talked
> about in such sheets).


I feel like this shouldn't be listed on a data sheet for just the whitebox
hardware. The software running on it would be the gating factor.



Re: Whitebox Routers Beyond the Datasheet

2024-04-12 Thread Michel Blais
I'm surprised they stopped showing those options in the datasheet. They
used to have those in the spec sheet when I bought Edgecore hardware
some years ago. Seems like you will have to contact sales teams.

If it helps, I use the AS5916-XKS and its TCAM supports millions of routes.
Off the top of my head, it is something like 6 or 10 million for IPv4 and
1.5 million for IPv6.
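
For a quick sanity check of what those numbers mean in practice, here is a
small Python sketch comparing the recalled capacities against the size of a
full table. The capacities use the lower IPv4 figure from the message above.
The table sizes are rough assumptions for 2024 and should be replaced with
current numbers from your own routers or a public routing-table report.

```python
# Capacities as recalled in the message above (lower IPv4 figure used).
FIB_CAPACITY = {"ipv4": 6_000_000, "ipv6": 1_500_000}

# Assumed approximate full-table sizes; not taken from the thread.
ASSUMED_TABLE = {"ipv4": 950_000, "ipv6": 200_000}


def headroom(capacity, table):
    """Ratio of hardware capacity to table size per address family."""
    return {af: capacity[af] / table[af] for af in capacity}


ratios = headroom(FIB_CAPACITY, ASSUMED_TABLE)
for af, ratio in ratios.items():
    print(f"{af}: capacity is about {ratio:.1f}x the assumed full table")
```

Note that on many platforms IPv4 and IPv6 share the same hardware table, so
the two limits may not be available at the same time. Whether that applies
here depends on the TCAM profile the NOS selects.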