RE: 400G forwarding - how does it work?

ljwobker Sun, 07 Aug 2022 07:37:56 -0700

Buffering is a near-religious topic across a large swath of the network 
industry, but here are some opinions of mine:

A LOT of operators/providers need more buffering than you can realistically put 
directly onto the ASIC die.  Fast chips without external buffers measure 
capacity in tens of microseconds, which is nowhere near enough for a lot of the 
market.  We can (and do) argue about exactly which network roles can be met by 
this amount of buffering, but those roles are absolutely not a large enough part 
of the market to go away from "big" external buffers entirely.

Once you "jump off the cliff" of needing more than on-chip SRAM, you're in 
this weird area where nothing in the technology space *really* solves the 
problem, because you need access rate and bandwidth more than you need 
capacity.  HBM is currently the best (or at least the most popular) combination 
of capacity, power, access rate, and bandwidth... but it's still nowhere near 
perfect.  A common HBM2 implementation gives you 8GB of buffer space, about 
2Tb/s of raw bandwidth, and a few hundred million IOPS.  (A lot of that gets 
gobbled up by various overheads....)
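To put "tens of microseconds" and the HBM figures in the same units, here is a minimal Python sketch (not from the thread) that converts buffer capacity into time at a given rate. The 64 MB on-die size and the 25.6 Tb/s chip throughput are hypothetical values picked for illustration; decimal units (1 MB = 10^6 bytes) are assumed, and the overheads mentioned above are ignored.

```python
def buffer_time_us(capacity_bytes: float, rate_bps: float) -> float:
    """Microseconds needed to fill (or drain) a buffer at rate_bps."""
    return capacity_bytes * 8 / rate_bps * 1e6


MB, GB = 10**6, 10**9  # decimal units (assumption)
TBPS = 10**12          # 1 Tb/s

# Hypothetical: a 64 MB on-die buffer (inside the 20-200 MB range quoted
# in the reply below) serving a hypothetical 25.6 Tb/s forwarding chip.
on_chip_us = buffer_time_us(64 * MB, 25.6 * TBPS)

# One HBM2 stack using the figures above: 8 GB at ~2 Tb/s raw bandwidth,
# with no allowance for overheads.
hbm_fill_us = buffer_time_us(8 * GB, 2 * TBPS)

if __name__ == "__main__":
    print(f"on-die buffer: {on_chip_us:.1f} us of traffic")
    print(f"HBM2 stack:    {hbm_fill_us / 1000:.1f} ms to fill at full bandwidth")
```

At those assumed numbers the on-die buffer covers about 20 us of traffic, while a single HBM2 stack would take roughly 32 ms to fill even at its full raw bandwidth, which is one way to see why bandwidth and access rate, rather than capacity, are the constraint.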

These values are a function of two things:
1) memory physics - I don't know enough about how these things are Like Really 
Actually Built to talk about this part.
2) market forces... the market for this stuff is really GPUs, ML/AI 
applications, etc.  The networking silicon market is a drop in the ocean 
compared to the rest of compute, so the specific needs of my router aren't 
going to ever drive enough volume to get big memory makers to do exactly what 
**I** want.  I'm at the mercy of what they build for the gigantic players in 
the rest of the market.  

If you told me that someone had a memory technology that was something like 
"one-fourth the capacity of HBM, but four times the bandwidth and four times 
the access rate" I would do backflips and buy a lot of it, because it's a way 
better fit for the specific performance dimensions I need for A Really Fast 
Router.  But nothing remotely along these lines exists... so like a lot of 
other people I just have to order off the menu.   ;-)

--lj

-----Original Message-----
From: NANOG <[email protected]> On Behalf Of Masataka Ohta
Sent: Sunday, August 7, 2022 5:13 AM
To: [email protected]
Subject: Re: 400G forwarding - how does it work?

[email protected] wrote:

> Buffer designs are *really* hard in modern high speed chips, and there 
> are always lots and lots of tradeoffs.  The "ideal" answer is an 
> extremely large block of memory that ALL of the forwarding/queueing 
> elements have fair/equal access to... but this physically looks more 
> or less like a full mesh between the memory/buffering subsystem and 
> all the forwarding engines, which becomes really unwieldy 
> (expensive!) from a design standpoint.  The amount of memory you can 
> practically put on the main NPU die is on the order of 20-200 **mega**bytes, 
> where a single stack of HBM memory comes in at 4GB -- it's 
> literally 100x the size.

I'm afraid you are implying so much buffering (buffer bloat) that it would only 
cause unnecessary and unpleasant delay.

With an M/M/1 queue at 99% load, a buffer of 500 packets (750kB at 1500B MTU) is 
enough to keep the packet drop probability below 1%. At 98% load, the 
probability is 0.0041%.
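These figures match the tail of an M/M/1 queue with unlimited waiting room, where the probability of finding at least K packets in the system is ρ^K (0.99^500 ≈ 0.66%, 0.98^500 ≈ 0.0041%). The Python sketch below (not from the original message) reproduces that arithmetic; treating ρ^K as the drop probability of a K-packet buffer is an approximation assumed here, and the function name is hypothetical.

```python
def overflow_probability(load: float, buffer_packets: int) -> float:
    """P(N >= K) for an M/M/1 queue with unlimited waiting room: load**K.

    Used as an estimate of the chance that an arriving packet finds a
    K-packet buffer full. The queue is only stable for 0 <= load < 1.
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("M/M/1 is only stable for 0 <= load < 1")
    return load ** buffer_packets


if __name__ == "__main__":
    K = 500     # buffer size in packets, as in the message
    MTU = 1500  # bytes per packet, as in the message
    print(f"buffer = {K * MTU / 1000:.0f} kB")
    for rho in (0.99, 0.98):
        p = overflow_probability(rho, K)
        print(f"load {rho:.0%}: P(N >= {K}) = {p:.4%}")
```

The result is positive for every finite K, so a larger buffer lowers the drop probability but never makes it zero.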

But many router engineers wrongly believe that, with a bloated buffer, the 
packet drop probability can be made zero.

For example,

https://www.broadcom.com/products/ethernet-connectivity/switching/stratadnx/bcm88690
        Jericho2 delivers a complete set of advanced features for
        the most demanding carrier, campus and cloud environments.
        The device supports low power, high bandwidth HBM packet
        memory offering up to 160X more traffic buffering compared
        with on-chip memory, enabling zero-packet-loss in heavily
        congested networks.

                                        Masataka Ohta
