On Sun, 2 Jul 2023 at 17:15, Mark Tinka <[email protected]> wrote:

> Technically, do we not think that an oversubscribed Juniper box with a
> single Trio 6 chip with no fabric is feasible? And is it not being built
> because Juniper don't want to cannibalize their other distributed
> compact boxes?
>
> The MX204, for example, is a single Trio 3 chip that is oversubscribed
> by an extra 240Gbps. So we know they can do it. The issue with the MX204
> is that most customers will run out of ports before they run out of
> bandwidth.

Not disagreeing here, but how do we define oversubscribed? Is every box
oversubscribed if it can't do

  a) 100% at max size packet,
  b) 100% at min size packet, and
  c) 100% of packets to delay buffer?

I think this would be quite a reasonable definition, but as far as I
know, no current device of non-modest scale would satisfy all three;
almost all of them would only satisfy a).

Let's consider first gen Trio serdes:

  1) 2/4 goes to fabric (btree replication)
  2) 1/4 goes to delay buffer
  3) 1/4 goes to WAN port
     (and actually like 0.2 additionally goes to lookup engine)

So you're selling less than 1/4th of the serdes you ship; more than 3/4
are 'overhead'. Compare that to, say, Silicon1, which is partially
buffered: they're selling almost 1/2 of the serdes they ship. You could
in theory put ports on all of these serdes in BPS terms, but not in PPS
terms, at least not with off-chip memory.

And in each case, in a pizza box, you could sell those fabric ports, as
there is no fabric. So a given NPU always has ~2x the bps in pizza box
format (but usually no more pps). In the MX80/MX104 Juniper did just
this: they sell 80G of WAN ports, when in linecard mode it is only a
40G WAN port device. I don't consider it oversubscribed, even though
the minimum packet size went up, because the lookup capacity didn't
increase.

Curiously, AMZN told NANOG their ratio: when the design is fully scaled
to 100T it is 1/4, i.e. 400T of bought ports for 100T of useful ports.
It is unclear how long 100T was going to scale, but obviously they
wouldn't launch an architecture which needs to be redone next year, so
when they decided on the 100T cap for the scale, they didn't have a
100T need yet. This design was with 112Gx128 chips, and the boxes were
single chip, so all serdes connect ports, no fabrics, i.e. a true
pizza box.

I found this very interesting, because the 100T design was, I think, 3
racks? And last year 50T ASICs shipped; next year we'd likely get 100T
ASICs (224Gx512? or 112Gx1024?).

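The serdes budget and the lane counts above can be checked with a few
lines of Python. This is only a back-of-the-envelope sketch under
stated assumptions: the names (TRIO_GEN1, wan_share, raw_tbps) are made
up for illustration, treating the extra 0.2 for the lookup engine as
sitting on top of the 4/4 is one reading of the figures, and raw_tbps
is simply lane rate times lane count, ignoring encoding overhead.

```python
from fractions import Fraction

# Assumed serdes split for first gen Trio, taken from the list above.
# The lookup-engine share (0.2) is modelled as extra serdes on top of the 4/4.
TRIO_GEN1 = {
    "fabric": Fraction(2, 4),
    "delay_buffer": Fraction(1, 4),
    "wan": Fraction(1, 4),
    "lookup": Fraction(1, 5),
}


def wan_share(split):
    """Fraction of all shipped serdes that face sellable WAN ports."""
    return split["wan"] / sum(split.values())


def raw_tbps(lane_gbps, lanes):
    """Raw aggregate serdes rate in Tbps (lane rate x lane count)."""
    return lane_gbps * lanes / 1000


if __name__ == "__main__":
    share = wan_share(TRIO_GEN1)
    print("WAN share of shipped serdes:", share, "=", float(share))
    print("Overhead share:", 1 - share)

    # Ratio quoted for the fully scaled design: 100T useful / 400T bought.
    print("Useful/bought ports:", Fraction(100, 400))

    # Chip layouts mentioned above, as raw lane-rate products.
    for lane_gbps, lanes in [(112, 128), (224, 512), (112, 1024)]:
        print(f"{lane_gbps}G x {lanes} = {raw_tbps(lane_gbps, lanes)} Tbps raw")
```

With these figures the WAN share comes out at 5/24, a bit under 1/4,
and both guessed next-generation layouts give the same raw total.
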
So even hyperscalers are growing slower than silicon, and can basically
put their dc-in-a-chip, greatly reducing cost (both CAPEX and OPEX), as
there is no need to waste 3/4th of the investment on overhead.

The scale also surprised me, even though perhaps it should not have:
they quoted 1M+ network devices. Considering they quote 20M+ Nitro
systems shipped, that's like <20 revenue generating compute per network
device. Depending on the refresh cycle, this means Amazon is buying
15-30k network devices per month, which I expect is significantly more
than Cisco+Juniper+Nokia combined ship to SP infra, so no wonder SPs
get little love.

--
  ++ytti
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp

