> > My understanding of Juniper's approach to the problem is that instead
> > of employing TCAMs for next-hop lookup, they use general purpose CPUs
> > operating on a radix tree, exactly as you would for an all-software
> > router.
They are absolutely not doing that with "general purpose CPUs". The LU
block on early gen Trio was a dedicated ASIC (LU by itself at first,
then consolidated slightly), and later gen Trio put everything on a
single chip, but again a dedicated ASIC.

> > To achieve an -aggregate- lookup speed comparable to a TCAM, they
> > implement a bunch of these lookup engines as dedicated parallel
> > subprocessors rather than using the router's primary compute engine.

You're correct that there is parallelism in the LU functions, but I
still think you're kinda smushing a bunch of stuff that's happening in
different places together.

On Fri, Sep 29, 2023 at 4:44 PM William Herrin <b...@herrin.us> wrote:
> On Thu, Sep 28, 2023 at 10:29 PM Saku Ytti <s...@ytti.fi> wrote:
> > On Fri, 29 Sept 2023 at 08:24, William Herrin <b...@herrin.us> wrote:
> > > Maybe. That's where my comment about CPU cache starvation comes into
> > > play. I haven't delved into the Juniper line cards recently so I could
> > > easily be wrong, but if the number of routes being actively used
> > > pushes past the CPU data cache, the cache miss rate will go way up and
> > > it'll start thrashing main memory. The net result is that the
> > > achievable PPS drops by at least an order of magnitude.
> >
> > When you say you've not delved into the Juniper line cards recently,
> > to which specific Juniper linecard does your comment apply?
>
> Howdy,
>
> My understanding of Juniper's approach to the problem is that instead
> of employing TCAMs for next-hop lookup, they use general purpose CPUs
> operating on a radix tree, exactly as you would for an all-software
> router. This makes each lookup much slower than a TCAM can achieve.
> However, that doesn't matter much: the lookup delays are much shorter
> than the transmission delays so it's not noticeable to the user.
> To achieve an -aggregate- lookup speed comparable to a TCAM, they
> implement a bunch of these lookup engines as dedicated parallel
> subprocessors rather than using the router's primary compute engine.
>
> A TCAM lookup is approximately O(1) while a radix tree lookup is
> approximately O(log n). (Neither description is strictly correct but
> it's close enough to understand the running time.) Log n is pretty
> small so it doesn't take much parallelism for the practical run time
> to catch up to the TCAM.
>
> Feel free to correct me if I'm mistaken or fill in any important
> details I've glossed over.
>
> Regards,
> Bill Herrin
>
>
> --
> William Herrin
> b...@herrin.us
> https://bill.herrin.us/
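For anyone following along, the radix-tree lookup model Bill describes can be sketched in a few lines. This is a toy binary-trie longest-prefix match, not Juniper's actual microcode or data structures; the interface names and next-hop strings here are made up purely for illustration:

```python
# Toy binary trie (radix-tree-style) longest-prefix match.
# Illustrates the lookup model under discussion, NOT Juniper's
# implementation. Walks at most 32 bits per IPv4 lookup, which is
# why per-lookup cost is bounded and parallel engines can make the
# aggregate rate competitive.
import ipaddress

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # one child per bit value
        self.next_hop = None          # set if a prefix terminates here

class Fib:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr):
        # Descend bit by bit, remembering the last next hop seen:
        # that is the longest matching prefix.
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, None
        for i in range(32):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
        else:
            if node.next_hop is not None:
                best = node.next_hop
        return best

fib = Fib()
fib.insert("0.0.0.0/0", "default")        # hypothetical routes
fib.insert("192.0.2.0/24", "ge-0/0/1")
fib.insert("192.0.2.128/25", "ge-0/0/2")
print(fib.lookup("192.0.2.200"))   # more-specific /25 wins: ge-0/0/2
print(fib.lookup("198.51.100.1"))  # no covering prefix: default
```

A real FIB would use a compressed (path-collapsing) trie so the walk takes far fewer than 32 steps, and the cache-starvation point upthread applies exactly here: each step is a dependent memory read, so once the hot portion of the trie exceeds the data cache, every level becomes a main-memory access.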