>
> My understanding of Juniper's approach to the problem is that instead
> of employing TCAMs for next-hop lookup, they use general purpose CPUs
> operating on a radix tree, exactly as you would for an all-software
> router.
>

They are absolutely not doing that with "general purpose CPUs".

The LU block on early-gen Trio was a dedicated ASIC (LU by itself at first,
then consolidated slightly); later-gen Trio put everything on a single chip,
but again as a dedicated ASIC.

> To
> achieve an -aggregate- lookup speed comparable to a TCAM, they
> implement a bunch of these lookup engines as dedicated parallel
> subprocessors rather than using the router's primary compute engine.
>

You're correct that there is parallelism in the LU functions, but I still
think you're smushing together a bunch of stuff that's happening in different
places.
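
For anyone who wants a feel for the software analogue being discussed, here
is a toy sketch (Python, made-up names, nothing Trio-specific) of a binary
radix trie doing longest-prefix match, with lookups fanned out across workers
to stand in for parallel lookup engines. It illustrates the general technique
only, not how the LU block actually implements it.

import ipaddress
from concurrent.futures import ThreadPoolExecutor

class TrieNode:
    def __init__(self):
        self.children = [None, None]   # 0-bit branch, 1-bit branch
        self.next_hop = None           # set when a prefix terminates here

class RadixTable:
    """Toy binary trie for IPv4 longest-prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, addr):
        # Walk at most 32 bits, remembering the last prefix seen: O(address
        # width) per packet, versus a TCAM's single match cycle.
        bits = int(ipaddress.ip_address(addr))
        node, best = self.root, None
        for i in range(32):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                return best
        return node.next_hop or best

table = RadixTable()
table.insert("0.0.0.0/0", "upstream")
table.insert("192.0.2.0/24", "peer-a")
table.insert("192.0.2.128/25", "peer-b")

# Each individual lookup is slow-ish, but many of them running side by side
# (real hardware engines in the router's case, not Python threads, which the
# GIL serializes) give a high aggregate rate.
packets = ["192.0.2.200", "192.0.2.5", "198.51.100.1"] * 4
with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(table.lookup, packets))[:3])
    # ['peer-b', 'peer-a', 'upstream']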

On Fri, Sep 29, 2023 at 4:44 PM William Herrin <b...@herrin.us> wrote:

> On Thu, Sep 28, 2023 at 10:29 PM Saku Ytti <s...@ytti.fi> wrote:
> > On Fri, 29 Sept 2023 at 08:24, William Herrin <b...@herrin.us> wrote:
> > > Maybe. That's where my comment about CPU cache starvation comes into
> > > play. I haven't delved into the Juniper line cards recently so I could
> > > easily be wrong, but if the number of routes being actively used
> > > pushes past the CPU data cache, the cache miss rate will go way up and
> > > it'll start thrashing main memory. The net result is that the
> > > achievable PPS drops by at least an order of magnitude.
> >
> > When you say, you've not delved into the Juniper line cards recently,
> > to which specific Juniper linecard your comment applies to?
>
> Howdy,
>
> My understanding of Juniper's approach to the problem is that instead
> of employing TCAMs for next-hop lookup, they use general purpose CPUs
> operating on a radix tree, exactly as you would for an all-software
> router. This makes each lookup much slower than a TCAM can achieve.
> However, that doesn't matter much: the lookup delays are much shorter
> than the transmission delays so it's not noticeable to the user. To
> achieve an -aggregate- lookup speed comparable to a TCAM, they
> implement a bunch of these lookup engines as dedicated parallel
> subprocessors rather than using the router's primary compute engine.
>
> A TCAM lookup is approximately O(1) while a radix tree lookup is
> approximately O(log n). (Neither description is strictly correct but
> it's close enough to understand the running time.) Log n is pretty
> small so it doesn't take much parallelism for the practical run time
> to catch up to the TCAM.
>
> Feel free to correct me if I'm mistaken or fill in any important
> details I've glossed over.
>
> Regards,
> Bill Herrin
>
>
> --
> William Herrin
> b...@herrin.us
> https://bill.herrin.us/
>
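
To put rough numbers on the O(1)-vs-O(log n) point in the quoted message
(back-of-the-envelope only; the table size and engine counts below are
illustrative guesses, not actual Trio figures):

import math

full_table = 950_000                               # ballpark IPv4 DFZ size
trie_steps = math.ceil(math.log2(full_table))      # ~20 probes per lookup
# A TCAM answers in one match cycle regardless of table size (the ~O(1) case).

# With N engines each working on a different packet, aggregate throughput is
# roughly N / trie_steps of the TCAM's, so modest parallelism closes the gap.
for engines in (1, 8, 16, 32):
    print(engines, "engines ->", round(engines / trie_steps, 2), "x TCAM rate")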
