Hi Robin,

Thanks a lot for the review.

On Wed, Apr 1, 2026, 00:17, Robin Jarry <[email protected]> wrote:

> Hi Maxime,
>
> Maxime Leroy, Mar 31, 2026 at 23:41:
> > This RFC proposes an optional shared tbl8 pool for FIB/FIB6,
> > to address the difficulty of sizing num_tbl8 upfront.
> >
> > In practice, tbl8 usage depends on prefix distribution and
> > evolves over time. In multi-VRF environments, some VRFs are
> > elephants (full table, thousands of tbl8 groups) while others
> > consume very little (mostly /24 or shorter). Per-FIB sizing
> > forces each instance to provision for its worst case, leading
> > to significant memory waste.
> >
> > A shared pool solves this: all FIBs draw from the same tbl8
> > memory, so elephant VRFs use what they need while light VRFs
> > cost almost nothing. The sharing granularity is flexible: one pool per
> > VRF, per address family, a global pool, or no sharing at all.
> >
> > This series adds:
> >
> >   - A shared tbl8 pool, replacing per-backend allocation
> >     (bitmap in dir24_8, stack in trie) with a common
> >     refcounted O(1) stack allocator.
> >   - An optional resizable mode (grow via alloc + copy + QSBR
> >     synchronize), removing the need to guess peak usage at
> >     creation time.
> >   - A stats API (rte_fib_tbl8_pool_get_stats()) exposing
> >     used/total/max counters.
> >
> > All features are opt-in:
> >
> >   - Existing per-FIB allocation remains the default.
> >   - Shared pool is enabled via the tbl8_pool config field.
> >   - Resize is enabled by setting max_tbl8 > 0 with QSBR.
>
> The shared pool is nice, but dynamic resize is awesome.
>
> I have gone over the implementation and it seems sane to me. The only
> concern I might have is the change of tbl8 pool allocator for DIR24_8
> from a O(n/64) slab to O(1) stack. I don't know if it can have
> a performance impact on lookup or if it only affects the control plane
> operations (add/del).
>

This only affects control-plane operations (tbl8 alloc/free on
add/del). Lookup only reads the final tbl24/tbl8 arrays and does not
interact with the allocator itself.

So the motivation for the stack allocator here is to make shared-pool
management simple and O(1) on the update path, not to change lookup
behavior.

If shrinking ever becomes interesting later, then I agree the allocator
choice may need to be revisited. A LIFO stack immediately reuses the most
recently freed entries, so high indices tend to get reused first and it
becomes difficult to form a contiguous free tail. A low-first bitmap/slab
allocator, or a min-heap, would be better for shrinking because they prefer
lower free indices and therefore leave high indices unused longer. The
trade-off is that put/get become more expensive (O(n/64) for a bitmap scan
or O(log n) for a heap, instead of O(1) for the stack).


> > Shrinking (reducing pool capacity after usage drops) is not
> > part of this series. It would always be best-effort since
> > there is no compaction: if any tbl8 group near the end of the
> > pool is still in use, the pool cannot shrink. The current LIFO
> > free-list makes this less likely by immediately reusing freed
> > high indices, which prevents a contiguous free tail from
> > forming. A different allocation strategy (e.g. a min-heap
> > favoring low indices) could improve shrink opportunities, but
> > is better addressed separately.
>
> Shrinking would be nice to have but not critical in my opinion. I would
> prefer if we could add a dynamic resize feature (and possibly RIB node
> mempool sharing) for rte_rib* as well so that FIB objects can really be
> scaled up on demand. For now, if you run out of space in the RIB, you
> will get an ENOSPC error even if the FIB tbl8 pool still has room.
>

Agreed. Today tbl8 is only one side of the sizing problem.

For rte_rib*, I think we should probably move in a similar direction as
well: avoid per-VRF/per-instance worst-case provisioning, while keeping
separate global limits for IPv4 and IPv6, e.g. max_ipv4_routes and
max_ipv6_routes.

The difference with tbl8 is that the trade-off is not the same. tbl8 usage
is both expensive (2 KB per group) and hard to predict from route count
alone, since it depends on prefix distribution and table shape. RIB nodes
are much smaller and their usage is more predictable, so a shared global
node pool per AF already looks like a sensible first step.

That would remove per-VRF over-provisioning while keeping global limits. If
true on-demand growth is needed later, I think the mechanism would likely
have to differ from tbl8 resizing anyway: tbl8 is an indexed array and can
grow via alloc + copy + pointer swap, while RIB nodes are linked by
pointers and cannot be relocated transparently. In that case, a chunked
allocator would probably make more sense.

Also, I do not think hugepage-backed allocation (i.e. rte_mempool) is
really needed for rte_rib*.

Maxime