On Thu, 7 May 2026 11:50:40 +0200 Maxime Leroy <[email protected]> wrote:
> Asymmetric ADD/DEL sequences in FIB6 trie (a covering parent > removed between the ADD and DEL of a longer prefix) eventually > make ADD fail with -ENOSPC even when the tbl8 pool is empty. > > Patch 2/3 is a small reproducer. > > Root cause: rsvd_tbl8s is updated by depth_diff recomputed from > the current RIB, so increments at ADD and decrements at DEL do > not cancel when the RIB state changes in between. The counter > drifts and wraps to UINT32_MAX. > > The simplest fix I could find (1/3, 3/3) is to drop rsvd_tbl8s > and use the pool counters already maintained by alloc/free: > tbl8_pool_pos in trie, cur_tbl8s in dir24_8. The DQ reclaim > inside tbl8_alloc() is moved into the pre-check. > > I am not sure I understood the original intent of keeping > rsvd_tbl8s separate from the pool counters. In dir24_8 the two > mirror each other 1:1 and rsvd_tbl8s looks redundant; in trie, > depth_diff gives it a worst-case-reservation flavor but the > recomputation from the RIB is exactly what makes it drift. If > there was a deliberate reason, please point it out. > > Patch 3/3 is a no-op cleanup that aligns dir24_8 with the trie > pattern. > > Maxime Leroy (3): > fib6: fix tbl8 reservation drift in trie > test/fib6: add reproducer for tbl8 reservation drift > fib: drop redundant tbl8 reservation counter > > app/test/test_fib6.c | 83 ++++++++++++++++++++++++++++++++++++++++++++ > lib/fib/dir24_8.c | 31 ++++++----------- > lib/fib/dir24_8.h | 1 - > lib/fib/trie.c | 23 ++++++------ > lib/fib/trie.h | 3 +- > 5 files changed, 104 insertions(+), 37 deletions(-) > Used this a test case for multiple AI reviewers. Most reported false positives in various places. But the winner was Claude web (or desktop) with Opus 4.7 extended. It saw: Patch 3 - fib: drop redundant tbl8 reservation counter Warning: The commit message frames this as removing a "redundant" counter and claims "Behavior is unchanged for non-DQ modes," but dir24_8 has the same rsvd_tbl8s drift bug that patch 1/3 fixes for trie6 — this patch silently fixes it as well. Trace (no RCU, depth-1 mode): ADD /28 A (no /24 cover): - tmp == NULL, pre-check passes - install_to_fib allocates 1 tbl8 -> cur_tbl8s = 1 - post-add hook: tmp == NULL -> rsvd_tbl8s++ = 1 ADD /24 B covering A: - depth == 24, no rsvd_tbl8s update - tbl8 entries rewritten, no alloc/free - state: rsvd_tbl8s = 1, cur_tbl8s = 1 DEL /28 A (with /24 cover B): - modify_fib installs B's nh over A's range - tbl8 becomes uniform -> tbl8_recycle() -> cur_tbl8s = 0 - post-del check: tmp != NULL (B is the cover) -> rsvd_tbl8s NOT decremented - state: rsvd_tbl8s = 1, cur_tbl8s = 0 (diverged) DEL /24 B: - depth == 24, no rsvd_tbl8s update - final: rsvd_tbl8s = 1, cur_tbl8s = 0 Each repetition leaks +1 into rsvd_tbl8s with no actual tbl8 in use. Over enough cycles, rsvd_tbl8s reaches number_tbl8s and the pre-check rejects valid /25-/32 ADDs even though the pool is empty. This is the same class of bug as the IPv6 case, just with smaller per-cycle drift. The patch should carry: Fixes: <commit that introduced rsvd_tbl8s in dir24_8> Cc: [email protected]

