On Sat, May 02, 2026 at 03:22:19PM +0300, Vastargazing wrote:
> The four nexthop torture subtests delete and re-add a group member
> while ping -f and mausezahn keep traffic flowing through the same
> group, so on each iteration the read side runs nh_grp_entry_stats_inc()
> while the write side goes through remove_nh_grp_entry(). That is the
> exact race fixed in commit b2662e7593e9 ("net: nexthop: fix percpu
> use-after-free in remove_nh_grp_entry").
>
> The reason it never tripped these tests is the assertion. Each subtest
> ends with "if we did not crash, success", so a KASAN splat without
> panic_on_warn=1 lands in dmesg and the test still prints [OK]. The UAF
> above would have been visible to a KASAN run of fib_nexthops.sh; the
> torture loop just did not bother to look.

Do you have a trace?

The netdev CI and our internal CI both run the test and check the kernel
log for splats. Neither flagged it, most likely because per-CPU
allocations are not covered by KASAN.

>
> Drop a marker into /dev/kmsg before each torture subtest, grep for
> KASAN/UBSAN/KCSAN/KFENCE/Oops/"kernel BUG at" lines once the load is
> killed, and fail the subtest with the offending lines printed if any
> match. The check is skipped when /dev/kmsg is not writable so the
> existing pass behaviour is preserved on restricted setups. No new
> TEST_PROGS, no new test mechanism, just close the assertion gap.

I prefer to avoid such random markers and rely on the system running the
tests to catch these issues.
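
For reference, a runner-side check of this kind can be sketched roughly
as below. It is illustrative only and not the actual netdev CI code:
the helper name check_kernel_log and the assumption that the runner has
already saved the kernel log for the test run to a file are both
hypothetical. The pattern list is the one from the proposal quoted above.

```shell
#!/bin/bash
# Hypothetical runner-side helper, not taken from any existing CI.
# Patterns are the ones listed in the quoted proposal.
SPLAT_RE='KASAN|UBSAN|KCSAN|KFENCE|Oops|kernel BUG at'

# check_kernel_log FILE
# FILE is assumed to hold the kernel log captured for one test run.
# Prints the offending lines and returns 1 if any splat pattern matches,
# returns 0 otherwise.
check_kernel_log()
{
	local log=$1

	if grep -E "$SPLAT_RE" "$log"; then
		return 1
	fi
	return 0
}
```

How the log is captured (console log, dmesg snapshot, etc.) is left to
the runner, which keeps the test script itself free of markers.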