On 20 Aug 2025, at 18:00, Mark Johnston wrote:
On Wed, Aug 20, 2025 at 02:30:20PM +0200, Kristof Provost wrote:
We’re panicking because the V_rtzone zone has been cleaned up (in
vnet_rtzone_destroy()). I explicitly NULL out V_rtzone too, to make this
more obvious.
Note that we failed to completely free all rtentries (`Freed UMA keg
(rtentry) was not empty (2 items). Lost 1 pages of memory.`).
Presumably at least one of those two gets freed later, and that’s the
panic we see.
rt_free() queues the actual delete as an epoch callback
(`NET_EPOCH_CALL(destroy_rtentry_epoch, &rt->rt_epoch_ctx);`), and that’s
what we see here: the zone is removed before we’re done freeing all of
the rtentries.
vnet_rtzone_destroy() is called from rtables_destroy(), but that
explicitly calls NET_EPOCH_DRAIN_CALLBACKS() first, so I’d expect all of
the pending cleanups to have been done at that point. The comment block
above does suggest that there may still be nexthop entries pending
deletion even after we drain the callbacks. I think I can see how that’d
happen for nexthops, but I do not see how it can happen for rtentries.
Is it possible that if_detach_internal()->rt_flushifroutes() is running
after the rtentry zone has been destroyed? That is, maybe we're
destroying interfaces too late in the jail teardown process?
With a little work to pass the calling function and line number through
the call stack (and a lot of patience to reproduce the panic) I think
I’ve found where we initially rt_free() the relevant rtentry, but
it’s left me even more confused.
The call happens from ip6_destroy() -> in6_purgeaddr() ->
ifa_del_loopback_route() -> ifa_maintain_loopback_route() ->
rib_action() -> rib_del_route() -> rt_free().
That’s a NET_EPOCH_CALL(), which should be fine because in
rtables_destroy() we NET_EPOCH_DRAIN_CALLBACKS() before we
vnet_rtzone_destroy() (which naturally destroys the relevant UMA zone).
ip6_destroy()’s VNET_SYSUNINIT is SI_SUB_PROTO_DOMAIN/SI_ORDER_THIRD and
rtables_destroy()’s is SI_SUB_PROTO_DOMAIN/SI_ORDER_FIRST. Given that
it’s *un*init, which runs in reverse order, that means we call
ip6_destroy() first, so that should all just work.
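To illustrate the ordering argument above, here is a small user-space
sketch. It is not kernel code: the struct, the helper names and the
numeric constants are all made up for the example, and it only assumes
that uninit hooks run in the reverse of the sorted (subsystem, order)
sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in values (assumed); only their relative order matters here. */
enum { SUB_PROTO_DOMAIN = 100 };
enum { ORDER_FIRST = 0, ORDER_SECOND = 1, ORDER_THIRD = 2 };

/* Hypothetical record for one registered teardown hook. */
struct uninit_hook {
	int subsystem;
	int order;
	const char *name;
};

/* Ascending (subsystem, order): the sequence init hooks would use. */
static int
hook_cmp(const void *a, const void *b)
{
	const struct uninit_hook *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/*
 * Sort ascending, then walk the list backwards, recording each hook's
 * name in the order it would run at teardown.
 */
static void
run_uninit(struct uninit_hook *hooks, size_t n, const char **ran)
{
	assert(hooks != NULL && ran != NULL);
	qsort(hooks, n, sizeof(*hooks), hook_cmp);
	for (size_t i = 0; i < n; i++)
		ran[i] = hooks[n - 1 - i].name;
}

/* Returns 1 if ip6_destroy would run before rtables_destroy. */
static int
ip6_runs_before_rtables(void)
{
	struct uninit_hook hooks[] = {
		{ SUB_PROTO_DOMAIN, ORDER_FIRST, "rtables_destroy" },
		{ SUB_PROTO_DOMAIN, ORDER_THIRD, "ip6_destroy" },
	};
	const char *ran[2];

	run_uninit(hooks, 2, ran);
	return (strcmp(ran[0], "ip6_destroy") == 0 &&
	    strcmp(ran[1], "rtables_destroy") == 0);
}
```

Under that assumption ip6_runs_before_rtables() returns 1, which matches
the expectation that the rt_free() from ip6_destroy() is queued before
rtables_destroy() drains the callbacks.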
The enqueued freeing of the rtentries should all be handled once
NET_EPOCH_DRAIN_CALLBACKS() completes, but that appears to not be the
case.
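For reference, the symptom can be reproduced in a toy user-space model.
This is only a sketch, not the kernel's epoch or UMA implementation, and
every name in it is hypothetical. It assumes that a drain only covers
callbacks queued before it runs, and shows what a deferred free queued
after the drain would look like. It is not a diagnosis of where such a
late free comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Toy stand-in for the rtentry zone. */
struct toy_zone {
	bool destroyed;
	int live_items;
	int late_frees;		/* frees attempted after destruction */
};

/* Toy stand-in for the epoch callback queue. */
struct toy_queue {
	struct toy_zone *pending[MAX_PENDING];
	size_t npending;
};

/* Defer the real free instead of doing it now, as rt_free() does. */
static void
toy_defer_free(struct toy_queue *q, struct toy_zone *z)
{
	assert(q->npending < MAX_PENDING);
	q->pending[q->npending++] = z;
}

/* Run every callback queued so far; returns how many ran. */
static size_t
toy_drain(struct toy_queue *q)
{
	size_t ran = q->npending;

	for (size_t i = 0; i < q->npending; i++) {
		struct toy_zone *z = q->pending[i];

		if (z->destroyed)
			z->late_frees++;	/* the real kernel panics here */
		else
			z->live_items--;
	}
	q->npending = 0;
	return (ran);
}

/* Marks the zone destroyed; returns the number of items leaked. */
static int
toy_zone_destroy(struct toy_zone *z)
{
	z->destroyed = true;
	return (z->live_items);
}

/*
 * Two items are allocated. One free is queued before the drain and one
 * after the zone is gone. Returns the number of late frees.
 */
static int
toy_scenario(int *leaked)
{
	struct toy_zone z = { false, 2, 0 };
	struct toy_queue q = { { NULL }, 0 };

	toy_defer_free(&q, &z);		/* queued before the drain */
	toy_drain(&q);			/* live_items goes 2 -> 1 */
	*leaked = toy_zone_destroy(&z);	/* one item still allocated */
	toy_defer_free(&q, &z);		/* queued too late */
	toy_drain(&q);			/* runs against a destroyed zone */
	return (z.late_frees);
}
```

In this model the zone is destroyed with an item still allocated and a
later callback then touches the destroyed zone, which is the same shape
as the "keg was not empty" warning followed by the panic.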
—
Kristof