Re: rtentry_free panic

Kristof Provost Sun, 31 Aug 2025 23:51:59 -0700

On 20 Aug 2025, at 18:00, Mark Johnston wrote:

On Wed, Aug 20, 2025 at 02:30:20PM +0200, Kristof Provost wrote:
We’re panicking because the V_rtzone zone has been cleaned up (in
vnet_rtzone_destroy()). I explicitly NULL out V_rtzone too, to make this
more obvious.
Note that we failed to completely free all rtentries (`Freed UMA keg
(rtentry) was not empty (2 items). Lost 1 pages of memory.`). Presumably at
least one of those two gets freed later, and that’s the panic we see.
rt_free() queues the actual delete as an epoch callback
(`NET_EPOCH_CALL(destroy_rtentry_epoch, &rt->rt_epoch_ctx);`), and that’s
what we see here: the zone is removed before we’re done freeing all of the
rtentries.
vnet_rtzone_destroy() is called from rtables_destroy(), but that explicitly
calls NET_EPOCH_DRAIN_CALLBACKS() first, so I’d expect all of the pending
cleanups to have been done at that point. The comment block above does
suggest that there may still be nexthop entries pending deletion even after
we drain the callbacks. I think I can see how that’d happen for
nexthops, but I do not see how it can happen for rtentries.
Is it possible that if_detach_internal()->rt_flushifroutes() is running
after the rtentry zone is being destroyed?  That is, maybe we're
destroying interfaces too late in the jail teardown process?

With a little work to pass the calling function and line number through the call stack (and a lot of patience to reproduce the panic) I think I’ve found where we initially rt_free() the relevant rtentry, but it’s left me even more confused.

The call happens from ip6_destroy() -> in6_purgeaddr() -> ifa_del_loopback_route() -> ifa_maintain_loopback_route() -> rib_action() -> rib_del_route() -> rt_free(). That’s a NET_EPOCH_CALL(), which should be fine because in rtables_destroy() we NET_EPOCH_DRAIN_CALLBACKS() before we vnet_rtzone_destroy() (which naturally destroys the relevant UMA zone).
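
To make the expectation concrete, here is a toy user-space model of "queue the free, drain, then destroy the zone". It is only a sketch: every toy_* name is made up for illustration, and it does not reflect how epoch(9) or UMA are actually implemented. It shows the case that should hold (everything queued before the drain is freed before the zone goes away) and a purely hypothetical case where one free is queued only after the zone is gone, which would produce both the "keg was not empty" message and a late callback against a destroyed zone.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Stand-in for a UMA zone: only counts items that are still allocated. */
struct toy_zone {
	int live_items;
	int destroyed;
};

/* Stand-in for the epoch callback queue: one slot per deferred free. */
struct toy_epoch {
	struct toy_zone *pending[TOY_MAX_PENDING];
	size_t npending;
};

static void
toy_alloc(struct toy_zone *z)
{
	z->live_items++;
}

/* Analogue of rt_free(): nothing is freed now, the free is only queued. */
static void
toy_deferred_free(struct toy_epoch *e, struct toy_zone *z)
{
	assert(e->npending < TOY_MAX_PENDING);
	e->pending[e->npending++] = z;
}

/*
 * Analogue of draining the callbacks: run every queued free.
 * Returns how many callbacks found their zone already destroyed
 * (the toy equivalent of the panic).
 */
static int
toy_drain(struct toy_epoch *e)
{
	int late = 0;

	for (size_t i = 0; i < e->npending; i++) {
		struct toy_zone *z = e->pending[i];

		if (z->destroyed)
			late++;
		else
			z->live_items--;
	}
	e->npending = 0;
	return (late);
}

/* Analogue of destroying the zone; returns the number of leaked items. */
static int
toy_zone_destroy(struct toy_zone *z)
{
	z->destroyed = 1;
	return (z->live_items);
}

/* Expected teardown: all frees are queued before the drain. */
static int
toy_expected_teardown(void)
{
	struct toy_zone z = {0};
	struct toy_epoch e = {0};

	toy_alloc(&z);
	toy_alloc(&z);
	toy_deferred_free(&e, &z);
	toy_deferred_free(&e, &z);
	(void)toy_drain(&e);
	return (toy_zone_destroy(&z));	/* nothing leaked */
}

/* Hypothetical bad teardown: one free is queued after the zone is gone. */
static int
toy_late_free_teardown(int *leaked)
{
	struct toy_zone z = {0};
	struct toy_epoch e = {0};

	toy_alloc(&z);
	toy_alloc(&z);
	toy_deferred_free(&e, &z);
	(void)toy_drain(&e);
	*leaked = toy_zone_destroy(&z);	/* one item still allocated */
	toy_deferred_free(&e, &z);
	return (toy_drain(&e));		/* one callback runs too late */
}
```

In this model toy_expected_teardown() reports no leaked items, while toy_late_free_teardown() reports one leaked item and one late callback. It says nothing about which kernel path, if any, actually queues a free that late.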

ip6_destroy()’s VNET_SYSUNINIT is SI_SUB_PROTO_DOMAIN/SI_ORDER_THIRD and rtables_destroy()’s is SI_SUB_PROTO_DOMAIN/SI_ORDER_FIRST. Given that it’s *un*init that means we call ip6_destroy() first, so that should all just work. The enqueued freeing of the rtentries should all be handled once NET_EPOCH_DRAIN_CALLBACKS() completes, but that appears to not be the case.
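
The ordering argument can be sketched in user space as well. This is a simplified model that assumes uninit handlers are sorted by (subsystem, order) and then run in reverse; the toy_* names and the numeric constants are invented for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Made-up values; only their relative order matters here. */
enum { TOY_SUB_PROTO_DOMAIN = 1 };
enum { TOY_ORDER_FIRST = 0, TOY_ORDER_SECOND = 1, TOY_ORDER_THIRD = 2 };

struct toy_sysuninit {
	int subsystem;
	int order;
	const char *name;
};

/* Ascending by subsystem, then by order within a subsystem. */
static int
toy_cmp(const void *a, const void *b)
{
	const struct toy_sysuninit *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return ((x->subsystem > y->subsystem) -
		    (x->subsystem < y->subsystem));
	return ((x->order > y->order) - (x->order < y->order));
}

/*
 * Sort the handlers in init order, then record their names in the
 * reverse (teardown) order in out[], which must have room for n entries.
 */
static void
toy_uninit_order(struct toy_sysuninit *h, size_t n, const char **out)
{
	qsort(h, n, sizeof(*h), toy_cmp);
	for (size_t i = 0; i < n; i++)
		out[i] = h[n - 1 - i].name;
}

/* Name of the handler that runs first at teardown for the two above. */
static const char *
toy_first_to_run(void)
{
	struct toy_sysuninit h[] = {
		{ TOY_SUB_PROTO_DOMAIN, TOY_ORDER_FIRST, "rtables_destroy" },
		{ TOY_SUB_PROTO_DOMAIN, TOY_ORDER_THIRD, "ip6_destroy" },
	};
	const char *out[2];

	toy_uninit_order(h, 2, out);
	return (out[0]);
}
```

Under that assumption toy_first_to_run() yields "ip6_destroy", matching the expectation that the rt_free() calls are queued before rtables_destroy() drains the callbacks and destroys the zone.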


—
Kristof
