On Wed, Aug 20, 2025 at 10:48:49PM +0200, Kristof Provost wrote:
> On 20 Aug 2025, at 18:00, Mark Johnston wrote:
> > On Wed, Aug 20, 2025 at 02:30:20PM +0200, Kristof Provost wrote:
> > > Hi,
> > > 
> > > Running the pf tests I very occasionally (say 1 out of 10 runs) see
> > > panics freeing an rtentry.
> > > This mostly manifests during bricoler test runs, and usually with the
> > > KMSAN kernel config. I assume that’s because there’s a timing factor
> > > involved rather than it being an issue that’s directly detected by
> > > KMSAN/KASAN.
> > 
> > I've seen this before, but not in the past few months.  I'm running with
> > the default parallelism of 4 most of the time.
> > 
> I have the distinct impression (but no data to prove it) that it comes and
> goes.
> 
> > > We’re panicking because the V_rtzone zone has been cleaned up (in
> > > vnet_rtzone_destroy()). I explicitly NULL out V_rtzone too, to make
> > > this more obvious.
> > > Note that we failed to completely free all rtentries (`Freed UMA keg
> > > (rtentry) was not empty (2 items).  Lost 1 pages of memory.`).
> > > Presumably at least one of those two gets freed later, and that’s the
> > > panic we see.
> > > 
> > > rt_free() queues the actual delete as an epoch callback
> > > (`NET_EPOCH_CALL(destroy_rtentry_epoch, &rt->rt_epoch_ctx);`), and
> > > that’s what we see here: the zone is removed before we’re done
> > > freeing all of the rtentries.
> > > 
> > > vnet_rtzone_destroy() is called from rtables_destroy(), but that
> > > explicitly calls NET_EPOCH_DRAIN_CALLBACKS() first, so I’d expect all
> > > of the pending cleanups to have been done at that point.  The comment
> > > block above does suggest that there may still be nexthop entries
> > > pending deletion even after we drain the callbacks. I think I can see
> > > how that’d happen for nexthops, but I do not see how it can happen for
> > > rtentries.
> > 
> > Is it possible that if_detach_internal()->rt_flushifroutes() is running
> > after the rtentry zone is being destroyed?  That is, maybe we're
> > destroying interfaces too late in the jail teardown process?
> > 
> I don’t think so, I expect all of the if_detach() calls to be done by the
> time we hit rtables_destroy() -> vnet_rtzone_destroy(), because that’s
> SI_SUB_PROTO_DOMAIN/SI_ORDER_FIRST.
> We should have hit vnet_if_return() (SI_SUB_VNET_DONE/SI_ORDER_ANY) by then.

Doesn't vnet_if_return() just rehome interfaces?  I'd expect
if_clone_detach() to be responsible for destroying most interfaces
(though some are destroyed by the PR_METHOD_REMOVE callback since they
hold on to resources which prevent jail teardown.)

Various drivers seem to call if_clone_detach() at SI_SUB_PSEUDO, e.g.,
vnet_gif_uninit(), which will run after SI_SUB_PROTO_DOMAIN during
teardown.
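
To make the ordering argument concrete, here is a small userland sketch
(not kernel code) that sorts the three uninit steps named in this thread
by subsystem ID in descending order, which is how the vnet_uninit calls
are described as running.  The SI_SUB_VNET_DONE and SI_SUB_PROTO_DOMAIN
values are the ones quoted below; the SI_SUB_PSEUDO value is from my
memory of sys/kernel.h and should be double-checked.  The struct and
function names are made up for illustration, and SI_ORDER_* within a
subsystem is ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Subsystem IDs; SI_SUB_PSEUDO is an assumption, verify in sys/kernel.h. */
#define SI_SUB_PSEUDO		0x7000000
#define SI_SUB_PROTO_DOMAIN	0x8800000
#define SI_SUB_VNET_DONE	0xdc00000

/* Hypothetical record: one uninit step and the subsystem it hangs off. */
struct uninit_step {
	unsigned int	subsystem;
	const char	*what;
};

/* Higher subsystem IDs sort first, mirroring descending-order teardown. */
int
cmp_descending(const void *a, const void *b)
{
	const struct uninit_step *x = a;
	const struct uninit_step *y = b;

	if (x->subsystem > y->subsystem)
		return (-1);
	if (x->subsystem < y->subsystem)
		return (1);
	return (0);
}

/*
 * Return the 0-based position at which the named step would run during
 * teardown, or -1 if the name is not one of the three modelled here.
 */
int
teardown_position(const char *what)
{
	struct uninit_step steps[] = {
		{ SI_SUB_PSEUDO,	"vnet_gif_uninit" },
		{ SI_SUB_VNET_DONE,	"vnet_if_return" },
		{ SI_SUB_PROTO_DOMAIN,	"rtables_destroy" },
	};
	size_t i, n = sizeof(steps) / sizeof(steps[0]);

	qsort(steps, n, sizeof(steps[0]), cmp_descending);
	for (i = 0; i < n; i++) {
		if (strcmp(steps[i].what, what) == 0)
			return ((int)i);
	}
	return (-1);
}
```

Under those assumptions teardown_position() puts vnet_if_return first,
rtables_destroy second and vnet_gif_uninit last, i.e. the gif clone
detach would run after the rtentry zone is already gone.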

> SI_SUB_VNET_DONE is 0xdc00000, SI_SUB_PROTO_DOMAIN is 0x8800000 and the
> vnet_uninit calls are done in descending order, so VNET_DONE should be
> first.
> 
> I’m going to kick off a few test runs where I assert that V_rtzone hasn’t
> been freed yet when we’re in if_detach_internal() to confirm, because
> clearly I’m missing *something*, and it could be this.
> 
> —
> Kristof
