> Date: Wed, 8 Apr 2026 19:24:56 +0200
> From: Jeremie Courreges-Anglas <[email protected]>
>
> On Mon, Mar 16, 2026 at 01:19:36PM -0600, Theo de Raadt wrote:
> > Jeremie Courreges-Anglas <[email protected]> wrote:
> >
> > > On Mon, Mar 16, 2026 at 12:18:05PM -0600, Theo de Raadt wrote:
> > > > I'm surprised at your proposal.
> > > >
> > > > If this condition gets detected, why do you think it is fine to
> > > > continue? A kernel data structure is seriously corrupted.
> > >
> > > I'm not saying it's fine, sorry if my mail was too long to read. ;)
> > >
> > > 1. I'm not 100% sure the checks that trigger are correct, after all
> > > they're not using volatile reads. Maaaaybe that's the bug but I
> > > have no idea right now.
> > >
> > > 2. Kurt had posted this on ports@ earlier, then on bugs@, so far no
> > > one has a fix and you recently tagged 7.9. This diff is an attempt
> > > to make kmos' and users life easier before next release. Obviously
> > > everybody would be happier with a proper fix. Maybe this admittedly
> > > incomplete fix will spark a discussion?
> >
> > Maybe.
> >
> > But you cannot delete that ddb enter. You could replace it with a
> > panic. If you continue to run after that printf, the system will just
> > crash in other unknown ways which are more difficult to debug.
>
> We have proof that the system doesn't necessarily crash after that
> message is printed. kmos tested the db_enter removal yesterday
> and confirmed that he got the message on the console without the
> system crashing. Using the diff below, I got this today on my LDOM's
> console:
>
> Apr 8 11:37:26 ports /bsd: ctx_free: context 1641 still active in dmmu
> Apr 8 12:21:12 ports /bsd: ctx_free: context 7896 still active in dmmu
> Apr 8 12:24:29 ports /bsd: ctx_free: context 3150 still active in dmmu
> Apr 8 13:43:56 ports /bsd: ctx_free: context 4221 still active in dmmu
> Apr 8 15:55:50 ports /bsd: ctx_free: context 1264 still active in dmmu
> Apr 8 18:55:48 ports /bsd: ctx_free: context 5664 still active in dmmu
>
> The system is running many loops of perl subprocesses in an attempt to
> reproduce another bug:
>
> count=0; while perl t.pl; do count=$((count + 1)); done; echo $count
>
> I have zero reason to believe that this is specific to perl. E.g. it
> may happen when building rust, which AFAIK doesn't use perl.
>
> So I stand by my initial proposal (or the variant below). I'm not
> happy either with our partial understanding of this issue, and if
> someone had a better fix, I'd be all for it. BUT the db_enter() call
> in -current and next 7.9 has so far done more harm than good.

Sorry, but this is really bad. It means stale TSB entries have been
left behind and may be re-used when the context is re-used. And that
could lead to some serious memory corruption.

If we want to paper over this issue, we should at least invalidate the
stale TSB entries. So something like:

	for (i = 0; i < TSBENTS; i++) {
		tag = READ_ONCE(tsb_dmmu[i].tag);
		if (TSB_TAG_CTX(tag) == oldctx) {
			atomic_cas_ulong(&tsb_dmmu[i].tag, tag,
			    TSB_TAG_INVALID);
			printf("ctx_free: context %d still active in dmmu\n",
			    oldctx);
		}
		tag = READ_ONCE(tsb_immu[i].tag);
		if (TSB_TAG_CTX(tag) == oldctx) {
			atomic_cas_ulong(&tsb_immu[i].tag, tag,
			    TSB_TAG_INVALID);
			printf("ctx_free: context %d still active in immu\n",
			    oldctx);
		}
	}

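
For anyone who wants to play with the idea outside the kernel, here is
a rough userland sketch of the same scan-and-invalidate loop. It uses
C11 atomics in place of READ_ONCE() and atomic_cas_ulong(). The table
size, the tag layout (MAKE_TAG, the 13-bit context field), the value of
TSB_TAG_INVALID and the helper names tsb_init(), tsb_scrub() and
tsb_demo() are all made up for illustration; they are not the sparc64
definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real sparc64 definitions differ. */
#define TSBENTS		8
#define TSB_TAG_INVALID	UINT64_C(0x8000000000000000)
#define TSB_TAG_CTX(t)	((unsigned)(((t) >> 48) & 0x1fff))
#define MAKE_TAG(ctx, va) \
	(((uint64_t)(ctx) << 48) | ((uint64_t)(va) & UINT64_C(0xffffffffffff)))

struct tsb_entry {
	_Atomic uint64_t tag;
	uint64_t data;
};

static struct tsb_entry tsb_dmmu[TSBENTS], tsb_immu[TSBENTS];

/* Mark every slot of a table as invalid. */
static void
tsb_init(struct tsb_entry *tsb)
{
	int i;

	for (i = 0; i < TSBENTS; i++)
		atomic_store(&tsb[i].tag, TSB_TAG_INVALID);
}

/*
 * Invalidate entries still tagged with ctx.  The compare-and-swap only
 * replaces the tag if it is unchanged since we read it, so an entry
 * that was concurrently refilled is left alone.  Returns the number of
 * entries invalidated.
 */
static int
tsb_scrub(struct tsb_entry *tsb, unsigned ctx, const char *name)
{
	int i, n = 0;

	for (i = 0; i < TSBENTS; i++) {
		uint64_t tag = atomic_load(&tsb[i].tag);

		if (tag == TSB_TAG_INVALID || TSB_TAG_CTX(tag) != ctx)
			continue;
		if (atomic_compare_exchange_strong(&tsb[i].tag, &tag,
		    TSB_TAG_INVALID)) {
			printf("ctx_free: context %u still active in %s\n",
			    ctx, name);
			n++;
		}
	}
	return n;
}

/* Plant one stale entry per table and scrub both tables. */
static int
tsb_demo(void)
{
	tsb_init(tsb_dmmu);
	tsb_init(tsb_immu);
	atomic_store(&tsb_dmmu[3].tag, MAKE_TAG(1641, 0x2000));
	atomic_store(&tsb_immu[5].tag, MAKE_TAG(1641, 0x4000));
	return tsb_scrub(tsb_dmmu, 1641, "dmmu") +
	    tsb_scrub(tsb_immu, 1641, "immu");
}
```

Calling tsb_demo() from a main() should scrub one entry in each table.
Note that this only shows the mechanics of the loop; it says nothing
about why the stale entries are there in the first place.
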
> Index: pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/sparc64/sparc64/pmap.c,v
> diff -u -p -r1.127 pmap.c
> --- pmap.c 14 Dec 2025 12:37:22 -0000 1.127
> +++ pmap.c 7 Apr 2026 08:58:11 -0000
> @@ -2597,11 +2597,10 @@ ctx_free(struct pmap *pm)
> db_enter();
> }
> for (i = 0; i < TSBENTS; i++) {
> - if (TSB_TAG_CTX(tsb_dmmu[i].tag) == oldctx ||
> - TSB_TAG_CTX(tsb_immu[i].tag) == oldctx) {
> - printf("ctx_free: context %d still active\n", oldctx);
> - db_enter();
> - }
> + if (TSB_TAG_CTX(tsb_dmmu[i].tag) == oldctx)
> + printf("ctx_free: context %d still active in dmmu\n",
> oldctx);
> + if (TSB_TAG_CTX(tsb_immu[i].tag) == oldctx)
> + printf("ctx_free: context %d still active in immu\n",
> oldctx);
> }
> #endif
> /* We should verify it has not been stolen and reallocated... */
>
>
>
> --
> jca
>