The -stable version "crash1" was reproducible almost every run; each run takes about an hour on this 8-processor Power9 running at a load average of about 30. The -current version "crash2" has only happened once so far, though because of other issues (hitting a user process limit of 126) it was failing early until I recognized that today.
It is certainly not a clean, condensed, reproducible bug and I ordinarily would not report it. But given folks were just in that code, I thought it better to say something just in case. I'm fine with leaving the kernel unchanged until I can get a more reproducible case. If there is anything you'd like me to print from ddb upon a crash, should this become one of those annoying every-few-weeks issues, please let me know. I'm happy to leave the machine sitting at ddb> for a few hours or days in that case.

On Mon, May 20, 2024 at 6:08 PM Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>
> On Tue, May 21, 2024 at 02:51:39AM +0200, Jeremie Courreges-Anglas wrote:
> > On Sat, May 18, 2024 at 01:11:56PM -0700, Eric Grosse wrote:
> > > The openbsd-ppc64-n2vi Go builder machine is converting over to LUCI
> > > build infrastructure and the new workload may have stepped on a
> > > pagedaemon corner case. While running 7.5-stable I reproducibly get
> > > kernel panics "pmap_enter: failed to allocate pted". I saw recent
> > > powerpc64/pmap.c changes from gkoehler@ and kettenis@, so updated the
> > > machine to 7.5-snapshot and now see "trap type 300" from pmap_remove.
> >
> > Is that also reproducible? cc'ing bugs@.
> >
> > > In an effort to reproduce this with a more familiar workload, I tried
> > > "/usr/src$ make -j32 build" to pound on the hardware with a similar
> > > load average and temperature, but that runs without crashing. I'd
> > > welcome suggestions on anything I can do to reduce this to a useful
> > > bug report.
> > >
> > > https://n2vi.com/t.dmesg latest dmesg
> > > https://n2vi.com/t.crash1 ddb serial console from the 7.5-stable panics
> >
> > This doesn't look powerpc64-specific. It feels like
> > uvm_km_kmemalloc_pla() should call pmap_enter() with PMAP_CANFAIL and
> > unwind in case of a resource shortage.
>
> The diff below behaves when I inject fake pmap_enter() failures on
> amd64. It would be nice to test it on -stable and/or -current,
> depending on whether it happens on -stable only or also on -current.
>
>
> diff --git a/sys/uvm/uvm_km.c b/sys/uvm/uvm_km.c
> index a715173529a..3779ea3d7ee 100644
> --- a/sys/uvm/uvm_km.c
> +++ b/sys/uvm/uvm_km.c
> @@ -335,7 +335,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
>  	vaddr_t kva, loopva;
>  	voff_t offset;
>  	struct vm_page *pg;
> -	struct pglist pgl;
> +	struct pglist pgl, pgldone;
>  	int pla_flags;
>  
>  	KASSERT(vm_map_pmap(map) == pmap_kernel());
> @@ -372,6 +372,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
>  	 * whom should ever get a handle on this area of VM.
>  	 */
>  	TAILQ_INIT(&pgl);
> +	TAILQ_INIT(&pgldone);
>  	pla_flags = 0;
>  	KASSERT(uvmexp.swpgonly <= uvmexp.swpages);
>  	if ((flags & UVM_KMF_NOWAIT) ||
> @@ -396,6 +397,7 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
>  	while (loopva != kva + size) {
>  		pg = TAILQ_FIRST(&pgl);
>  		TAILQ_REMOVE(&pgl, pg, pageq);
> +		TAILQ_INSERT_TAIL(&pgldone, pg, pageq);
>  		uvm_pagealloc_pg(pg, obj, offset, NULL);
>  		atomic_clearbits_int(&pg->pg_flags, PG_BUSY);
>  		UVM_PAGE_OWN(pg, NULL);
> @@ -408,9 +410,28 @@ uvm_km_kmemalloc_pla(struct vm_map *map, struct uvm_object *obj, vsize_t size,
>  			pmap_kenter_pa(loopva, VM_PAGE_TO_PHYS(pg),
>  			    PROT_READ | PROT_WRITE);
>  		} else {
> -			pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
> +			if (pmap_enter(map->pmap, loopva, VM_PAGE_TO_PHYS(pg),
>  			    PROT_READ | PROT_WRITE,
> -			    PROT_READ | PROT_WRITE | PMAP_WIRED);
> +			    PROT_READ | PROT_WRITE | PMAP_WIRED |
> +			    PMAP_CANFAIL) != 0) {
> +				pmap_remove(map->pmap, kva, loopva);
> +
> +				while ((pg = TAILQ_LAST(&pgldone, pglist))) {
> +					TAILQ_REMOVE(&pgldone, pg, pageq);
> +					TAILQ_INSERT_HEAD(&pgl, pg, pageq);
> +					uvm_lock_pageq();
> +					uvm_pageclean(pg);
> +					uvm_unlock_pageq();
> +				}
> +
> +				if (obj != NULL)
> +					rw_exit(obj->vmobjlock);
> +
> +				uvm_unmap(map, kva, kva + size);
> +				uvm_pglistfree(&pgl);
> +
> +				return 0;
> +			}
>  		}
>  		loopva += PAGE_SIZE;
>  		offset += PAGE_SIZE;
>
>
> --
> jca