Stuart Henderson <[email protected]> wrote:
> On 2021/01/11 17:42, Stuart Henderson wrote:
> > On 2021/01/11 17:39, Stuart Henderson wrote:
> > > I've hit this twice too, each after about 20h uptime. Forks are failing
> > > ENOMEM. top reports plenty (several GB) free. No pool alloc failures in
> > > vmstat -m. Seen with
> > > 
> > > OpenBSD 6.8-current (GENERIC.MP) #265: Sat Jan  9 01:54:38 MST 2021
> > >     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > Not seen with
> > > 
> > > OpenBSD 6.8-current (GENERIC.MP) #263: Thu Jan  7 00:42:27 MST 2021
> > >     [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Forgot to say: things recover if I kill X. They go pear shaped again quickly
> > if I restart X.
> 
> Currently running with "Let vmalloc() use km_alloc(9) instead of malloc(9) and
> let kvmalloc() only use malloc(9) for small (less than a page) allocations
> and atomic allocations" backed out and OK at 22h uptime.

I'm still on the same borked kernel. Indeed, it works fine for some
time after killing X, but eventually runs out of memory again.
Interestingly, today I got the following on the console:

Jan 12 21:32:49 oolong /bsd: i915_vma_coredump_create: stub
Jan 12 21:32:49 oolong last message repeated 8 times
Jan 12 21:32:49 oolong /bsd: pool_fini: stub
Jan 12 21:32:49 oolong /bsd: drm:pid41046:intel_gt_reset *NOTICE* Resetting chip for stopped heartbeat on rcs0
Jan 12 21:32:49 oolong /bsd: drm:pid41046:mark_guilty *NOTICE* Xorg[51304] context reset due to GPU hang
Jan 12 21:33:04 oolong /bsd: i915_vma_core_dump_create: stub
Jan 12 21:33:04 oolong last message repeated 8 times
Jan 12 21:33:04 oolong /bsd: pool_fini: stub
Jan 12 21:33:04 oolong /bsd: err_free_sgl: stub
Jan 12 21:33:04 oolong /bsd: drm:pid52120:intel_gt_reset *NOTICE* Resetting chip for stopped heartbeat on rcs0
Jan 12 21:33:04 oolong /bsd: drm:pid52120:mark_guilty *NOTICE* Xorg[51304] context reset due to GPU hang

I also suspected that commit, as it seems to be the only relevant
change (as far as my limited knowledge goes) made between 07-Jan and
09-Jan, based on the dates of your previously working kernel.

Yesterday I compiled a kernel with a couple of printfs, hoping to get
lucky and catch a leak, but nothing obvious shows up. The diff and
message.0.gz (~200k lines, of which only 1000 aren't the result of
those printfs) are below in case they help. The short bursts every
second seem related to the rendering of my status bar, and I _think_
the longer bursts are related to xconsole rendering as it was flooding.

Index: drm_linux.c
===================================================================
RCS file: /home/cvs/src/sys/dev/pci/drm/drm_linux.c,v
retrieving revision 1.75
diff -u -p -r1.75 drm_linux.c
--- drm_linux.c 8 Jan 2021 23:02:09 -0000       1.75
+++ drm_linux.c 11 Jan 2021 20:41:09 -0000
@@ -518,18 +518,24 @@ vfree(const void *addr)
 void *
 kvmalloc(size_t size, gfp_t flags)
 {
+       void *p;
        if ((flags & M_NOWAIT) || size < PAGE_SIZE)
-               return malloc(size, M_DRM, flags);
-       if (flags & M_ZERO)
-               return vzalloc(size);
+               p = malloc(size, M_DRM, flags);
+       else if (flags & M_ZERO)
+               p = vzalloc(size);
        else
-               return vmalloc(size);
+               p = vmalloc(size);
+       printf("(!) kvmalloc %p %zu 0x%08x %d 0x%08x\n", p, size,
+           flags & M_NOWAIT, size < PAGE_SIZE, flags & M_ZERO);
+       return p;
 }
 
 void
 kvfree(const void *addr)
 {
-       if (is_vmalloc_addr(addr))
+       bool t = is_vmalloc_addr(addr);
+       printf("(!) kvmfree %p %d\n", addr, t);
+       if (t)
                vfree(addr);
        else
                free((void *)addr, M_DRM, 0);
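
In case someone wants to check the log mechanically, here is a rough
sketch of a userland filter that pairs the "(!) kvmalloc" and
"(!) kvmfree" lines printed by the diff above and reports what is
still live at the end. It assumes the line format produced by those
two printfs (pointer first, then the size for allocations). The
tracker_* names and the LEAKSCAN_MAIN guard are made up for this
sketch. syslogd collapses identical lines into "last message repeated
N times", so the counts are approximate at best.

```c
/* Sketch only: pair "(!) kvmalloc"/"(!) kvmfree" log lines from the diff. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_TAG "(!) kvmalloc "
#define FREE_TAG  "(!) kvmfree "

struct live {
	unsigned long long addr;
	size_t size;
};

struct tracker {
	struct live *v;		/* allocations seen but not yet freed */
	size_t n, cap;
	size_t unknown_frees;	/* frees of pointers never seen allocated */
};

static int
tracker_add(struct tracker *t, unsigned long long addr, size_t size)
{
	if (t->n == t->cap) {
		size_t ncap = t->cap ? t->cap * 2 : 1024;
		struct live *nv = realloc(t->v, ncap * sizeof(*nv));

		if (nv == NULL)
			return -1;
		t->v = nv;
		t->cap = ncap;
	}
	t->v[t->n].addr = addr;
	t->v[t->n].size = size;
	t->n++;
	return 0;
}

static void
tracker_del(struct tracker *t, unsigned long long addr)
{
	size_t i;

	/* Newest first: most buffers in the log are short-lived. */
	for (i = t->n; i-- > 0;) {
		if (t->v[i].addr == addr) {
			t->v[i] = t->v[--t->n];	/* swap-remove */
			return;
		}
	}
	t->unknown_frees++;
}

/* Feed one log line; lines without either tag are ignored. */
static int
tracker_feed(struct tracker *t, const char *line)
{
	const char *p;
	unsigned long long addr;
	size_t size;

	assert(t != NULL && line != NULL);
	if ((p = strstr(line, ALLOC_TAG)) != NULL) {
		p += sizeof(ALLOC_TAG) - 1;
		/* Skip failed allocations (NULL pointer). */
		if (sscanf(p, "%llx %zu", &addr, &size) == 2 && addr != 0)
			return tracker_add(t, addr, size);
	} else if ((p = strstr(line, FREE_TAG)) != NULL) {
		p += sizeof(FREE_TAG) - 1;
		if (sscanf(p, "%llx", &addr) == 1 && addr != 0)
			tracker_del(t, addr);
	}
	return 0;
}

static size_t
tracker_bytes(const struct tracker *t)
{
	size_t i, sum = 0;

	for (i = 0; i < t->n; i++)
		sum += t->v[i].size;
	return sum;
}

#ifdef LEAKSCAN_MAIN
int
main(void)
{
	struct tracker t = { 0 };
	char line[1024];

	while (fgets(line, sizeof(line), stdin) != NULL)
		if (tracker_feed(&t, line) == -1)
			return 1;
	printf("%zu live allocations, %zu bytes, %zu unmatched frees\n",
	    t.n, tracker_bytes(&t), t.unknown_frees);
	free(t.v);
	return 0;
}
#endif
```

Saved as, say, leakscan.c (hypothetical name), it could be built with
"cc -DLEAKSCAN_MAIN -o leakscan leakscan.c" and run as
"zcat message.0.gz | ./leakscan". A live count that keeps growing
across runs would point at kvmalloc callers. A flat one would suggest
the leak is on some other path.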

Attachment: message.0.gz
Description: application/gzip
