On Thu, May 14, 2026 at 9:39 PM Mikko Perttunen <[email protected]> wrote:
>
> On Wednesday, May 13, 2026 1:26 PM Aaron Kling wrote:
> > On Tue, May 12, 2026 at 10:26 PM Mikko Perttunen <[email protected]>
> > wrote:
> > >
> > > On Tuesday, May 12, 2026 2:29 PM Aaron Kling wrote:
> > > > There is an issue with tegra-drm where some buffers get created, then
> > > > freed, but the dma buffer never gets freed. Causing display controller
> > > > memory allocations to start failing after the leaks fill up cma.
> > > >
> > > > I created an issue on the freedesktop issue tracker [0] with a patch
> > > > with some debug logs I added, then a log from Android that contains
> > > > these logs. CMA is set to 512MB, and when allocations start to fail,
> > > > the unfreed allocations add up to just shy of 500MB, where it's
> > > > reasonable to expect that 8MB contiguous is no longer available. The
> > > > log was generated on a Jetson TX2 NX, but I have seen this leak on
> > > > other archs as well, this also does not appear to be limited to soc's
> > > > with nvdisplay.
> > > >
> > > > This does not appear to be a userspace issue. The graphics allocator
> > > > works as expected for other soc vendors. And as the logs show, the
> > > > delete dumb buffer ioctl is called, but is not always followed by the
> > > > dma buffer getting freed. I have also observed this issue with a
> > > > gralloc that uses the tegra gem create and such, this is not unique to
> > > > dumb buffers, that's just the last log I had when deciding to post the
> > > > issue to lkml.
> > > >
> > > > What I primarily intend to ask here is how to further debug this
> > > > issue. I'm not finding any direct path between the delete dumb ioctl
> > > > handling and gem release or tegra bo free. Can someone point me to the
> > > > pieces in the middle I'm missing, where the logic is to decide is a
> > > > buffer should be freed?
> > > >
> > > > Aaron
> > > >
> > > > [0] https://gitlab.freedesktop.org/drm/tegra/-/work_items/9
> > > >
> > >
> > > If the issue is specific to buffers that get used with display, I have
> > > an idea of what the issue is -- there is some circular reference
> > > counting with the BO cache in the host1x driver, and that means that
> > > BOs that end up in the cache never get released.
> >
> > As far as I know, this only affects display controller buffers. Though
> > unfortunately, I have limited ways to test the media engines right
> > now.
>
> I've been working on some more userspace for the media engines.
> Hopefully I can get that in shape soon.

Great to hear. My Android use case unfortunately has some very specific
requirements, namely a C2 AIDL HAL. But maybe with more examples of the
uapi in action, I can try looking at one again. Though my last attempt
using the existing nvdec example had my head spinning in about 3 seconds
flat between that and the C2 API.

> >
> > > Let me do some testing locally and I'll send out a patch once ready.
> >
> > Sounds good, thanks.
>
> I posted a fix, please give it a try. Incidentally, on my side I don't
> have that much testing set up for the display :)

My initial test run on p2972 using swiftshader is looking good for this
specific issue at least. I'm partway through a VTS run and haven't hit
any allocation failures, far past where I got them previously. However,
this may have peeled back the onion to another problem. I'm getting
stack traces from shared plane atomics, and a lot of mmu faults during
the graphics tests. I'll see if I can narrow down a simple reproduction
and track down the cause. And I'll check the bo caching patch on a few
other devices, then post a Tested-by there if they work.

Aaron
