On Thu, May 14, 2026 at 9:39 PM Mikko Perttunen <[email protected]> wrote:
>
> On Wednesday, May 13, 2026 1:26 PM Aaron Kling wrote:
> > On Tue, May 12, 2026 at 10:26 PM Mikko Perttunen <[email protected]> wrote:
> > >
> > > On Tuesday, May 12, 2026 2:29 PM Aaron Kling wrote:
> > > > There is an issue with tegra-drm where some buffers get created, then
> > > > freed, but the underlying dma buffer never gets freed, causing display
> > > > controller memory allocations to start failing once the leaks fill up
> > > > cma.
> > > >
> > > > I created an issue on the freedesktop issue tracker [0] with a patch
> > > > with some debug logs I added, then a log from Android that contains
> > > > these logs. CMA is set to 512MB, and when allocations start to fail,
> > > > the unfreed allocations add up to just shy of 500MB, where it's
> > > > reasonable to expect that 8MB contiguous is no longer available. The
> > > > log was generated on a Jetson TX2 NX, but I have seen this leak on
> > > > other archs as well, and it does not appear to be limited to SoCs
> > > > with nvdisplay.
> > > >
> > > > This does not appear to be a userspace issue. The graphics allocator
> > > > works as expected for other soc vendors. And as the logs show, the
> > > > delete dumb buffer ioctl is called, but is not always followed by the
> > > > dma buffer getting freed. I have also observed this issue with a
> > > > gralloc that uses the tegra gem create and such, so this is not
> > > > unique to dumb buffers; that's just the last log I had when deciding
> > > > to post the issue to lkml.
> > > >
> > > > What I primarily intend to ask here is how to further debug this
> > > > issue. I'm not finding any direct path between the delete dumb ioctl
> > > > handling and gem release or tegra bo free. Can someone point me to the
> > > > pieces in the middle I'm missing, where the logic decides whether a
> > > > buffer should be freed?
> > > >
> > > > Aaron
> > > >
> > > > [0] https://gitlab.freedesktop.org/drm/tegra/-/work_items/9
> > > >
> > >
> > > If the issue is specific to buffers that get used with display, I have
> > > an idea of what the issue is -- there is some circular reference
> > > counting with the BO cache in the host1x driver, and that means that
> > > BOs that end up in the cache never get released.
> >
> > As far as I know, this only affects display controller buffers. Though
> > unfortunately, I have limited ways to test the media engines right
> > now.
>
> I've been working on some more userspace for the media engines.
> Hopefully I can get that in shape soon.

Great to hear. My Android use case unfortunately has some very
specific requirements, namely a c2 aidl hal. But maybe with more
examples of the uapi in action, I can try looking at one again.
Though my last attempt using the existing nvdec example had my head
spinning in about 3 seconds flat between that and the c2 api.

> >
> > > Let me do some testing locally and I'll send out a patch once ready.
> >
> > Sounds good, thanks.
>
> I posted a fix, please give it a try. Incidentally, on my side I don't
> have that much testing set up for the display :)

My initial test run on p2972 using swiftshader is looking good for
this specific issue at least. I'm part way through a vts run and haven't
hit any allocation failures, far past the point where I got them
previously. However, this may have peeled the onion back to another
problem: I'm getting stack traces from shared plane atomics, and a lot
of mmu faults during the graphics tests. I'll see if I can narrow down
a simple reproduction and track down the cause. I'll also check the bo
caching patch on a few other devices, then post a tested-by there if
they work.

Aaron
