On Thu, 8 Jan 2026 10:28:00 +0000 Steven Price <[email protected]> wrote:
> Hi Boris,
> 
> Happy new year!
> 
> On 08/01/2026 10:10, Boris Brezillon wrote:
> > drm_gem_put_pages(), which we rely on for returning BO pages to shmem,
> > assumes per-folio refcounting and not per-page. If we call
> > shmem_read_mapping_page() per-page, we break this assumption and leak
> > pages every time we get a huge page allocated.
> > 
> > Cc: Loïc Molinari <[email protected]>
> > Fixes: c12e9fcb5a5a ("drm/panfrost: Introduce huge tmpfs mountpoint option")
> > Signed-off-by: Boris Brezillon <[email protected]>
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_mmu.c | 37 +++++++++++++++++++++----
> >  1 file changed, 31 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > index 8f3b7a7b6ad0..9b61ad98a77e 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> > @@ -595,6 +595,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
> >  	pgoff_t page_offset;
> >  	struct sg_table *sgt;
> >  	struct page **pages;
> > +	u32 nr_pages;
> >  
> >  	bomapping = addr_to_mapping(pfdev, as, addr);
> >  	if (!bomapping)
> > @@ -613,6 +614,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
> >  	addr &= ~((u64)SZ_2M - 1);
> >  	page_offset = addr >> PAGE_SHIFT;
> >  	page_offset -= bomapping->mmnode.start;
> > +	nr_pages = bo->base.base.size >> PAGE_SHIFT;
> >  
> >  	obj = &bo->base.base;
> >  
> > @@ -626,8 +628,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
> >  		goto err_unlock;
> >  	}
> >  
> > -	pages = kvmalloc_array(bo->base.base.size >> PAGE_SHIFT,
> > -			       sizeof(struct page *), GFP_KERNEL | __GFP_ZERO);
> > +	pages = kvmalloc_array(nr_pages, sizeof(struct page *),
> > +			       GFP_KERNEL | __GFP_ZERO);
> >  	if (!pages) {
> >  		kvfree(bo->sgts);
> >  		bo->sgts = NULL;
> > @@ -650,6 +651,8 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
> >  	mapping_set_unevictable(mapping);
> >  
> >  	for (i = page_offset; i < page_offset + NUM_FAULT_PAGES; i++) {
> > +		struct folio *folio;
> > +
> >  		/* Can happen if the last fault only partially filled this
> >  		 * section of the pages array before failing. In that case
> >  		 * we skip already filled pages.
> > @@ -657,12 +660,34 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
> >  		if (pages[i])
> >  			continue;
> >  
> > -		pages[i] = shmem_read_mapping_page(mapping, i);
> > -		if (IS_ERR(pages[i])) {
> > -			ret = PTR_ERR(pages[i]);
> > -			pages[i] = NULL;
> > +		folio = shmem_read_folio(mapping, i);
> > +		if (IS_ERR(folio)) {
> > +			ret = PTR_ERR(folio);
> >  			goto err_unlock;
> >  		}
> > +
> > +		/* We always fill the page array at a folio granularity so
> > +		 * there's no reason for a missing page to not be the first
> > +		 * in the folio. We want to ensure that's the case to avoid
> > +		 * unbalanced folio_{get,put}() leading to leaks, because
> > +		 * drm_gem_put_pages() assumes per-folio refcounting, not
> > +		 * per-page.
> > +		 */
> 
> I'm a little uneasy about this reasoning. Above we do mask the address
> to be a multiple of 2MB, but the folio could (in theory at least) be
> bigger than 2MB. So I don't see what stops a GPU job faulting in the
> middle of a buffer and triggering this warning.
> 
> Can we not walk backwards to the beginning of the folio if the address
> isn't aligned and check that?

Yep, I assumed the heap buffer would be faulted from bottom to top
(which I think is a valid assumption for the tiler, but I might be
wrong). I can certainly revisit the logic to map on both sides if we're
in the middle of a folio and drop this WARN_ON().
> > +		if (WARN_ON(folio_file_page(folio, i) != folio_page(folio, 0))) {
> > +			folio_put(folio);
> > +			ret = -EINVAL;
> > +			goto err_unlock;
> > +		}
> > +
> > +		/* Fill a folio worth of pages, even if it goes beyond
> > +		 * NUM_FAULT_PAGES, otherwise we'll end up with unbalanced
> > +		 * refcounting next time we hit an unmapped section crossing
> > +		 * this folio.
> > +		 */
> > +		for (u32 j = 0; j < folio_nr_pages(folio) && i < nr_pages; j++)
> 
> This "i < nr_pages" check is wrong - presumably it should be
> "i + j < nr_pages".

Oops, yep. It comes from a version where I was doing

	pages[i++] = folio_page(folio, j);

> > +			pages[i + j] = folio_page(folio, j);
> > +
> > +		i += folio_nr_pages(folio) - 1;
> 
> I feel like the outer for() loop needs reworking into a different form.
> It makes complete sense to walk in terms of folios, but we've got this
> weird mix of pages and folios going on, leading to odd things like this
> "- 1" fix up here.

Yep, I agree. I was initially trying to keep the diff as small as
possible, but it's a bit messy as it is.
