Hi Boris,

On 1/21/26 14:52, Boris Brezillon wrote:
On Wed, 21 Jan 2026 11:49:34 +0000
Akash Goel <[email protected]> wrote:

Hi Boris,

On 1/9/26 13:08, Boris Brezillon wrote:
From: Akash Goel <[email protected]>

This implementation is loosely based on the MSM shrinker, and it's
relying on the drm_gpuvm eviction/validation infrastructure.

Right now we only support swapout/eviction, but we could add an extra
flag to specify when buffer content doesn't need to be preserved to
avoid the swapout/swapin dance.

Locking is a bit of a nightmare, but using _trylock() all the way in
the reclaim path seems to make lockdep happy. And yes, we might be
missing opportunities to reclaim when the system is under heavy GPU
load/heavy memory pressure/heavy GPU VM activity, but that's better
than no reclaim at all.

Signed-off-by: Akash Goel <[email protected]>
Co-developed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
---
   drivers/gpu/drm/panthor/panthor_device.c |  11 +-
   drivers/gpu/drm/panthor/panthor_device.h |  73 ++++
   drivers/gpu/drm/panthor/panthor_gem.c    | 427 ++++++++++++++++++++++-
   drivers/gpu/drm/panthor/panthor_gem.h    |  67 ++++
   drivers/gpu/drm/panthor/panthor_mmu.c    | 338 +++++++++++++++++-
   drivers/gpu/drm/panthor/panthor_mmu.h    |   8 +
   6 files changed, 901 insertions(+), 23 deletions(-)



+void panthor_vm_update_bo_reclaim_lru_locked(struct panthor_gem_object *bo)
+{
+       struct panthor_device *ptdev = container_of(bo->base.dev, struct panthor_device, base);
+       struct panthor_vm *vm = NULL;
+       struct drm_gpuvm_bo *vm_bo;
+
+       dma_resv_assert_held(bo->base.resv);
+       lockdep_assert_held(&bo->base.gpuva.lock);
+
+       drm_gem_for_each_gpuvm_bo(vm_bo, &bo->base) {
+               /* We're only supposed to have one vm_bo in the list if we get there. */
+               drm_WARN_ON(&ptdev->base, vm);
+               vm = container_of(vm_bo->vm, struct panthor_vm, base);
+
+               mutex_lock(&ptdev->reclaim.lock);
+               drm_gem_lru_move_tail_locked(&vm->reclaim.lru, &bo->base);
+               if (list_empty(&vm->reclaim.lru_node))
+                       list_move(&vm->reclaim.lru_node, &ptdev->reclaim.vms);
+               mutex_unlock(&ptdev->reclaim.lock);
+       }
+}
+
+int panthor_vm_evict_bo_mappings_locked(struct panthor_gem_object *bo)
+{
+       struct drm_gpuvm_bo *vm_bo;
+
+       drm_gem_for_each_gpuvm_bo(vm_bo, &bo->base) {
+               struct panthor_vm *vm = container_of(vm_bo->vm, struct panthor_vm, base);
+               struct drm_gpuva *va;
+
+               if (!mutex_trylock(&vm->op_lock))
+                       return -EDEADLK;
+
+               drm_gpuvm_bo_evict(vm_bo, true);
+               drm_gpuvm_bo_for_each_va(va, vm_bo) {
+                       struct panthor_vma *vma = container_of(va, struct panthor_vma, base);
+
+                       panthor_vm_lock_region(vm, va->va.addr, va->va.range);
+                       panthor_vm_unmap_pages(vm, va->va.addr, va->va.range);

On further testing, I ran into a kernel warning issue.

https://elixir.bootlin.com/linux/v6.18-rc5/source/drivers/iommu/io-pgtable-arm.c#L641

https://elixir.bootlin.com/linux/v6.18-rc5/source/drivers/gpu/drm/panthor/panthor_mmu.c#L930

Call trace:
   __arm_lpae_unmap+0x570/0x5c8 (P)
   __arm_lpae_unmap+0x144/0x5c8
   __arm_lpae_unmap+0x144/0x5c8
   arm_lpae_unmap_pages+0xe8/0x120
   panthor_vm_unmap_pages+0x1f0/0x3f8 [panthor]
   panthor_vm_evict_bo_mappings_locked+0xf4/0x1d8 [panthor]
   panthor_gem_try_evict+0x190/0x7c8 [panthor]
   drm_gem_lru_scan+0x388/0x698

The following sequence leads to the kernel warnings:

- BO is mapped to GPU and is in GPU_MAPPED_PRIVATE state.

- Shrinker callback gets invoked and that BO is evicted. As a result BO
is moved to UNRECLAIMABLE state.

- Userspace accesses the evicted BO and panthor_gem_fault() gets the
backing pages again. The BO is moved back to the GPU_MAPPED_PRIVATE
state (even though technically the BO is still evicted, as both
vm_bo->evicted and vma->evicted are true).

- Shrinker callback is invoked again and tries to evict the same BO.

- panthor_vm_evict_bo_mappings_locked() calls panthor_vm_unmap_pages()
and the kernel warnings are emitted as the PTEs are already invalid.

Yep, it looks like the other side of the problem pointed out by Steve:
CPU mappings can make the buffer reclaimable again, but those are still
evicted from the VM PoV.



Made the following change locally to avoid the warning.

diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index ffd821b3be46..e0a1dda1675f 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2344,6 +2344,8 @@ int panthor_vm_evict_bo_mappings_locked(struct panthor_gem_object *bo)
                  drm_gpuvm_bo_for_each_va(va, vm_bo) {
                          struct panthor_vma *vma = container_of(va, struct panthor_vma, base);

+                       if (vma->evicted)
+                               continue;
                          panthor_vm_lock_region(vm, va->va.addr, va->va.range);
                          panthor_vm_unmap_pages(vm, va->va.addr, va->va.range);
                          panthor_vm_unlock_region(vm);



Do you think we can also update is_gpu_mapped() so that an evicted BO
moves to the MMAPPED state, instead of the GPU_MAPPED_PRIVATE state, on
CPU access?

Not sure if the following change makes sense.

diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 6e91c5022d0d..8a8411fed195 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -125,6 +125,8 @@ static bool is_gpu_mapped(struct panthor_gem_object *bo,
          struct drm_gpuvm_bo *vm_bo;

          drm_gem_for_each_gpuvm_bo(vm_bo, &bo->base) {
+               if (vm_bo->evicted)
+                       continue;
                  if (!vm) {
                          *state = PANTHOR_GEM_GPU_MAPPED_PRIVATE;
                          vm = vm_bo->vm;


Please let me know.

Yep, this looks correct. I'll add that to my list of fixups.


Thanks.

Sorry, I have a doubt.

Should we update the panthor_gem_sync() function to ensure the pages can't get reclaimed whilst they are being synced?

panthor_ioctl_bo_sync() takes a reference on the BO before calling panthor_gem_sync(), but I think, due to this patch, the backing pages can get released in the middle of the sync loop.

        sgt = panthor_gem_get_dev_sgt(bo);
        if (IS_ERR(sgt))
                return PTR_ERR(sgt);

        for_each_sgtable_dma_sg(sgt, sgl, count) {
                dma_addr_t paddr = sg_dma_address(sgl);
                size_t len = sg_dma_len(sgl);

                /* Arguments elided here for brevity; in the driver these are
                 * the usual dma_sync_single_for_device()/_for_cpu() calls on
                 * (paddr, len), with the direction chosen from the sync type. */
                dma_sync_single_for_device();
                dma_sync_single_for_cpu();
        }


Could you please confirm?

Best regards
Akash


Thanks,

Boris
