Hi Steven,

Thanks for the fix, I've tested it and it fixes the outstanding issue.

However, including the perfcnt sample buffer in the DebugFS GEMs file raises 
the question of what
to do with its labelling, because it isn't exposed to UM through a handle, so 
my previous assumption
about not needing to handle static labels when tagging BO's within the driver 
no longer holds.

This might require some quick rewrite so that the sample BO can be displayed 
with a fitting name.

On 19.05.2025 17:02, Steven Price wrote:
> On 15/05/2025 19:04, Daniel Stone wrote:
> > Hi Steven,
> >
> > On Thu, 8 May 2025 at 11:42, Steven Price <steven.pr...@arm.com> wrote:
> >> I'm also seeing a splat when running this, see below. I haven't got my
> >> head around how this is happening, but I see it when glmark quits at the
> >> end of the test.
> >>
> >> [  399.505066] Unable to handle kernel NULL pointer dereference at virtual 
> >> address 00000004 when write
> >> [...]
> >> [  399.882216] Call trace:
> >> [  399.882222]  panfrost_gem_free_object [panfrost] from 
> >> drm_gem_handle_delete+0x84/0xb0
> >> [  399.893813]  drm_gem_handle_delete from drm_ioctl+0x2b8/0x4f4
> >> [  399.900237]  drm_ioctl from sys_ioctl+0x428/0xe30
> >> [  399.905496]  sys_ioctl from ret_fast_syscall+0x0/0x1c
> >
> > Soooo. Let's assume it has to actually occur in
> > panfrost_gem_debugfs_bo_rm(), since that's all that's changed here.
> >
> > I don't think pfdev can be NULL here, because we've already
> > dereferenced ptdev and written to a structure member earlier in
> > panfrost_gem_free_object(). I don't think it can be the debugfs mutex,
> > because a) that's initialised with the device, and b) wouldn't be
> > offset 0x4.
> >
> > I'm looking then at list_del_init(&bo->debugfs.node), which would
> > effectively execute bo->debugfs.node->next->prev =
> > bo->debugfs.node->prev. So if bo->debugfs.node->next was NULL, that
> > would explain a write to 0x4 on 32-bit systems.
>
> So I finally got some time to do some debugging on this. And you are
> absolutely correct on where the fault is triggered.
>
> The cause of it is that panfrost_gem_debugfs_bo_add() is called from
> panfrost_gem_create(), but that isn't the only place that Panfrost GEM
> objects are created - it turns out panfrost_perfcnt_enable_locked() also
> calls drm_gem_shmem_create(). And in that case the list next/prev
> pointers are left set to NULL, causing things to blow up when the GEM
> object is freed.
>
> The below patch gets things working, or alternatively just init the list
> in panfrost_gem_create_object() if we don't want to include the perfcnt
> buffer in the list.

> Steve
>
> ---8<--
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c
> b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index fe2cdbe8baf0..51da13cd81f0 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -297,13 +297,14 @@ struct drm_gem_object
> *panfrost_gem_create_object(struct drm_device *dev, size_t
>         obj->base.map_wc = !pfdev->coherent;
>         mutex_init(&obj->label.lock);
>
> +       panfrost_gem_debugfs_bo_add(pfdev, obj);
> +
>         return &obj->base.base;
>  }
>
>  struct panfrost_gem_object *
>  panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags)
>  {
> -       struct panfrost_device *pfdev = dev->dev_private;
>         struct drm_gem_shmem_object *shmem;
>         struct panfrost_gem_object *bo;
>
> @@ -319,8 +320,6 @@ panfrost_gem_create(struct drm_device *dev, size_t
> size, u32 flags)
>         bo->noexec = !!(flags & PANFROST_BO_NOEXEC);
>         bo->is_heap = !!(flags & PANFROST_BO_HEAP);
>
> -       panfrost_gem_debugfs_bo_add(pfdev, bo);
> -
>         return bo;
>  }

Reply via email to