On Tue, 2026-03-17 at 16:39 +0100, Maarten Lankhorst wrote:
> 
> 
> Den 2026-03-17 kl. 16:26, skrev Thomas Hellström:
> > On Fri, 2026-03-13 at 16:17 +0100, Maarten Lankhorst wrote:
> > > When trying to do a rather aggressive test of igt's
> > > "xe_module_load
> > > --r reload" with a full desktop environment and game running I
> > > noticed
> > > a few OOPSes when dereferencing freed pointers, related to
> > > framebuffers and property blobs after the compositor exits.
> > > 
> > > Solve this by guarding the freeing in drm_file with
> > > drm_dev_enter/exit,
> > > and immediately put the references from struct drm_file objects
> > > during
> > > drm_dev_unplug().
> > > 
> > > Related warnings for framebuffers on the subtest:
> > > [  739.713076] ------------[ cut here ]------------
> > >                WARN_ON(!list_empty(&dev->mode_config.fb_list))
> > > [  739.713079] WARNING: drivers/gpu/drm/drm_mode_config.c:584 at
> > > drm_mode_config_cleanup+0x30b/0x320 [drm], CPU#12:
> > > xe_module_load/13145
> > > ....
> > > [  739.713328] Call Trace:
> > > [  739.713330]  <TASK>
> > > [  739.713335]  ? intel_pmdemand_destroy_state+0x11/0x20 [xe]
> > > [  739.713574]  ? intel_atomic_global_obj_cleanup+0xe4/0x1a0 [xe]
> > > [  739.713794]  intel_display_driver_remove_noirq+0x51/0xb0 [xe]
> > > [  739.714041]  xe_display_fini_early+0x33/0x50 [xe]
> > > [  739.714284]  devm_action_release+0xf/0x20
> > > [  739.714294]  devres_release_all+0xad/0xf0
> > > [  739.714301]  device_unbind_cleanup+0x12/0xa0
> > > [  739.714305]  device_release_driver_internal+0x1b7/0x210
> > > [  739.714311]  device_driver_detach+0x14/0x20
> > > [  739.714315]  unbind_store+0xa6/0xb0
> > > [  739.714319]  drv_attr_store+0x21/0x30
> > > [  739.714322]  sysfs_kf_write+0x48/0x60
> > > [  739.714328]  kernfs_fop_write_iter+0x16b/0x240
> > > [  739.714333]  vfs_write+0x266/0x520
> > > [  739.714341]  ksys_write+0x72/0xe0
> > > [  739.714345]  __x64_sys_write+0x19/0x20
> > > [  739.714347]  x64_sys_call+0xa15/0xa30
> > > [  739.714355]  do_syscall_64+0xd8/0xab0
> > > [  739.714361]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > 
> > > and
> > > 
> > > [  739.714459] ------------[ cut here ]------------
> > > [  739.714461] xe 0000:67:00.0: [drm]
> > > drm_WARN_ON(!list_empty(&fb-
> > > > filp_head))
> > > [  739.714464] WARNING: drivers/gpu/drm/drm_framebuffer.c:833 at
> > > drm_framebuffer_free+0x6c/0x90 [drm], CPU#12:
> > > xe_module_load/13145
> > > [  739.714715] RIP: 0010:drm_framebuffer_free+0x7a/0x90 [drm]
> > > ...
> > > [  739.714869] Call Trace:
> > > [  739.714871]  <TASK>
> > > [  739.714876]  drm_mode_config_cleanup+0x26a/0x320 [drm]
> > > [  739.714998]  ? __drm_printfn_seq_file+0x20/0x20 [drm]
> > > [  739.715115]  ? drm_mode_config_cleanup+0x207/0x320 [drm]
> > > [  739.715235]  intel_display_driver_remove_noirq+0x51/0xb0 [xe]
> > > [  739.715576]  xe_display_fini_early+0x33/0x50 [xe]
> > > [  739.715821]  devm_action_release+0xf/0x20
> > > [  739.715828]  devres_release_all+0xad/0xf0
> > > [  739.715843]  device_unbind_cleanup+0x12/0xa0
> > > [  739.715850]  device_release_driver_internal+0x1b7/0x210
> > > [  739.715856]  device_driver_detach+0x14/0x20
> > > [  739.715860]  unbind_store+0xa6/0xb0
> > > [  739.715865]  drv_attr_store+0x21/0x30
> > > [  739.715868]  sysfs_kf_write+0x48/0x60
> > > [  739.715873]  kernfs_fop_write_iter+0x16b/0x240
> > > [  739.715878]  vfs_write+0x266/0x520
> > > [  739.715886]  ksys_write+0x72/0xe0
> > > [  739.715890]  __x64_sys_write+0x19/0x20
> > > [  739.715893]  x64_sys_call+0xa15/0xa30
> > > [  739.715900]  do_syscall_64+0xd8/0xab0
> > > [  739.715905]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > 
> > > and then finally file close blows up:
> > > 
> > > [  743.186530] Oops: general protection fault, probably for non-
> > > canonical address 0xdead000000000122: 0000 [#1] SMP
> > > [  743.186535] CPU: 3 UID: 1000 PID: 3453 Comm: kwin_wayland
> > > Tainted:
> > > G        W           7.0.0-rc1-valkyria+ #110 PREEMPT_{RT,(lazy)}
> > > [  743.186537] Tainted: [W]=WARN
> > > [  743.186538] Hardware name: Gigabyte Technology Co., Ltd. X299
> > > AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
> > > [  743.186539] RIP: 0010:drm_framebuffer_cleanup+0x55/0xc0 [drm]
> > > [  743.186588] Code: d8 72 73 0f b6 42 05 ff c3 39 c3 72 e8 49 8d
> > > bd
> > > 50 07 00 00 31 f6 e8 3a 80 d3 e1 49 8b 44 24 10 49 8d 7c 24 08 49
> > > 8b
> > > 54 24 08 <48> 3b 38 0f 85 95 7f 02 00 48 3b 7a 08 0f 85 8b 7f 02
> > > 00
> > > 48 89 42
> > > [  743.186589] RSP: 0018:ffffc900085e3cf8 EFLAGS: 00010202
> > > [  743.186591] RAX: dead000000000122 RBX: 0000000000000001 RCX:
> > > ffffffff8217ed03
> > > [  743.186592] RDX: dead000000000100 RSI: 0000000000000000 RDI:
> > > ffff88814675ba08
> > > [  743.186593] RBP: ffffc900085e3d10 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [  743.186593] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffff88814675ba00
> > > [  743.186594] R13: ffff88810d778000 R14: ffff888119f6dca0 R15:
> > > ffff88810c660bb0
> > > [  743.186595] FS:  00007ff377d21280(0000)
> > > GS:ffff888cec3f8000(0000)
> > > knlGS:0000000000000000
> > > [  743.186596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  743.186596] CR2: 000055690b55e000 CR3: 0000000113586003 CR4:
> > > 00000000003706f0
> > > [  743.186597] Call Trace:
> > > [  743.186598]  <TASK>
> > > [  743.186603]  intel_user_framebuffer_destroy+0x12/0x90 [xe]
> > > [  743.186722]  drm_framebuffer_free+0x3a/0x90 [drm]
> > > [  743.186750]  ? trace_hardirqs_on+0x5f/0x120
> > > [  743.186754]  drm_mode_object_put+0x51/0x70 [drm]
> > > [  743.186786]  drm_fb_release+0x105/0x190 [drm]
> > > [  743.186812]  ? rt_mutex_slowunlock+0x3aa/0x410
> > > [  743.186817]  ? rt_spin_lock+0xea/0x1b0
> > > [  743.186819]  drm_file_free+0x1e0/0x2c0 [drm]
> > > [  743.186843]  drm_release_noglobal+0x91/0xf0 [drm]
> > > [  743.186865]  __fput+0x100/0x2e0
> > > [  743.186869]  fput_close_sync+0x40/0xa0
> > > [  743.186870]  __x64_sys_close+0x3e/0x80
> > > [  743.186873]  x64_sys_call+0xa07/0xa30
> > > [  743.186879]  do_syscall_64+0xd8/0xab0
> > > [  743.186881]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > [  743.186882] RIP: 0033:0x7ff37e567732
> > > [  743.186884] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
> > > d7
> > > 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
> > > 24
> > > 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
> > > 55
> > > bf 01 00
> > > [  743.186885] RSP: 002b:00007ffc818169a8 EFLAGS: 00000246
> > > ORIG_RAX:
> > > 0000000000000003
> > > [  743.186886] RAX: ffffffffffffffda RBX: 00007ffc81816a30 RCX:
> > > 00007ff37e567732
> > > [  743.186887] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 0000000000000012
> > > [  743.186888] RBP: 00007ffc818169d0 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [  743.186889] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > 000055d60a7996e0
> > > [  743.186889] R13: 00007ffc81816a90 R14: 00007ffc81816a90 R15:
> > > 000055d60a782a30
> > > [  743.186892]  </TASK>
> > > [  743.186893] Modules linked in: rfcomm snd_hrtimer xt_CHECKSUM
> > > xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp
> > > xt_addrtype nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
> > > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
> > > mtd_intel_dg snd_hda_codec_intelhdmi mtd snd_hda_codec_hdmi
> > > nls_utf8
> > > mxm_wmi intel_wmi_thunderbolt gigabyte_wmi wmi_bmof xe drm_gpuvm
> > > drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
> > > drm_suballoc_helper gpu_sched drm_client_lib drm_exec
> > > drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
> > > x86_pkg_temp_thermal intel_powerclamp coretemp
> > > snd_hda_codec_alc882
> > > snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
> > > snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
> > > snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
> > > ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801
> > > i2c_mux
> > > snd i2c_smbus btusb btrtl btbcm btmtk btintel bluetooth
> > > ecdh_generic
> > > rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse
> > > nfnetlink
> > > [  743.186938] ---[ end trace 0000000000000000 ]---
> > > 
> > > And for property blobs:
> > > 
> > > void drm_mode_config_cleanup(struct drm_device *dev)
> > > {
> > > ...
> > >   list_for_each_entry_safe(blob, bt, &dev-
> > > > mode_config.property_blob_list,
> > >                            head_global) {
> > >           drm_property_blob_put(blob);
> > >   }
> > > 
> > > Resulting in:
> > > 
> > > [  371.072940] BUG: unable to handle page fault for address:
> > > 000001ffffffffff
> > > [  371.072944] #PF: supervisor read access in kernel mode
> > > [  371.072945] #PF: error_code(0x0000) - not-present page
> > > [  371.072947] PGD 0 P4D 0
> > > [  371.072950] Oops: Oops: 0000 [#1] SMP
> > > [  371.072953] CPU: 0 UID: 1000 PID: 3693 Comm: kwin_wayland Not
> > > tainted 7.0.0-rc1-valkyria+ #111 PREEMPT_{RT,(lazy)}
> > > [  371.072956] Hardware name: Gigabyte Technology Co., Ltd. X299
> > > AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021
> > > [  371.072957] RIP:
> > > 0010:drm_property_destroy_user_blobs+0x3b/0x90
> > > [drm]
> > > [  371.073019] Code: 00 00 48 83 ec 10 48 8b 86 30 01 00 00 48 39
> > > c3
> > > 74 59 48 89 c2 48 8d 48 c8 48 8b 00 4c 8d 60 c8 eb 04 4c 8d 60 c8
> > > 48
> > > 8b 71 40 <48> 39 16 0f 85 39 32 01 00 48 3b 50 08 0f 85 2f 32 01
> > > 00
> > > 48 89 70
> > > [  371.073021] RSP: 0018:ffffc90006a73de8 EFLAGS: 00010293
> > > [  371.073022] RAX: 000001ffffffffff RBX: ffff888118a1a930 RCX:
> > > ffff8881b92355c0
> > > [  371.073024] RDX: ffff8881b92355f8 RSI: 000001ffffffffff RDI:
> > > ffff888118be4000
> > > [  371.073025] RBP: ffffc90006a73e08 R08: ffff8881009b7300 R09:
> > > ffff888cecc5b000
> > > [  371.073026] R10: ffffc90006a73e90 R11: 0000000000000002 R12:
> > > 000001ffffffffc7
> > > [  371.073027] R13: ffff888118a1a980 R14: ffff88810b366d20 R15:
> > > ffff888118a1a970
> > > [  371.073028] FS:  00007f1faccbb280(0000)
> > > GS:ffff888cec2db000(0000)
> > > knlGS:0000000000000000
> > > [  371.073029] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  371.073030] CR2: 000001ffffffffff CR3: 000000010655c001 CR4:
> > > 00000000003706f0
> > > [  371.073031] Call Trace:
> > > [  371.073033]  <TASK>
> > > [  371.073036]  drm_file_free+0x1df/0x2a0 [drm]
> > > [  371.073077]  drm_release_noglobal+0x7a/0xe0 [drm]
> > > [  371.073113]  __fput+0xe2/0x2b0
> > > [  371.073118]  fput_close_sync+0x40/0xa0
> > > [  371.073119]  __x64_sys_close+0x3e/0x80
> > > [  371.073122]  x64_sys_call+0xa07/0xa30
> > > [  371.073126]  do_syscall_64+0xc0/0x840
> > > [  371.073130]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > [  371.073132] RIP: 0033:0x7f1fb3501732
> > > [  371.073133] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89
> > > d7
> > > 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c
> > > 24
> > > 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa
> > > 55
> > > bf 01 00
> > > [  371.073135] RSP: 002b:00007ffe8e6f0278 EFLAGS: 00000246
> > > ORIG_RAX:
> > > 0000000000000003
> > > [  371.073136] RAX: ffffffffffffffda RBX: 00007ffe8e6f0300 RCX:
> > > 00007f1fb3501732
> > > [  371.073137] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > > 0000000000000012
> > > [  371.073138] RBP: 00007ffe8e6f02a0 R08: 0000000000000000 R09:
> > > 0000000000000000
> > > [  371.073139] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > 00005585ba46eea0
> > > [  371.073140] R13: 00007ffe8e6f0360 R14: 00007ffe8e6f0360 R15:
> > > 00005585ba458a30
> > > [  371.073143]  </TASK>
> > > [  371.073144] Modules linked in: rfcomm snd_hrtimer xt_addrtype
> > > xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4
> > > xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat nf_conntrack
> > > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep
> > > snd_hda_codec_intelhdmi snd_hda_codec_hdmi mtd_intel_dg mtd
> > > nls_utf8
> > > wmi_bmof mxm_wmi gigabyte_wmi intel_wmi_thunderbolt xe drm_gpuvm
> > > drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video
> > > drm_suballoc_helper gpu_sched drm_client_lib drm_exec
> > > drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit
> > > x86_pkg_temp_thermal intel_powerclamp coretemp
> > > snd_hda_codec_alc882
> > > snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel
> > > snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec
> > > snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress
> > > ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 btusb
> > > i2c_mux i2c_smbus btrtl snd btbcm btmtk btintel bluetooth
> > > ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm
> > > i2c_dev
> > > fuse nfnetlink
> > > [  371.073198] CR2: 000001ffffffffff
> > > [  371.073199] ---[ end trace 0000000000000000 ]---
> > > 
> > > Add a guard around file close, and ensure the warnings from
> > > drm_mode_config
> > > do not trigger. Fix those by allowing an open reference to the
> > > file
> > > descriptor
> > > and cleaning up the file linked list entry in
> > > drm_mode_config_cleanup().
> > > 
> > > Cc: Thomas Hellström <[email protected]>
> > > Signed-off-by: Maarten Lankhorst <[email protected]>
> > > ---
> > >  drivers/gpu/drm/drm_file.c        | 5 ++++-
> > >  drivers/gpu/drm/drm_mode_config.c | 9 ++++++---
> > >  2 files changed, 10 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_file.c
> > > b/drivers/gpu/drm/drm_file.c
> > > index ec820686b3021..f52141f842a1f 100644
> > > --- a/drivers/gpu/drm/drm_file.c
> > > +++ b/drivers/gpu/drm/drm_file.c
> > > @@ -233,6 +233,7 @@ static void drm_events_release(struct
> > > drm_file
> > > *file_priv)
> > >  void drm_file_free(struct drm_file *file)
> > >  {
> > >   struct drm_device *dev;
> > > + int idx;
> > >  
> > >   if (!file)
> > >           return;
> > > @@ -249,9 +250,11 @@ void drm_file_free(struct drm_file *file)
> > >  
> > >   drm_events_release(file);
> > >  
> > > - if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > > + if (drm_core_check_feature(dev, DRIVER_MODESET) &&
> > > +     drm_dev_enter(dev, &idx)) {
> > >           drm_fb_release(file);
> > >           drm_property_destroy_user_blobs(dev, file);
> > > +         drm_dev_exit(idx);
> > >   }
> > >  
> > >   if (drm_core_check_feature(dev, DRIVER_SYNCOBJ))
> > > diff --git a/drivers/gpu/drm/drm_mode_config.c
> > > b/drivers/gpu/drm/drm_mode_config.c
> > > index 84ae8a23a3678..e349418978f79 100644
> > > --- a/drivers/gpu/drm/drm_mode_config.c
> > > +++ b/drivers/gpu/drm/drm_mode_config.c
> > > @@ -583,10 +583,13 @@ void drm_mode_config_cleanup(struct
> > > drm_device
> > > *dev)
> > >    */
> > >   WARN_ON(!list_empty(&dev->mode_config.fb_list));
> > >   list_for_each_entry_safe(fb, fbt, &dev-
> > > >mode_config.fb_list,
> > > head) {
> > > -         struct drm_printer p = drm_dbg_printer(dev,
> > > DRM_UT_KMS, "[leaked fb]");
> > > +         if (list_empty(&fb->filp_head) ||
> > > drm_framebuffer_read_refcount(fb) > 1) {
> > 
> > This looks a bit scary. Can someone manipulate the fb_list and even
> > free fbs while we are iterating? Or is all other manipulation
> > blocked
> > by the device being unplugged?
> The code already frees the framebuffer here as there is nothing
> running
> that can still reference it.
> 
> The framebuffers are no longer used as everything display is already
> torn down,
> and the device unplugged. That's what the drm_dev_enter/exit in
> drm_file.c are
> there to protect.

OK, great.

Reviewed-by: Thomas Hellström <[email protected]>


> 
> Kind regards,
> ~Maarten Lankhorst

Reply via email to