Hi Aman,

On Mon, Mar 23, 2026 at 07:50:42PM +0530, Aman Dhoot wrote:
> Hi, Salvatore
> 
> On Sun, Mar 22, 2026 at 1:43 PM Salvatore Bonaccorso <[email protected]> wrote:
> 
> > Hi Aman,
> >
> > On Fri, Mar 20, 2026 at 10:58:40PM +0530, Aman Dhoot wrote:
> > > As you suggested, I bisected the kernel; this is the log:
> > >
> > > ****************************************************************
> > >
> > > $ git bisect log
> > > git bisect start
> > > # status: waiting for both good and bad commits
> > > # good: [567bd8cbc2fe6b28b78864cbbbc41b0d405eb83c] Linux 6.12.63
> > > git bisect good 567bd8cbc2fe6b28b78864cbbbc41b0d405eb83c
> > > # status: waiting for bad commit, 1 good commit known
> > > # bad: [ff2177382799753070b71747f646963147eabc7c] Linux 6.12.69
> > > git bisect bad ff2177382799753070b71747f646963147eabc7c
> > > # good: [ebdbe19336f26ffe799db842d751745098dc11ff] ASoC: renesas: rz-ssi: Fix rz_ssi_priv::hw_params_cache::sample_width
> > > git bisect good ebdbe19336f26ffe799db842d751745098dc11ff
> > > # bad: [e79b03d386341e85a4f775e0a864e8aa7633a0a2] HID: intel-ish-hid: Use dedicated unbound workqueues to prevent resume blocking
> > > git bisect bad e79b03d386341e85a4f775e0a864e8aa7633a0a2
> > > # good: [feb28b6827ece47cce585599a00b02ee579532bc] powercap: fix sscanf() error return value handling
> > > git bisect good feb28b6827ece47cce585599a00b02ee579532bc
> > > # good: [68495f89a19b6835e388b89b2ffecc0c68f9666c] selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
> > > git bisect good 68495f89a19b6835e388b89b2ffecc0c68f9666c
> > > # good: [4433ddc3700cea880c383a6ddfc0e2ab697f9bdf] EDAC/x38: Fix a resource leak in x38_probe1()
> > > git bisect good 4433ddc3700cea880c383a6ddfc0e2ab697f9bdf
> > > # bad: [94b010200a3c9a8420a9063344cedbcd71794c8f] LoongArch: dts: loongson-2k0500: Add default interrupt controller address cells
> > > git bisect bad 94b010200a3c9a8420a9063344cedbcd71794c8f
> > > # good: [654fa76032eee5df9ce8849bdff840595952c63d] mm/page_alloc: make percpu_pagelist_high_fraction reads lock-free
> > > git bisect good 654fa76032eee5df9ce8849bdff840595952c63d
> > > # bad: [8140ac7c55e75093a01c6110a2c4025fe7177c57] drm/amd: Clean up kfd node on surprise disconnect
> > > git bisect bad 8140ac7c55e75093a01c6110a2c4025fe7177c57
> > > # good: [df7a49b328928b6d6b174d954d63721d6f3848a2] LoongArch: Fix PMU counter allocation for mixed-type event groups
> > > git bisect good df7a49b328928b6d6b174d954d63721d6f3848a2
> > > # good: [ae5b1d291c814a2884c3d54a56e83bc99052b1eb] drm/amd/display: Bump the HDMI clock to 340MHz
> > > git bisect good ae5b1d291c814a2884c3d54a56e83bc99052b1eb
> > > # first bad commit: [8140ac7c55e75093a01c6110a2c4025fe7177c57] drm/amd: Clean up kfd node on surprise disconnect
> > >
> > >
> > > **********************************************************************************************
> > >
> > > When the bisect finished, it printed this output:
> > >
> > >
> > > 8140ac7c55e75093a01c6110a2c4025fe7177c57 is the first bad commit
> > > commit 8140ac7c55e75093a01c6110a2c4025fe7177c57
> > > Author: Mario Limonciello (AMD) <[email protected]>
> > > Date:   Wed Jan 7 15:37:28 2026 -0600
> > >
> > >     drm/amd: Clean up kfd node on surprise disconnect
> > >
> > >     commit 28695ca09d326461f8078332aa01db516983e8a2 upstream.
> > >
> > >     When an eGPU is unplugged the KFD topology should also be destroyed
> > >     for that GPU. This never happens because the fini_sw callbacks never
> > >     get to run. Run them manually before calling amdgpu_device_ip_fini_early()
> > >     when a device has already been disconnected.
> > >
> > >     This location is intentionally chosen to make sure that the kfd locking
> > >     refcount doesn't get incremented unintentionally.
> > >
> > >     Cc: [email protected]
> > >     Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
> > >     Signed-off-by: Mario Limonciello (AMD) <[email protected]>
> > >     Reviewed-by: Kent Russell <[email protected]>
> > >     Signed-off-by: Alex Deucher <[email protected]>
> > >     (cherry picked from commit 6a23e7b4332c10f8b56c33a9c5431b52ecff9aab)
> > >     Cc: [email protected]
> > >     Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > >
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > >
> > > As far as I can tell, this commit is present in kernel version 6.12.66,
> > > and the problem also exists in v6.12.66.
> >
> > Thanks for doing that. It looks like this is a regression fixed by
> > f7afda7fcd16 ("drm/amd: Fix hang on amdgpu unload by using
> > pci_dev_is_disconnected()"), which was also backported to 6.12.77.
> >
> > If possible, it would be great if you could test whether this indeed
> > fixes the problem.  Cf.
> > https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4
> >
> > Regards,
> > Salvatore
> >
> 
> Yesterday, I tested the patch f7afda7fcd16 ("drm/amd: Fix hang on amdgpu
> unload by using pci_dev_is_disconnected()") on Debian linux-source-6.12. I
> can confirm that it fixed the issue: the hook script now runs completely and
> the VM starts, so it is working as expected.

Thanks for confirming that.

> Could you tell me when this patch will be merged into the stable Trixie
> kernel (v6.12) or into the Trixie backports kernel?

It is included in 6.12.77 and so will be picked up by the next trixie
upload for 6.12.y.
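
As an illustration only, here is a minimal Python sketch for checking whether a 6.12.y kernel is at or past 6.12.77. It assumes the kernel release string (for example from `uname -r` or Python's `platform.release()`) begins with the upstream stable version, as in the hypothetical `6.12.77+deb13-amd64`; the helper names `upstream_version` and `has_fix` are made up for this example, and only the 6.12.y branch is covered.

```python
import re

# Assumption taken from this thread: the fix landed in upstream 6.12.77.
FIX_BRANCH = (6, 12)
FIX_PATCHLEVEL = 77


def upstream_version(release: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) parsed from the start of a release string.

    Assumes the string begins with 'X.Y.Z', e.g. the hypothetical
    '6.12.77+deb13-amd64'. Raises ValueError if it does not.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def has_fix(release: str) -> bool:
    """Return True if a 6.12.y release is 6.12.77 or later.

    Only the 6.12.y branch is covered; other branches raise ValueError.
    """
    major, minor, patch = upstream_version(release)
    if (major, minor) != FIX_BRANCH:
        raise ValueError(f"only 6.12.y is covered, got {major}.{minor}.{patch}")
    return patch >= FIX_PATCHLEVEL
```

For example, `has_fix(platform.release())` checks the running system, subject to the release-string assumption above. A distribution kernel can also carry a backported fix under an older version number, so this is only a heuristic.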

Regards,
Salvatore
