The KFD/KCQ coordination rework also routed bad queues that do not require a
reset through the reset path, so the process received spurious reset signals.
Fix it by signaling the reset event only when MES detects at least one hung
queue.

Signed-off-by: Amber Lin <[email protected]>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 1d12901d4823..828a7ce6eeca 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -474,7 +474,11 @@ static int reset_queues_mes(struct device_queue_manager *dqm, struct queue *q)
 		goto fail;
 
 	dqm->detect_hang_count = num_hung;
-	kfd_signal_reset_event(dqm->dev);
+	/* When MES doesn't detect any queue hang, no reset happens. Don't signal reset
+	 * event.
+	 */
+	if (dqm->detect_hang_count)
+		kfd_signal_reset_event(dqm->dev);
 
 fail:
 	dqm->detect_hang_count = 0;
-- 
2.43.0
