On Tue, May 20, 2025 at 01:32:41PM -0300, André Almeida wrote:
> When a device get wedged, it might be caused by a guilty application.
> For userspace, knowing which task was the cause can be useful for some
> situations, like for implementing a policy, logs or for giving a chance
> for the compositor to let the user know what task caused the problem.
> This is an optional argument, when the task info is not available, the
> PID and TASK string won't appear in the event string.
>
> Sometimes just the PID isn't enough giving that the task might be already
> dead by the time userspace will try to check what was this PID's name,
> so to make the life easier also notify what's the task's name in the user
> event.
...

> -int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
> +int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
> +			 struct drm_wedge_task_info *info)
>  {
>  	const char *recovery = NULL;
>  	unsigned int len, opt;
> -	/* Event string length up to 28+ characters with available methods */
> -	char event_string[32];
> -	char *envp[] = { event_string, NULL };
> +	char event_string[WEDGE_STR_LEN], pid_string[PID_LEN] = "", comm_string[TASK_COMM_LEN] = "";
> +	char *envp[] = { event_string, NULL, NULL, NULL };
>
>  	len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
>
> @@ -582,6 +586,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
>  	drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
>  		 "but recovered through reset" : "needs recovery");
>
> +	if (info && ((info->comm && info->comm[0] != '\0'))) {

Thanks for adding this. Should we check if pid > 0?

Also, I was wondering what if the driver only has info on one of the
given members? Should we allow it to be flagged independently?

> +		snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
> +		snprintf(comm_string, sizeof(comm_string), "TASK=%s", info->comm);
> +		envp[1] = pid_string;
> +		envp[2] = comm_string;
> +	}
> +
>  	return kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>  }
>  EXPORT_SYMBOL(drm_dev_wedged_event);

...
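
To illustrate the question about flagging the two members independently,
here is a rough userspace sketch, not the patch's code and not tested
against the kernel tree. The helper name wedge_fill_task_env(), the local
struct, and the SKETCH_* buffer sizes are all made up for the example; it
only assumes the envp[] layout from the hunk above (slot 0 holds the
WEDGED= string, followed by up to two optional entries and a NULL
terminator).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical sizes for this sketch only. The task buffer is sized so
 * that the "TASK=" prefix does not eat into a 15-character name.
 */
#define SKETCH_PID_LEN  16	/* "PID=" + up to 10 digits + NUL */
#define SKETCH_TASK_LEN 21	/* "TASK=" + 15-char name + NUL */

/* Stand-in for struct drm_wedge_task_info from the patch. */
struct wedge_task_info {
	pid_t pid;
	const char *comm;
};

/*
 * Append whichever of PID= / TASK= is available to envp[], starting at
 * index 1. Returns the number of entries added (0, 1 or 2).
 */
static int wedge_fill_task_env(const struct wedge_task_info *info,
			       char *pid_string, size_t pid_len,
			       char *comm_string, size_t comm_len,
			       char *envp[4])
{
	int idx = 1;

	assert(envp && pid_string && comm_string);

	if (!info)
		return 0;

	/* Only report a PID that can refer to a real task. */
	if (info->pid > 0) {
		snprintf(pid_string, pid_len, "PID=%d", (int)info->pid);
		envp[idx++] = pid_string;
	}

	/* Report the name on its own, whether or not a PID was given. */
	if (info->comm && info->comm[0] != '\0') {
		snprintf(comm_string, comm_len, "TASK=%s", info->comm);
		envp[idx++] = comm_string;
	}

	return idx - 1;
}
```

With this shape a driver that only knows the name would still get TASK= in
the event, and one that only knows the PID would still get PID=. Whether
userspace should be expected to cope with either key missing is a separate
question.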
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index e2f894f1b90a..c13fe85210f2 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -30,6 +30,14 @@ struct pci_controller;
>  #define DRM_WEDGE_RECOVERY_REBIND	BIT(1)	/* unbind + bind driver */
>  #define DRM_WEDGE_RECOVERY_BUS_RESET	BIT(2)	/* unbind + reset bus device + bind */
>
> +/**
> + * struct drm_wedge_task_info - information about the guilty app of a wedge dev

s/app/task, missed an instance ;)

> + */
> +struct drm_wedge_task_info {
> +	pid_t pid;
> +	char *comm;
> +};

Raag