On 3/16/2022 10:50 PM, Rob Clark wrote:
On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
<contactshashanksha...@gmail.com> wrote:

From: Shashank Sharma <shashank.sha...@amd.com>

This patch adds a new sysfs event, which notifies
userland of a GPU reset, and can also provide
some information like:
- process ID of the process involved with the GPU reset
- process name of the involved process
- the GPU status info (using flags)

This patch also introduces the first flag of the flags
bitmap, which can be extended as and when required.

V2: Addressed review comments from Christian and Amar
    - move the reset information structure to DRM layer
    - drop _ctx from struct name
    - make pid 32 bit (rather than 64)
    - set flag when VRAM is invalid (rather than valid)
    - add process name as well (Amar)

Cc: Alexandar Deucher <alexander.deuc...@amd.com>
Cc: Christian Koenig <christian.koe...@amd.com>
Cc: Amaranath Somalapuram <amaranath.somalapu...@amd.com>
Signed-off-by: Shashank Sharma <shashank.sha...@amd.com>
---
  drivers/gpu/drm/drm_sysfs.c | 31 +++++++++++++++++++++++++++++++
  include/drm/drm_sysfs.h     | 10 ++++++++++
  2 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index 430e00b16eec..840994810910 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -409,6 +409,37 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
  }
  EXPORT_SYMBOL(drm_sysfs_hotplug_event);

+/**
+ * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
+ * @dev: DRM device
+ * @reset_info: The contextual information about the reset (like PID, flags)
+ *
+ * Send a uevent for the DRM device specified by @dev. This informs
+ * user that a GPU reset has occurred, so that an interested client
+ * can take any recovery or profiling measure.
+ */
+void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event *reset_info)
+{
+       unsigned char pid_str[13];
+       unsigned char flags_str[15];
+       unsigned char pname_str[TASK_COMM_LEN + 6];
+       unsigned char reset_str[] = "RESET=1";
+       char *envp[] = { reset_str, pid_str, pname_str, flags_str, NULL };
+
+       if (!reset_info) {
+               DRM_WARN("No reset info, not sending the event\n");
+               return;
+       }
+
+       DRM_DEBUG("generating reset event\n");
+
+       snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%u", reset_info->pid);
+       snprintf(pname_str, ARRAY_SIZE(pname_str), "NAME=%s", reset_info->pname);
+       snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", reset_info->flags);
+       kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
+}
+EXPORT_SYMBOL(drm_sysfs_reset_event);
+
  /**
   * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
   * change
diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
index 6273cac44e47..5ba11c760619 100644
--- a/include/drm/drm_sysfs.h
+++ b/include/drm/drm_sysfs.h
@@ -1,16 +1,26 @@
  /* SPDX-License-Identifier: GPL-2.0 */
  #ifndef _DRM_SYSFS_H_
  #define _DRM_SYSFS_H_
+#include <linux/sched.h>
+
+#define DRM_GPU_RESET_FLAG_VRAM_INVALID (1 << 0)

  struct drm_device;
  struct device;
  struct drm_connector;
  struct drm_property;

+struct drm_reset_event {
+       uint32_t pid;

One side note, unrelated to devcoredump vs this..

AFAIU you probably want to be passing around a `struct pid *`, and
then somehow use pid_vnr() in the context of the process reading the
event to get the numeric pid.  Otherwise things will not do what you
expect if the process triggering the crash is in a different pid
namespace from the compositor.


I am not sure it is a good idea to add the PID extraction complexity here. It is left up to the driver to extract this information and pass it to the work queue; in the case of AMDGPU, it is extracted from the GPU VM. That also keeps things more flexible for the drivers.

- Shashank

BR,
-R

+       uint32_t flags;
+       char pname[TASK_COMM_LEN];
+};
+
  int drm_class_device_register(struct device *dev);
  void drm_class_device_unregister(struct device *dev);

  void drm_sysfs_hotplug_event(struct drm_device *dev);
+void drm_sysfs_reset_event(struct drm_device *dev, struct drm_reset_event *reset_info);
  void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
  void drm_sysfs_connector_status_event(struct drm_connector *connector,
                                       struct drm_property *property);
--
2.32.0
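
For anyone wanting to experiment from the userspace side, here is a rough sketch of how a listener might decode the variables this patch puts into the uevent (RESET=1, PID=, NAME=, FLAGS=). It assumes the payload arrives as a buffer of NUL-terminated KEY=VALUE strings, as kobject uevents are delivered over netlink. The struct reset_event_info, RESET_NAME_LEN and parse_reset_uevent() names are hypothetical and not part of the patch; only the flag definition is copied from it.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as the flag this patch adds to include/drm/drm_sysfs.h. */
#define DRM_GPU_RESET_FLAG_VRAM_INVALID (1 << 0)

/* Assumption: sized like TASK_COMM_LEN (16) in the kernel. */
#define RESET_NAME_LEN 16

/* Hypothetical userspace mirror of the fields carried by the uevent. */
struct reset_event_info {
	int is_reset;               /* set when "RESET=1" was seen */
	uint32_t pid;               /* value of "PID=" */
	uint32_t flags;             /* value of "FLAGS=" */
	char name[RESET_NAME_LEN];  /* value of "NAME=", truncated if needed */
};

/*
 * Walk a buffer of NUL-terminated "KEY=VALUE" strings and pick out the
 * reset-related keys. Unknown strings (for example the "change@..."
 * header or ACTION=) are skipped. Returns 1 if the buffer describes a
 * reset event, 0 otherwise.
 */
static int parse_reset_uevent(const char *buf, size_t len,
			      struct reset_event_info *out)
{
	size_t off = 0;

	memset(out, 0, sizeof(*out));

	while (off < len) {
		const char *kv = buf + off;
		const char *end = memchr(kv, '\0', len - off);

		if (!end)
			break;	/* unterminated tail: ignore it */

		if (strcmp(kv, "RESET=1") == 0)
			out->is_reset = 1;
		else if (strncmp(kv, "PID=", 4) == 0)
			out->pid = (uint32_t)strtoul(kv + 4, NULL, 10);
		else if (strncmp(kv, "NAME=", 5) == 0)
			snprintf(out->name, sizeof(out->name), "%s", kv + 5);
		else if (strncmp(kv, "FLAGS=", 6) == 0)
			out->flags = (uint32_t)strtoul(kv + 6, NULL, 10);

		off += (size_t)(end - kv) + 1;
	}

	return out->is_reset;
}
```

In a real listener the buffer would come from recv() on an AF_NETLINK socket using NETLINK_KOBJECT_UEVENT; that part is left out here. Note that the PID decoded this way is just the number the kernel formatted into the string, so the pid namespace concern raised above still applies to whatever the reader does with it.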
