On 2023-06-01 16:47, James Zhu wrote:
Don't sleep when event age unmatch, and update last_event_age.
It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.

Signed-off-by: James Zhu <[email protected]>
---
  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 13 +++++++++++++
  1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index b27a79c5f826..23e154811471 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -964,6 +964,19 @@ int kfd_wait_on_events(struct kfd_process *p,
                                        event_data.event_id);
                if (ret)
                        goto out_unlock;
+
+               /* last_event_age = 0 reserved for backward compatible */
+               if (event_data.last_event_age &&
+                       event_waiters[i].event->event_age != 
event_data.last_event_age) {
+                       event_data.last_event_age = 
event_waiters[i].event->event_age;
+                       WRITE_ONCE(event_waiters[i].activated, true);

I think this is probably not necessary if you're returning before the first call to test_event_condition.


+
+                       if (copy_to_user(&events[i], &event_data,
+                               sizeof(struct kfd_event_data))) {
+                               ret = -EFAULT;
+                               goto out_unlock;
+                       }
+               }

When waiting on multiple events, it would be more efficient to catch all events with outdated age in a single system call, instead of returning after finding the first one. Then return after the loop if it found one or more outdated events.

Also EFAULT is the wrong error code. It means "bad address". I think we could return 0 here because there is really no error. It's just a normal condition to allow user mode to update its event information before going to sleep. If you want a non-0 return code, I'd recommend -EDEADLK, because sleeping without outdated event information can lead to deadlocks. We'd also need to translate that into a meaningful status code in the Thunk to let ROCr know what's going on. I don't see a suitable status code in the current Thunk API for this.

Regards,
  Felix


        }
/* Check condition once. */

Reply via email to