I believe that intel_gt_retire_requests_timeout() should return either
-ETIME if all time designated by timeout argument has been consumed while
waiting for fences being signaled, or remaining time if there are requests
still not retired, or 0 otherwise.  In the latter case, remaining time
should be passed back via remaining_timeout argument.

Remaining time is updated with return value of each consecutive call to
dma_fence_wait_timeout().  If an error code is returned instead of
remaining time, a few potentially unexpected side effects occur:
- we no longer wait for consecutive timelines' last request fences being
  signaled before we try to retire requests from those timelines -- while
  expected in case of -ETIME, that's probably not intended in case of
  other errors that dma_fence_wait_timeout() can return,
- the error code (a negative value) is passed back as remaining time and
  if no more requests happen to be left pending despite the error, a user
  may pass that value forward as a remaining timeout -- that can
  potentially trigger a WARN or BUG,
- potentially unexpected error code is returned to user when a
  non-critical error that probably shouldn't stop the user from retrying
  occurs while active requests are still pending.
Moreover, should dma_fence_wait_timeout() ever return 0 (which should mean
timeout expiration) while we are processing requests and there are still
pending requests when we are about to return, that 0 value is returned to
user like if all requests were successfully retired.

Ignore error codes from dma_fence_wait_timeout() other than -ETIME and
don't overwrite remaining time with those error codes.  Also, convert 0
value returned by dma_fence_wait_timeout() to -ETIME.

Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Janusz Krzysztofik <[email protected]>
Cc: [email protected] # v5.5+
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c 
b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index edb881d756309..6c3b8ac3055c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -156,11 +156,22 @@ long intel_gt_retire_requests_timeout(struct intel_gt 
*gt, long timeout,
 
                        fence = i915_active_fence_get(&tl->last_request);
                        if (fence) {
+                               signed long time_left;
+
                                mutex_unlock(&tl->mutex);
 
-                               timeout = dma_fence_wait_timeout(fence,
-                                                                true,
-                                                                timeout);
+                               time_left = dma_fence_wait_timeout(fence,
+                                                                  true,
+                                                                  timeout);
+                               /*
+                                * 0 or -ETIME: timeout expired
+                                * other errors: ignore, assume no time consumed
+                                */
+                               if (time_left == -ETIME || time_left == 0)
+                                       timeout = -ETIME;
+                               else if (time_left > 0)
+                                       timeout = time_left;
+
                                dma_fence_put(fence);
 
                                /* Retirement is best effort */
-- 
2.25.1

Reply via email to