Hi,

On 15/11/15 13:32, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements from big core, I
> have set the limit for busywaiting as 2 microseconds.

Sounds like solid reasoning. Would it also be worth finding the trade-off
limit for small core?

> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
> the system is too busy to waste cycles on spinning and we should sleep
> instead.

Hm, would need_resched not cover the CPU switch anyway? Or maybe
need_resched means something other than what I thought, which is "there
are other runnable tasks"?

This would also have an impact on the patch subject line. I thought we
would burn a jiffy of CPU cycles only if there are no other runnable
tasks, so how does this impact interactivity?

Also, again, I think the commit message needs some data on how this was
found and what the impact is.

Btw, as it happens, just last week as I was playing with perf, I did
notice busy spinning is the top cycle waster in some benchmarks. I was in
the process of trying to quantify the difference with it on or off but
did not complete it.

> __i915_spin_request was introduced in
> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Tue Apr 7 16:20:41 2015 +0100
>
>     drm/i915: Optimistically spin for the request completion
>
> Reported-by: Jens Axboe <axboe at kernel.dk>
> Link: https://lkml.org/lkml/2015/11/12/621
> Cc: Jens Axboe <axboe at kernel.dk>
> Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin at intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> Cc: "Rantala, Valtteri" <valtteri.rantala at intel.com>
> Cc: stable at kernel.vger.org
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 740530c571d1..2a88158bd1f7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1146,14 +1146,36 @@ static bool missed_irq(struct drm_i915_private
> *dev_priv,
>  	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
>  }
>
> +static u64 local_clock_us(unsigned *cpu)
> +{
> +	u64 t;
> +
> +	*cpu = get_cpu();
> +	t = local_clock() >> 10;

Needs comment I think to explicitly mention the approximation, or maybe
drop the _us suffix?

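To spell out the approximation: local_clock() returns nanoseconds, so
shifting right by 10 divides by 1024 rather than 1000, and each resulting
unit is 1.024 us. A minimal userspace sketch of the difference follows; the
helper names are hypothetical and are not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from the patch. */

/* Exact conversion from nanoseconds to microseconds. */
static uint64_t ns_to_us_exact(uint64_t ns)
{
	return ns / 1000;
}

/* The shift used in the patch: divides by 1024, not 1000. */
static uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}

/* Nanoseconds represented by a count of shifted units (1024 ns each). */
static uint64_t shift_units_to_ns(uint64_t units)
{
	return units << 10;
}
```

So the "+ 2" in the patch is nominally 2048 ns rather than 2000 ns, about
2.4% more than the name suggests. That is probably harmless for a spin
limit, but it is why a comment or a different suffix seems worthwhile.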
> +	put_cpu();
> +
> +	return t;
> +}
> +
> +static bool busywait_stop(u64 timeout, unsigned cpu)
> +{
> +	unsigned this_cpu;
> +
> +	if (time_after64(local_clock_us(&this_cpu), timeout))
> +		return true;
> +
> +	return this_cpu != cpu;
> +}
> +
>  static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>  {
> -	unsigned long timeout;
> +	u64 timeout;
> +	unsigned cpu;
>
>  	if (i915_gem_request_get_ring(req)->irq_refcount)
>  		return -EBUSY;
>
> -	timeout = jiffies + 1;
> +	timeout = local_clock_us(&cpu) + 2;
>  	while (!need_resched()) {
>  		if (i915_gem_request_completed(req, true))
>  			return 0;
> @@ -1161,7 +1183,7 @@ static int __i915_spin_request(struct
> drm_i915_gem_request *req, int state)
>  		if (signal_pending_state(state, current))
>  			break;
>
> -		if (time_after_eq(jiffies, timeout))
> +		if (busywait_stop(timeout, cpu))
>  			break;
>
>  		cpu_relax_lowlatency();
>

Otherwise looks good. Not sure what you would convert to 32-bit from your
follow-up reply, since you need us resolution?

Regards,

Tvrtko
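
P.S. On the 32-bit question, one possible reading (an assumption on my
side, not something stated in the patch) is that the microsecond value
itself could be truncated to 32 bits. A 32-bit microsecond counter wraps
roughly every 71.6 minutes, but a wrap-safe comparison in the style of
time_after() still works as long as the two values are less than 2^31 us
(about 35.8 minutes) apart. A userspace sketch with a hypothetical helper
name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: true if a is after b, with
 * both treated as wrapping 32-bit microsecond counters. Mirrors the
 * signed-difference trick used by the kernel's time_after() macro.
 */
static bool time_after32_us(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

With a 2 us window the distance limit is never a concern, so the wrap by
itself would not rule out a 32-bit value.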