i915: Move the modulus for ring emission to the register write

Chris Wilson Tue, 02 Aug 2016 03:15:07 -0700

On Tue, Aug 02, 2016 at 10:42:47AM +0100, Dave Gordon wrote:
> On 01/08/16 17:32, Chris Wilson wrote:
> >On Mon, Aug 01, 2016 at 05:28:34PM +0300, Joonas Lahtinen wrote:
> >>On ma, 2016-08-01 at 11:15 +0100, Chris Wilson wrote:
> >>>On Mon, Aug 01, 2016 at 01:07:55PM +0300, Joonas Lahtinen wrote:
> >>>>
> >>>>On ma, 2016-08-01 at 10:10 +0100, Chris Wilson wrote:
> >>>>>
> >>>>>Space reservation is already safe with respect to the ring->size
> >>>>>modulus, but hardware only expects to see values in the range
> >>>>>0...ring->size-1 (inclusive) and so requires the modulus to prevent us
> >>>>>writing the value ring->size instead of 0. As this is only required for
> >>>>>the register itself, we can defer the modulus to the register update and
> >>>>>not perform it after every command packet. We keep the
> >>>>>intel_ring_advance() around in the code to provide demarcation for the
> >>>>>end-of-packet (which then can be compared against intel_ring_begin() as
> >>>>>the number of dwords emitted must match the reserved space).
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <[email protected]>
> >>>>>Cc: Joonas Lahtinen <[email protected]>
> >>>>>Cc: Dave Gordon <[email protected]>
> >>>>>---
> >>>>> drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
> >>>>> drivers/gpu/drm/i915/intel_ringbuffer.c |  6 ++++--
> >>>>> drivers/gpu/drm/i915/intel_ringbuffer.h | 17 +++++++++++++----
> >>>>> 3 files changed, 18 insertions(+), 7 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
> >>>>>b/drivers/gpu/drm/i915/intel_lrc.c
> >>>>>index bf42a66d6624..824f7efe4e64 100644
> >>>>>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>>>>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>>>>@@ -373,7 +373,7 @@ static void execlists_update_context(struct 
> >>>>>drm_i915_gem_request *rq)
> >>>>>         struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
> >>>>>         uint32_t *reg_state = rq->ctx->engine[engine->id].lrc_reg_state;
> >>>>>
> >>>>>-        reg_state[CTX_RING_TAIL+1] = rq->tail;
> >>>>>+        reg_state[CTX_RING_TAIL+1] = intel_ring_offset(rq->ring, 
> >>>>>rq->tail);
> >>>>>
> >>>>>         /* True 32b PPGTT with dynamic page allocation: update PDP
> >>>>>          * registers and point the unallocated PDPs to scratch page.
> >>>>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
> >>>>>b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>>>index 3142085b5cc0..21d5e8209400 100644
> >>>>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>>>>@@ -1718,7 +1718,8 @@ static void i9xx_submit_request(struct 
> >>>>>drm_i915_gem_request *request)
> >>>>> {
> >>>>>         struct drm_i915_private *dev_priv = request->i915;
> >>>>>
> >>>>>-        I915_WRITE_TAIL(request->engine, request->tail);
> >>>>>+        I915_WRITE_TAIL(request->engine,
> >>>>>+                        intel_ring_offset(request->ring, 
> >>>>>request->tail));
> >>>>> }
> >>>>>
> >>>>> static void
> >>>>>@@ -2505,7 +2506,8 @@ static void gen6_bsd_submit_request(struct 
> >>>>>drm_i915_gem_request *request)
> >>>>>                 DRM_ERROR("timed out waiting for the BSD ring to wake 
> >>>>> up\n");
> >>>>>
> >>>>>         /* Now that the ring is fully powered up, update the tail */
> >>>>>-        I915_WRITE_FW(RING_TAIL(request->engine->mmio_base), 
> >>>>>request->tail);
> >>>>>+        I915_WRITE_FW(RING_TAIL(request->engine->mmio_base),
> >>>>>+                      intel_ring_offset(request->ring, request->tail));
> >>>>>         POSTING_READ_FW(RING_TAIL(request->engine->mmio_base));
> >>>>>
> >>>>>         /* Let the ring send IDLE messages to the GT again,
> >>>>>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h 
> >>>>>b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>>>index 14d2ea36fb88..9ac96ddb01ee 100644
> >>>>>--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>>>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> >>>>>@@ -460,14 +460,23 @@ static inline void intel_ring_emit_reg(struct 
> >>>>>intel_ring *ring, i915_reg_t reg)
> >>>>>
> >>>>> static inline void intel_ring_advance(struct intel_ring *ring)
> >>>>> {
> >>>>>+        /* Dummy function.
> >>>>>+         *
> >>>>>+         * This serves as a placeholder in the code so that the reader
> >>>>>+         * can compare against the preceding intel_ring_begin() and
> >>>>>+         * check that the number of dwords emitted matches the space
> >>>>>+         * reserved for the command packet (i.e. the value passed to
> >>>>>+         * intel_ring_begin()).
> >>>>>+         */
> >>>>>+}
> >>>>>+
> >>>>>+static inline u32 intel_ring_offset(struct intel_ring *ring, u32 value)
> >>>>>+{
> >>>>>         /* The modulus is required so that we avoid writing
> >>>>>          * request->tail == ring->size, rather than the expected 0,
> >>>>>          * into the RING_TAIL register as that can cause a GPU hang.
> >>>>>-         * As this is only strictly required for the request->tail,
> >>>>>-         * and only then as we write the value into hardware, we can
> >>>>>-         * one day remove the modulus after every command packet.
> >>>>>          */
> >>>>>-        ring->tail &= ring->size - 1;
> >>>>>+        return value & (ring->size - 1);
> >>>>> }
> >>>>The comment seems outdated-ish as it speaks of modulus which is nowhere
> >>>>to be seen. I'd speak of 'masking'. With that,
> >>>The operation is a modulus. The common optimisation for moduli with
> >>>a power-of-two divisor being applied.
> >>
> >>Yes, I'm perfectly aware of that, but as discussed in IRC, that
> >>information would be good to be local. Modulus in this case a trick to
> >>achieve value == ring->size ? 0 : value; Which would again be more
> >>informative, we do not expect wrap, but we have one special value.
> >
> >Ditch all the verbage, and go with the single line:
> >     /* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
> >that just fits into 80cols.
> >
> >Keep it to why not how and we need not argue about language.
> >-Chris
> 
> Well yes, except this function isn't writing anything to anywhere,
> so it seems rather disconnected from the point of use. Perhaps
> 
> /* ensure value (written to TAIL) is strictly less than ring->size */
> 
> which is even shorter :)
> 
> BTW, a completely different way to avoid this problem would just be
> to ensure the packet never finished at the last word of the ring.
> Setting effective_size = size-1 would do just that!
> 
> If we set effective_size < size for engines that don't like TAIL ==
> ring->size (as well as the ones that don't like the last cacheline)
> we could then remove the don't-wrap-midsequence restriction from the
> others, which would make better use of the ring space :)


We can't just remove that as we still have the limitation that we can't
split instructions, and the intel_ring_begin() is all the information we
have at that point regarding instruction length.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 13/73] drm/i915: Move the modulus for ring emission to the register write

Reply via email to