On Mon, Jul 7, 2014 at 2:34 PM, Daniel Vetter <dan...@ffwll.ch> wrote:

> On Fri, Jul 04, 2014 at 11:50:32AM -0300, Paulo Zanoni wrote:
> > From: Paulo Zanoni <paulo.r.zan...@intel.com>
> >
> > The current code only runs when we do an I915_WRITE operation. It
> > checks if the unclaimed register flag is set before we do the
> > operation, and then it checks it again after we do the operation. This
> > double check allows us to find out if the I915_WRITE operation in
> > question is the bad one, or if some previous code is the bad one. When
> > it finds a problem, our code uses DRM_ERROR to signal it.
> >
> > The good thing about the current code is that it detects the problem,
> > so at least we can know we did something wrong. The problem is that
> > even though we find the problem, we don't really have much information
> > to actually debug it. So whenever I see one of these DRM_ERROR
> > messages on my systems, the first thing I do is apply a patch to
> > change the DRM_ERROR to a WARN and also check for unclaimed registers
> > on I915_READ operations. This local patch makes things even slower,
> > but it usually helps a lot in finding the bad code.
> >
> > The first point here is that since the current code is only useful to
> > detect whether we have a problem or not, but it is not really good to
> > find the cause of the problem, I don't think we should be checking
> > both before and after every I915_WRITE operation: just doing the check
> > once should be enough for us to quickly detect problems. With this
> > change, the code that runs by default for every single user will only
> > do 1 read operation for every single I915_WRITE, instead of 2. This
> > patch does this change.
> >
> > The second point is that the local patch I have should be upstream,
> > but since it makes things slower it should be disabled by default. So
> > I added the i915.mmio_debug option to enable it.
> >
> > So after this patch, this is what will happen:
> >  - By default, we will try to detect unclaimed registers once after
> >    every I915_WRITE operation. Previously we tried twice for every
> >    I915_WRITE.
> >  - When we find an unclaimed register we will still print a DRM_ERROR
> >    message, but we will now tell the user to try again with
> >    i915.mmio_debug=1.
> >  - When we use i915.mmio_debug=1 we will try to find unclaimed
> >    registers both before and after every I915_READ and I915_WRITE
> >    operation, and we will print stack traces in case we find them.
> >    This should really help locating the exact point of the bad code
> >    (or at least finding out that i915.ko is not the problem).
> >
> > This commit also opens space for really-slow register debugging
> > operations on other platforms. In theory we can now add lots and lots
> > of debug code behind i915.mmio_debug, enable this option on our tests,
> > and catch more problems.
> >
> > Signed-off-by: Paulo Zanoni <paulo.r.zan...@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     |  1 +
> >  drivers/gpu/drm/i915/i915_params.c  |  6 ++++
> >  drivers/gpu/drm/i915/intel_uncore.c | 68
> +++++++++++++++++++++++++++++++++----
> >  3 files changed, 68 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
> > index ac06c0f..51d867f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2092,6 +2092,7 @@ struct i915_params {
> >       bool disable_display;
> >       bool disable_vtd_wa;
> >       int use_mmio_flip;
> > +     bool mmio_debug;
> >  };
> >  extern struct i915_params i915 __read_mostly;
> >
> > diff --git a/drivers/gpu/drm/i915/i915_params.c
> b/drivers/gpu/drm/i915/i915_params.c
> > index 8145729..7977872 100644
> > --- a/drivers/gpu/drm/i915/i915_params.c
> > +++ b/drivers/gpu/drm/i915/i915_params.c
> > @@ -49,6 +49,7 @@ struct i915_params i915 __read_mostly = {
> >       .enable_cmd_parser = 1,
> >       .disable_vtd_wa = 0,
> >       .use_mmio_flip = 0,
> > +     .mmio_debug = 0,
> >  };
> >
> >  module_param_named(modeset, i915.modeset, int, 0400);
> > @@ -161,3 +162,8 @@ MODULE_PARM_DESC(enable_cmd_parser,
> >  module_param_named(use_mmio_flip, i915.use_mmio_flip, int, 0600);
> >  MODULE_PARM_DESC(use_mmio_flip,
> >                "use MMIO flips (-1=never, 0=driver discretion [default],
> 1=always)");
> > +
> > +module_param_named(mmio_debug, i915.mmio_debug, bool, 0600);
> > +MODULE_PARM_DESC(disable_vtd_wa,
>

wrong parameter here!


> > +     "Enable the MMIO debug code (default: false). This may negatively "
> > +     "affect performance.");
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c
> b/drivers/gpu/drm/i915/intel_uncore.c
> > index 29145df..de5402f 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -513,21 +513,72 @@ ilk_dummy_write(struct drm_i915_private *dev_priv)
> >       __raw_i915_write32(dev_priv, MI_MODE, 0);
> >  }
> >
> > +/**
> > + * hsw_unclaimed_reg_debug - tries to find unclaimed registers
> > + * @dev_priv: device private data
> > + * @reg: register we're about to touch or just have touched
> > + * @read: are we reading or writing a register now?
> > + * @before: are we running this before or after we touch the register?
> > + *
> > + * This function tries to detect unclaimed registers and provide as much
> > + * information as possible. It will only do something if the
> i915.mmio_debug
> > + * option is enabled.
> > + *
> > + * If we detect an unclaimed register when @before is true, it means
> some
> > + * unknown code wrote to an unclaimed register, and we're just
> detecting it now.
> > + * The bad code can be i915.ko, some other Kernel module (e.g.,
> snd_hda_intel,
> > + * vgacon) or even user space. If we detect it when @before is false,
> then
> > + * there's a really good chance that the read/write operation we just
> did to
> > + * @reg is what triggered the unclaimed register message, but there's
> also the
> > + * possibility that after the operation we did, and before this
> function,
> > + * something else touched another unclaimed register, and we are
> detecting it
> > + * now.
> > + *
> > + * The unclaimed register bit can get set when we touch a register that
> does not
> > + * exist and when we touch a register that exists but is powered down.
> Please
> > + * also notice that the HW can only check some register ranges, so
> there's no
> > + * guarantee that all read and write operations to inexistent registers
> will be
> > + * caught by this function.
> > + *
> > + * Also please notice that we don't run this function on __raw
> operations and in
> > + * many other cases, so the case where @before is true is quite common.
> > + */
>
> Pet peeve of mine: Comments have a cost since they get stale really fast
> and then can be awfully misleading. We have way too many examples for
> that. My rule of interface kerneldoc for functions is that we should only
> have it for exported functions used in multiple different places in the
> driver. If there's only one caller (like for _init/fini) functions the
> comment should be very terse at most.
>
> For static inline helpers in the same file otoh code should simply be
> readable enough to not require comments. Imo this is the case here, so the
> kerneldoc should imo be dropped.
>
> If you want to keep some of it I guess we could document i915.mmio_debug
> with kerneldoc and then also pull it into our i915 section in the drm
> docbook. Maybe under a new "debug options" setting. We probably need a
> DOC: section to make this work. A few sentences should be more than
> enough.
>
> >  static void
> > -hsw_unclaimed_reg_clear(struct drm_i915_private *dev_priv, u32 reg)
> > +hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg,
> bool read,
> > +                     bool before)
> >  {
> > +     const char *op = read ? "reading" : "writing to";
> > +     const char *when = before ? "before" : "after";
> > +
> > +     if (!i915.mmio_debug)
> > +             return;
> > +
> >       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> > -             DRM_ERROR("Unknown unclaimed register before writing to
> %x\n",
> > -                       reg);
> > +             WARN(1, "Unclaimed register detected %s %s register
> 0x%x\n",
> > +                  when, op, reg);
> >               __raw_i915_write32(dev_priv, FPGA_DBG,
> FPGA_DBG_RM_NOCLAIM);
> >       }
> >  }
> >
> > +/**
> > + * hsw_unclaimed_reg_detect - tries to find unclaimed registers
> > + * @dev_priv: device private data
> > + *
> > + * This function will try to detect if something touched and unclaimed
> register,
> > + * triggering the FPGA_DBG bit. If this happens, we will print a
> message telling
> > + * the user to use i915.mmio_debug=1 so we can properly debug the
> problem.
> > + *
> > + * This way, we can still detect our bugs without giving the user the
> > + * performance impact of hsw_unclaimed_reg_debug().
> > + */
> >  static void
> > -hsw_unclaimed_reg_check(struct drm_i915_private *dev_priv, u32 reg)
> > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv)
> >  {
> > +     if (i915.mmio_debug)
> > +             return;
> > +
> >       if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) {
> > -             DRM_ERROR("Unclaimed write to %x\n", reg);
> > +             DRM_ERROR("Unclaimed register detected. Please use the
> i915.mmio_debug=1 to debug this problem.");
>
> But imo really the only piece that's required is this helpful message here
> into dmesg.
>
> Anyway, the concept looks solid.
> -Daniel
>
> >               __raw_i915_write32(dev_priv, FPGA_DBG,
> FPGA_DBG_RM_NOCLAIM);
> >       }
> >  }
> > @@ -564,6 +615,7 @@ gen5_read##x(struct drm_i915_private *dev_priv,
> off_t reg, bool trace) { \
> >  static u##x \
> >  gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace)
> { \
> >       REG_READ_HEADER(x); \
> > +     hsw_unclaimed_reg_debug(dev_priv, reg, true, true); \
> >       if (dev_priv->uncore.forcewake_count == 0 && \
> >           NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
> >               dev_priv->uncore.funcs.force_wake_get(dev_priv, \
> > @@ -574,6 +626,7 @@ gen6_read##x(struct drm_i915_private *dev_priv,
> off_t reg, bool trace) { \
> >       } else { \
> >               val = __raw_i915_read##x(dev_priv, reg); \
> >       } \
> > +     hsw_unclaimed_reg_debug(dev_priv, reg, true, false); \
> >       REG_READ_FOOTER; \
> >  }
> >
> > @@ -700,12 +753,13 @@ hsw_write##x(struct drm_i915_private *dev_priv,
> off_t reg, u##x val, bool trace)
> >       if (NEEDS_FORCE_WAKE((dev_priv), (reg))) { \
> >               __fifo_ret = __gen6_gt_wait_for_fifo(dev_priv); \
> >       } \
> > -     hsw_unclaimed_reg_clear(dev_priv, reg); \
> > +     hsw_unclaimed_reg_debug(dev_priv, reg, false, true); \
> >       __raw_i915_write##x(dev_priv, reg, val); \
> >       if (unlikely(__fifo_ret)) { \
> >               gen6_gt_check_fifodbg(dev_priv); \
> >       } \
> > -     hsw_unclaimed_reg_check(dev_priv, reg); \
> > +     hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \
> > +     hsw_unclaimed_reg_detect(dev_priv); \
> >       REG_WRITE_FOOTER; \
> >  }
> >
> > --
> > 2.0.0
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

The rest looks good. With that fixed and probably with comment removed as
Daniel mentioned, feel free to use:
Reviewed-by: Rodrigo Vivi <rodrigo.v...@gmail.com>

-- 
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to