On Mon, Jul 7, 2014 at 2:34 PM, Daniel Vetter <dan...@ffwll.ch> wrote:
> On Fri, Jul 04, 2014 at 11:50:32AM -0300, Paulo Zanoni wrote: > > From: Paulo Zanoni <paulo.r.zan...@intel.com> > > > > The current code only runs when we do an I915_WRITE operation. It > > checks if the unclaimed register flag is set before we do the > > operation, and then it checks it again after we do the operation. This > > double check allows us to find out if the I915_WRITE operation in > > question is the bad one, or if some previous code is the bad one. When > > it finds a problem, our code uses DRM_ERROR to signal it. > > > > The good thing about the current code is that it detects the problem, > > so at least we can know we did something wrong. The problem is that > > even though we find the problem, we don't really have much information > > to actually debug it. So whenever I see one of these DRM_ERROR > > messages on my systems, the first thing I do is apply a patch to > > change the DRM_ERROR to a WARN and also check for unclaimed registers > > on I915_READ operations. This local patch makes things even slower, > > but it usually helps a lot in finding the bad code. > > > > The first point here is that since the current code is only useful to > > detect whether we have a problem or not, but it is not really good to > > find the cause of the problem, I don't think we should be checking > > both before and after every I915_WRITE operation: just doing the check > > once should be enough for us to quickly detect problems. With this > > change, the code that runs by default for every single user will only > > do 1 read operation for every single I915_WRITE, instead of 2. This > > patch does this change. > > > > The second point is that the local patch I have should be upstream, > > but since it makes things slower it should be disabled by default. So > > I added the i915.mmio_debug option to enable it. > > > > So after this patch, this is what will happen: > > - By default, we will try to detect unclaimed registers once after > > every I915_WRITE operation. Previously we tried twice for every > > I915_WRITE. > > - When we find an unclaimed register we will still print a DRM_ERROR > > message, but we will now tell the user to try again with > > i915.mmio_debug=1. > > - When we use i915.mmio_debug=1 we will try to find unclaimed > > registers both before and after every I915_READ and I915_WRITE > > operation, and we will print stack traces in case we find them. > > This should really help locating the exact point of the bad code > > (or at least finding out that i915.ko is not the problem). > > > > This commit also opens space for really-slow register debugging > > operations on other platforms. In theory we can now add lots and lots > > of debug code behind i915.mmio_debug, enable this option on our tests, > > and catch more problems. > > > > Signed-off-by: Paulo Zanoni <paulo.r.zan...@intel.com> > > --- > > drivers/gpu/drm/i915/i915_drv.h | 1 + > > drivers/gpu/drm/i915/i915_params.c | 6 ++++ > > drivers/gpu/drm/i915/intel_uncore.c | 68 > +++++++++++++++++++++++++++++++++---- > > 3 files changed, 68 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > > index ac06c0f..51d867f 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -2092,6 +2092,7 @@ struct i915_params { > > bool disable_display; > > bool disable_vtd_wa; > > int use_mmio_flip; > > + bool mmio_debug; > > }; > > extern struct i915_params i915 __read_mostly; > > > > diff --git a/drivers/gpu/drm/i915/i915_params.c > b/drivers/gpu/drm/i915/i915_params.c > > index 8145729..7977872 100644 > > --- a/drivers/gpu/drm/i915/i915_params.c > > +++ b/drivers/gpu/drm/i915/i915_params.c > > @@ -49,6 +49,7 @@ struct i915_params i915 __read_mostly = { > > .enable_cmd_parser = 1, > > .disable_vtd_wa = 0, > > .use_mmio_flip = 0, > > + .mmio_debug = 0, > > }; > > > > module_param_named(modeset, i915.modeset, int, 0400); > > @@ -161,3 +162,8 @@ MODULE_PARM_DESC(enable_cmd_parser, > > module_param_named(use_mmio_flip, i915.use_mmio_flip, int, 0600); > > MODULE_PARM_DESC(use_mmio_flip, > > "use MMIO flips (-1=never, 0=driver discretion [default], > 1=always)"); > > + > > +module_param_named(mmio_debug, i915.mmio_debug, bool, 0600); > > +MODULE_PARM_DESC(disable_vtd_wa, > wrong parameter here! > > + "Enable the MMIO debug code (default: false). This may negatively " > > + "affect performance."); > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c > b/drivers/gpu/drm/i915/intel_uncore.c > > index 29145df..de5402f 100644 > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > @@ -513,21 +513,72 @@ ilk_dummy_write(struct drm_i915_private *dev_priv) > > __raw_i915_write32(dev_priv, MI_MODE, 0); > > } > > > > +/** > > + * hsw_unclaimed_reg_debug - tries to find unclaimed registers > > + * @dev_priv: device private data > > + * @reg: register we're about to touch or just have touched > > + * @read: are we reading or writing a register now? > > + * @before: are we running this before or after we touch the register? > > + * > > + * This function tries to detect unclaimed registers and provide as much > > + * information as possible. It will only do something if the > i915.mmio_debug > > + * option is enabled. > > + * > > + * If we detect an unclaimed register when @before is true, it means > some > > + * unknown code wrote to an unclaimed register, and we're just > detecting it now. > > + * The bad code can be i915.ko, some other Kernel module (e.g., > snd_hda_intel, > > + * vgacon) or even user space. If we detect it when @before is false, > then > > + * there's a really good chance that the read/write operation we just > did to > > + * @reg is what triggered the unclaimed register message, but there's > also the > > + * possibility that after the operation we did, and before this > function, > > + * something else touched another unclaimed register, and we are > detecting it > > + * now. > > + * > > + * The unclaimed register bit can get set when we touch a register that > does not > > + * exist and when we touch a register that exists but is powered down. > Please > > + * also notice that the HW can only check some register ranges, so > there's no > > + * guarantee that all read and write operations to inexistent registers > will be > > + * caught by this function. > > + * > > + * Also please notice that we don't run this function on __raw > operations and in > > + * many other cases, so the case where @before is true is quite common. > > + */ > > Pet peeve of mine: Comments have a cost since they get stale really fast > and then can be awfully misleading. We have way too many examples for > that. My rule of interface kerneldoc for functions is that we should only > have it for exported functions used in multiple different places in the > driver. If there's only one caller (like for _init/fini) functions the > comment should be very terse at most. > > For static inline helpers in the same file otoh code should simply be > readable enough to not require comments. Imo this is the case here, so the > kerneldoc should imo be dropped. > > If you want to keep some of it I guess we could document i915.mmio_debug > with kerneldoc and then also pull it into our i915 section in the drm > docbook. Maybe under a new "debug options" setting. We probably need a > DOC: section to make this work. A few sentences should be more than > enough. > > > static void > > -hsw_unclaimed_reg_clear(struct drm_i915_private *dev_priv, u32 reg) > > +hsw_unclaimed_reg_debug(struct drm_i915_private *dev_priv, u32 reg, > bool read, > > + bool before) > > { > > + const char *op = read ? "reading" : "writing to"; > > + const char *when = before ? "before" : "after"; > > + > > + if (!i915.mmio_debug) > > + return; > > + > > if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) { > > - DRM_ERROR("Unknown unclaimed register before writing to > %x\n", > > - reg); > > + WARN(1, "Unclaimed register detected %s %s register > 0x%x\n", > > + when, op, reg); > > __raw_i915_write32(dev_priv, FPGA_DBG, > FPGA_DBG_RM_NOCLAIM); > > } > > } > > > > +/** > > + * hsw_unclaimed_reg_detect - tries to find unclaimed registers > > + * @dev_priv: device private data > > + * > > + * This function will try to detect if something touched and unclaimed > register, > > + * triggering the FPGA_DBG bit. If this happens, we will print a > message telling > > + * the user to use i915.mmio_debug=1 so we can properly debug the > problem. > > + * > > + * This way, we can still detect our bugs without giving the user the > > + * performance impact of hsw_unclaimed_reg_debug(). > > + */ > > static void > > -hsw_unclaimed_reg_check(struct drm_i915_private *dev_priv, u32 reg) > > +hsw_unclaimed_reg_detect(struct drm_i915_private *dev_priv) > > { > > + if (i915.mmio_debug) > > + return; > > + > > if (__raw_i915_read32(dev_priv, FPGA_DBG) & FPGA_DBG_RM_NOCLAIM) { > > - DRM_ERROR("Unclaimed write to %x\n", reg); > > + DRM_ERROR("Unclaimed register detected. Please use the > i915.mmio_debug=1 to debug this problem."); > > But imo really the only piece that's required is this helpful message here > into dmesg. > > Anyway, the concept looks solid. > -Daniel > > > __raw_i915_write32(dev_priv, FPGA_DBG, > FPGA_DBG_RM_NOCLAIM); > > } > > } > > @@ -564,6 +615,7 @@ gen5_read##x(struct drm_i915_private *dev_priv, > off_t reg, bool trace) { \ > > static u##x \ > > gen6_read##x(struct drm_i915_private *dev_priv, off_t reg, bool trace) > { \ > > REG_READ_HEADER(x); \ > > + hsw_unclaimed_reg_debug(dev_priv, reg, true, true); \ > > if (dev_priv->uncore.forcewake_count == 0 && \ > > NEEDS_FORCE_WAKE((dev_priv), (reg))) { \ > > dev_priv->uncore.funcs.force_wake_get(dev_priv, \ > > @@ -574,6 +626,7 @@ gen6_read##x(struct drm_i915_private *dev_priv, > off_t reg, bool trace) { \ > > } else { \ > > val = __raw_i915_read##x(dev_priv, reg); \ > > } \ > > + hsw_unclaimed_reg_debug(dev_priv, reg, true, false); \ > > REG_READ_FOOTER; \ > > } > > > > @@ -700,12 +753,13 @@ hsw_write##x(struct drm_i915_private *dev_priv, > off_t reg, u##x val, bool trace) > > if (NEEDS_FORCE_WAKE((dev_priv), (reg))) { \ > > __fifo_ret = __gen6_gt_wait_for_fifo(dev_priv); \ > > } \ > > - hsw_unclaimed_reg_clear(dev_priv, reg); \ > > + hsw_unclaimed_reg_debug(dev_priv, reg, false, true); \ > > __raw_i915_write##x(dev_priv, reg, val); \ > > if (unlikely(__fifo_ret)) { \ > > gen6_gt_check_fifodbg(dev_priv); \ > > } \ > > - hsw_unclaimed_reg_check(dev_priv, reg); \ > > + hsw_unclaimed_reg_debug(dev_priv, reg, false, false); \ > > + hsw_unclaimed_reg_detect(dev_priv); \ > > REG_WRITE_FOOTER; \ > > } > > > > -- > > 2.0.0 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > The rest looks good. With that fixed and probably with comment removed as Daniel mentioned, feel free to use: Reviewed-by: Rodrigo Vivi <rodrigo.v...@gmail.com> -- Rodrigo Vivi Blog: http://blog.vivi.eng.br
_______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx