On Thu, Nov 19, 2015 at 06:46:28PM +0100, Mario Kleiner wrote:
> Hi Alex and Michel and Ville,
> 
> it's "fix vblank stuff" time again ;-)
> 
> Ville's changes to the DRM's drm_handle_vblank() / 
> drm_update_vblank_count() code in Linux 4.4 not only made that code more 
> elegant, but also removed the robustness against the vblank irq quirks 
> in AMD hw and similar hardware. So now i get tons of off-by-one errors and
> 
> "[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip 
> completion event has impossible msc 24803 < target_msc 24804" XOrg 
> messages from that kernel.

Argh. Sorry about that.

> 
> One of the reasons for trouble is that AMD hw quirk where the hw fires 
> an extra vblank irq shortly after vblank irq's get enabled, not 
> synchronized to vblank, but typically in the middle of active scanout, 
> so we get a redundant call to drm_handle_vblank in the middle of scanout.

I think that should be fine as such. The code should ignore redudntant
vbl irqs. Well, assuming you have a reliable hw counter or you use the
timestamp guesstimate mechanism and your scanout position is reported
accurately. But I guess you have a bit of problem with both.

> 
> To fix that i have a minor patch to make drm_update_vblank_count() again 
> robust against such redundant calls, which i will send out later to the 
> mailing list. Diff attached for reference.
> 
> The second quirk of AMD hw is that the vblank interrupt fires a few 
> scanlines before start of vblank, so drm_handle_vblank -> 
> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets 
> called before the start of the vblank for which the new vblank count 
> should be queried.

Does it fire too soon, or is the scanout position register value(s)
just offset by a few lines perhaps?

We have that with i915 and I simply fix up the value when reading it
out. Fortunately for us the offset is constant (or at least seems to
be) for a given platform/connector combo.

> 
> The third problem is that the DRM vblank handling always had the 
> assumption that hardware vblank counters would essentially increment at 
> leading edge of vblank - basically in sync with the firing of the vblank 
> irq, so that a hw counter readout from within the vblank irq handler 
> would always deliver the new incremented value. If this assumption is 
> violated then the counting by use of the hw counter gets unreliable, 
> because depending on random small delays in irq handling the code may 
> end up sampling the hw counter pre- or post-increment, leading to 
> inconsistent updating and funky bugs. It just so happens that AMD 
> hardware doesn't increment the hw counter at leading edge of vblank, so 
> stuff falls apart.
> 
> So to fix those two problems i'm tinkering with cooking the hw vblank 
> counter value returned by radeon_get_vblank_counter_kms() to make it 
> appear as if the counter incremented at leading edge of vblank in sync 
> with vblank irq.

Yeah, I do that in i915 too. Well, on some platforms where the
counter increments at the leading edge of active.

> 
> It almost sort of works on the rs600 code path, but i need a bit of info 
> from you:
> 
> 1. There's this register from the old specs for m76.pdf, which is not 
> part of the current register defines for radeon-kms:
> 
> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> 
> It contains the lower 16 bits of framecounter and the 13 bits of 
> vertical scanout position. It seems to give the same readings as the 24 
> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This 
> would come handy.
> 
> Does Evergreen and later have a same/similar register and where is it?
> 
> 2. The hw framecounter seems to increment when the vertical scanout 
> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 
> gpu i tested so far. Is this so on all asics? And is the hw counter 
> increment happening exactly at the moment that vertical scanout position 
> jumps back to zero, ie. both events are driven by the same signal? Or is 
> the framecounter increment just happening somewhere inside either 
> scanline VTOTAL-1 or scanline 0?
> 
> 
> If we can fix this and get it into rc2 or rc3 then we could avoid a bad 
> regression and with a bit of luck at the same time improve by being able 
> to set dev->vblank_disable_immediate = true then and allow vblank irqs 
> to get turned off more aggressively for a bit of extra power saving.
> 
> thanks,
> -mario

> -- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, unsigned int pipe,
>                                     unsigned long flags)
>  {
>         struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> -       u32 cur_vblank, diff;
> +       u32 cur_vblank, diff = 0;
>         bool rc;
>         struct timeval t_vblank;
> +       const struct timeval *t_old;
> +       u64 diff_ns;
>         int count = DRM_TIMESTAMP_MAXRETRIES;
>         int framedur_ns = vblank->framedur_ns;
> 
> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, unsigned int pipe,
>                 rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>         } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && 
> --count > 0);
> 
> -       if (dev->max_vblank_count != 0) {
> -               /* trust the hw counter when it's around */
> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> -       } else if (rc && framedur_ns) {
> -               const struct timeval *t_old;
> -               u64 diff_ns;
> -
> +       /*
> +        * Always use vblank timestamping based method if supported to reject
> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw 
> up
> +        * due to some irq handling quirk.

Hmm. I'm thinking it would be better to simply not claim that there's a hw
counter if it isn't reliable.

> +        *
> +        * This also sets the diff value for use as fallback below in case the
> +        * hw does not support a suitable hw vblank counter.
> +        */
> +       if (rc && framedur_ns) {
>                 t_old = &vblanktimestamp(dev, pipe, vblank->count);
>                 diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> 
> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, unsigned int pipe,
>                  */
>                 diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> 
> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> -                                     pipe, (long long) diff_ns, framedur_ns);
> -       } else {
> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> +                   pipe, (long long) diff_ns, framedur_ns);
> +
> +                   /* Treat this redundant invocation as no-op. */
> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> +                   return;
> +               }
> +       }
> +
> +       /*
> +        * hw counters, if marked as supported via max_vblank_count != 0,
> +        * *must* increment at leading edge of vblank and in sync with
> +        * the firing of the hw vblank irq, otherwise we have a race here
> +        * between gpu and us, where small variations in our execution
> +        * timing can cause off-by-one miscounting of vblanks.
> +        */
> +       if (dev->max_vblank_count != 0) {
> +               /* trust the hw counter when it's around */
> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> +       } else if (!(rc && framedur_ns)) {
>                 /* some kind of default for drivers w/o accurate vbl 
> timestamping */
>                 diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>         }


-- 
Ville Syrjälä
Intel OTC

Reply via email to