Chris Wilson <[email protected]> writes:

> On Wed, Feb 15, 2017 at 03:52:59PM +0200, Mika Kuoppala wrote:
>> Certain Baytrails, namely the 4-core CPU variants, have been
>> plagued by spurious system hangs, mostly occurring under light loads.
>> 
>> Multiple bisects by various people point to a commit which changes the
>> reclocking strategy for Baytrail to follow its bigger brethren:
>> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
>> 
>> There is also a review comment attached to this commit from Deepak S
>> about avoiding punit access on Cherryview, which is why Cherryview was
>> excluded from the common reclocking path. By taking the same approach
>> on Baytrail, and omitting the punit access by not tweaking the thresholds
>> when the hardware has been asked to move to a different frequency,
>> considerable gains in stability have been observed.
>> 
>> With a J1900 box, a light render/video load would end up in a system
>> hang, usually in less than 12 hours. With this patch applied, the
>> cumulative uptime has now been 34 days without issues. To provoke the
>> system hang, light loads on both the render and bsd engines have been
>> run in parallel:
>> glxgears >/dev/null 2>/dev/null &
>> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
>> 
>> So far, the author has not witnessed a system hang with the above load
>> and this patch applied. Reports from the tenacious people at
>> kernel bugzilla are also promising.
>> 
>> Considering that punit accesses are considerably less frequent with
>> this patch, there is a possibility that this will push the still
>> unknown root cause past the triggering point on most loads.
>> 
>> But as we can now reliably reproduce the hang independently,
>> we can reduce the pain that users are having and use
>> static thresholds until a root cause is found.
>> 
>> v3: don't break debugfs and simplification (Chris Wilson)
>> 
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
>> Cc: Chris Wilson <[email protected]>
>> Cc: Ville Syrjälä <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Jani Nikula <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: Ezequiel Garcia <[email protected]>
>> CC: Michal Feix <[email protected]>
>> Cc: Hans de Goede <[email protected]>
>> Cc: Deepak S <[email protected]>
>> Cc: Jarkko Nikula <[email protected]>
>> Cc: <[email protected]> # v4.2+
>> Acked-by: Daniel Vetter <[email protected]>
>> Signed-off-by: Mika Kuoppala <[email protected]>
>
> Had a couple of weekends to try and find an alternative explanation
> (a root cause for the hangs would be nice!). If it is just the writes to
> the RPS registers, are we safe on resume (etc)?
>
> However, I've drawn a blank on explaining what the hw is doing wrong
> (but found a couple of bugs in the byt manual RPS evaluation which
> desire review), so
> Acked-by: Chris Wilson <[email protected]>

Pushed, thanks.
-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
