Chris Wilson <[email protected]> writes:

> On Wed, Feb 15, 2017 at 03:52:59PM +0200, Mika Kuoppala wrote:
>> Certain Baytrails, namely the 4 cpu core variants, have been
>> plagued by spurious system hangs, mostly occurring with light loads.
>>
>> Multiple bisects by various people point to a commit which changes the
>> reclocking strategy for Baytrail to follow its bigger brethren:
>> commit 8fb55197e64d ("drm/i915: Agressive downclocking on Baytrail")
>>
>> There is also a review comment attached to this commit from Deepak S
>> on avoiding punit access on Cherryview, and thus it was excluded from
>> the common reclocking path. By taking the same approach and omitting
>> the punit access (not tweaking the thresholds when the hardware
>> has been asked to move to a different frequency), considerable gains
>> in stability have been observed.
>>
>> With a J1900 box, a light render/video load would end up in a system hang
>> usually in less than 12 hours. With this patch applied, the cumulative
>> uptime has now been 34 days without issues. To provoke a system hang,
>> light loads on both the render and bsd engines in parallel have been used:
>> glxgears >/dev/null 2>/dev/null &
>> mpv --vo=vaapi --hwdec=vaapi --loop=inf vid.mp4
>>
>> So far, the author has not witnessed a system hang with the above load
>> and this patch applied. Reports from the tenacious people at
>> kernel bugzilla are also promising.
>>
>> Considering that the punit access frequency with this patch is
>> considerably lower, there is a possibility that this will push
>> the still unknown root cause past the triggering point on most loads.
>>
>> But as we can now reliably reproduce the hang independently,
>> we can reduce the pain that users are having and use
>> static thresholds until a root cause is found.
>>
>> v3: don't break debugfs and simplification (Chris Wilson)
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
>> Cc: Chris Wilson <[email protected]>
>> Cc: Ville Syrjälä <[email protected]>
>> Cc: Len Brown <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Jani Nikula <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: Ezequiel Garcia <[email protected]>
>> CC: Michal Feix <[email protected]>
>> Cc: Hans de Goede <[email protected]>
>> Cc: Deepak S <[email protected]>
>> Cc: Jarkko Nikula <[email protected]>
>> Cc: <[email protected]> # v4.2+
>> Acked-by: Daniel Vetter <[email protected]>
>> Signed-off-by: Mika Kuoppala <[email protected]>
>
> Had a couple of weekends to try and find an alternative explanation
> (a root cause for the hangs would be nice!). If it is just the writes to
> the RPS registers, are we safe on resume (etc)?
>
> However, I've drawn a blank on explaining what the hw is doing wrong
> (but found a couple of bugs in the byt manual RPS evaluation which
> desire review), so
> Acked-by: Chris Wilson <[email protected]>
Pushed, thanks.
-Mika

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre

_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
