On Fri, Jun 17, 2011 at 12:12:16AM +0100, Chris Wilson wrote:
> On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <[email protected]> wrote:
> > On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <[email protected]> wrote:
> > > > https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > >
> > > This looks to be a continuation of the WAIT_EVENT on a dead pipe that
> > > we thought we had beaten into submission. The other reports provide
> > > more circumstantial evidence to suggest that the hang coincides with a
> > > hotplug event. I think the cause is a race between the kernel turning
> > > the pipe off due to the hotplug and reprobing, and that uevent reaching
> > > the ddx. In the meantime, we've queued another video frame to execute
> > > on the dead pipe. Worse, we may have queued it up long before the
> > > hotplug event, and due to buffering in the GPU command stream it only
> > > gets executed afterwards.
> > >
> > > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > > Author: Chris Wilson <[email protected]>
> > > Date: Sat Nov 13 09:49:11 2010 +0000
> > >
> > >     drm/i915: Retire any pending operations on the old scanout when switching
> > >
> > > That handles the case where we are changing modes. Unfortunately,
> > > disabling an output takes a different path, though I think we can take
> > > a similar big-hammer approach there as well.
> >
> > As luck would have it, my own i965 laptop locked up today with what I
> > guess is this same bug. IPEHR=0x01820000
> >
> > Before I restart it, is there any data I could gather that would
> > assist you?
>
> My theory is based upon this still being a WAIT_EVENT on a disabled
> pipe. The error state should support this if DSP*CNTR is disabled for
> the pipe we are waiting on. The other thing to establish is whether you
> know if a modeset happened at around the same time as the hang.
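To check that theory against an error-state dump, something like the following could decode the reported IPEHR and test the plane-enable bit of DSP*CNTR. It is a minimal sketch, assuming the MI instruction encoding from the kernel's i915_reg.h (client type in bits 31:29, 0 meaning MI; opcode in bits 28:23; MI_WAIT_FOR_EVENT is opcode 0x03) and that bit 31 of DSPACNTR/DSPBCNTR is DISPLAY_PLANE_ENABLE. The helper names are hypothetical, which plane feeds the waited-on pipe is not worked out here, and the DSP*CNTR value has to come from the actual error state, since none is quoted in this thread.

```python
from typing import Tuple

# Assumed encodings (from the kernel's i915_reg.h, not from this thread):
# an MI instruction carries its client type in bits 31:29 (0 = MI) and its
# opcode in bits 28:23; MI_WAIT_FOR_EVENT is MI opcode 0x03.
MI_CLIENT = 0x0
MI_WAIT_FOR_EVENT_OPCODE = 0x03
# Bit 31 of DSPACNTR/DSPBCNTR enables the display plane (DISPLAY_PLANE_ENABLE).
DISPLAY_PLANE_ENABLE = 1 << 31


def decode_mi(header: int) -> Tuple[int, int, int]:
    """Split a command header (e.g. IPEHR) into (client, opcode, flag bits)."""
    client = (header >> 29) & 0x7
    opcode = (header >> 23) & 0x3F
    flags = header & ((1 << 23) - 1)
    return client, opcode, flags


def is_wait_for_event(header: int) -> bool:
    """True if the header is an MI_WAIT_FOR_EVENT instruction."""
    client, opcode, _ = decode_mi(header)
    return client == MI_CLIENT and opcode == MI_WAIT_FOR_EVENT_OPCODE


def plane_enabled(dspcntr: int) -> bool:
    """True if the plane-enable bit is set in a DSPxCNTR value."""
    return bool(dspcntr & DISPLAY_PLANE_ENABLE)


def waiting_on_dead_pipe(ipehr: int, dspcntr: int) -> bool:
    """Hypothetical check: ring stalled in WAIT_FOR_EVENT on a disabled plane."""
    return is_wait_for_event(ipehr) and not plane_enabled(dspcntr)


if __name__ == "__main__":
    client, opcode, flags = decode_mi(0x01820000)  # IPEHR reported above
    print(f"client={client} opcode={opcode:#04x} flags={flags:#08x}")
```

For 0x01820000 the opcode field comes out as 0x03, i.e. a WAIT_FOR_EVENT, with one flag bit (bit 17) left over selecting which event the ring is waiting for.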
The hang occurred while the system was preparing for sleep, triggered by a lid-close event. From my kern.log:

  Jun 14 23:40:40 lynmouth kernel: [511433.780066] tg3 0000:08:00.0: eth0: Link is down
  Jun 14 23:40:41 lynmouth kernel: [511434.597257] PM: Syncing filesystems ... done.
  Jun 14 23:40:41 lynmouth kernel: [511434.615699] PM: Preparing system for mem sleep
  Jun 14 23:40:45 lynmouth kernel: [511439.284049] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
  Jun 14 23:40:45 lynmouth kernel: [511439.284823] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1680764 at 1680757, next 1680765)
  Jun 14 23:40:46 lynmouth kernel: [511439.788055] [drm:i915_reset] *ERROR* Failed to reset chip.
  Jun 16 15:02:15 lynmouth kernel: [511439.916240] Freezing user space processes ... (elapsed 0.01 seconds) done.
  Jun 16 15:02:15 lynmouth kernel: [511439.932109] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
  Jun 16 15:02:15 lynmouth kernel: [511439.948084] PM: Entering mem sleep

I don't see a modeset event, but it could be that one happens without producing a log entry. I'll turn on more debugging output and check.

The log shows the system has an uptime of 15 days and has gone through suspend/resume cycles roughly daily. I do play videos on it from time to time, although I hadn't been doing so at the time of this suspend/resume cycle.

The system does occasionally lose its dual-head configuration during suspend/resume and comes back mirrored. I've assumed that to be a gnome-settings-daemon bug, but it could be a symptom of this problem. It does hint that some modeset, output hotplug event, or similar does occur during resume.

> > Otherwise, I can boot and test the patch you posted to the bug.
>
> I'm confident that that patch closes another window for the bug. I'm
> less confident that that's the only race condition we have.
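The i915_do_wait_request line in that log can also be read quantitatively. Assuming "awaiting" is the seqno the waiter was blocked on, "at" is the last seqno the GPU reported as completed, and "next" is the next seqno to be assigned (my reading of the message format, not something stated in the thread), the hypothetical parser below shows seven requests still outstanding when the wait gave up, consistent with work having been queued well ahead of the hang. On Linux, the -11 return value is -EAGAIN.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for the error line quoted above, e.g.
# "... i915_do_wait_request returns -11 (awaiting 1680764 at 1680757, next 1680765)"
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) "
    r"\(awaiting (\d+) at (\d+), next (\d+)\)"
)


@dataclass
class WaitFailure:
    ret: int         # return code logged by the kernel (-11 here)
    awaiting: int    # seqno the waiter was blocked on
    completed: int   # assumed: last seqno the GPU reported as done ("at")
    next_seqno: int  # assumed: next seqno to be assigned ("next")

    @property
    def outstanding(self) -> int:
        """Requests submitted but not completed, up to the awaited one."""
        return self.awaiting - self.completed


def parse_wait_failure(line: str) -> Optional[WaitFailure]:
    """Return the parsed fields, or None if the line is not this error."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    ret, awaiting, completed, nxt = (int(g) for g in m.groups())
    return WaitFailure(ret, awaiting, completed, nxt)


if __name__ == "__main__":
    line = ("[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 "
            "(awaiting 1680764 at 1680757, next 1680765)")
    failure = parse_wait_failure(line)
    print(failure, "outstanding:", failure.outstanding)
```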
>
> > One of the difficulties with this type of bug is that it's so
> > intermittent and uncertain to reproduce (and so easily confused with
> > other unrelated freezes) that it's hard to tell for certain whether a
> > given patch has definitively helped the situation. Do you have
> > suggestions on ways of measuring this better, or techniques to help
> > trigger the bug more reliably?
>
> If I am right, then we have two paths that cause WAIT_FOR_EVENT:
> windowed swapbuffers (or sub_copy_swap) and video. So playing a number
> of video streams should increase the likelihood of the bug, run in
> parallel with looping xrandr mode changes, in particular disabling
> outputs.

Awesome, can do.

The reason I ask is that, the way Ubuntu's stable update process works, if I can demonstrate that a patch improves things in a way that's clear to a non-X person (i.e. the archive admin team), I can get the patch released to all Ubuntu users. If I can't prove or demonstrate it in some fashion, it'll get rejected or significantly delayed.

Bryce

_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
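The reproduction recipe above (several video streams playing while xrandr repeatedly disables and re-enables an output) could be scripted roughly as follows. This is a hypothetical harness, not an existing tool: the output name VGA1, the mplayer command line and the stream/cycle counts are placeholders to adjust for the machine under test. It relies only on the standard xrandr `--output`, `--off` and `--auto` options, and it defaults to a dry run that just lists the xrandr commands it would issue; passing `--run` actually starts the players and toggles the output.

```python
import subprocess
import sys
import time
from typing import List, Sequence

# Hypothetical placeholders: adjust for the machine under test.
OUTPUT = "VGA1"
VIDEO_CMD = ["mplayer", "-loop", "0", "sample.avi"]


def xrandr_toggle_cmds(output: str) -> List[List[str]]:
    """xrandr commands that disable one output and then re-enable it."""
    return [
        ["xrandr", "--output", output, "--off"],
        ["xrandr", "--output", output, "--auto"],
    ]


def stress(output: str, video_cmd: Sequence[str], streams: int,
           cycles: int, delay: float = 2.0,
           dry_run: bool = True) -> List[List[str]]:
    """Play `streams` copies of video_cmd while toggling `output` `cycles` times.

    Returns the xrandr commands issued (or, in a dry run, the ones that
    would have been issued). Nothing is executed when dry_run is True.
    """
    players: List[subprocess.Popen] = []
    issued: List[List[str]] = []
    try:
        if not dry_run:
            for _ in range(streams):
                players.append(subprocess.Popen(list(video_cmd)))
        for _ in range(cycles):
            for cmd in xrandr_toggle_cmds(output):
                issued.append(cmd)
                if not dry_run:
                    subprocess.run(cmd, check=False)
                    time.sleep(delay)  # give the kernel and ddx time to react
    finally:
        # Stop any video players that were started, even if xrandr failed.
        for p in players:
            p.terminate()
            p.wait()
    return issued


if __name__ == "__main__":
    live = "--run" in sys.argv[1:]
    cmds = stress(OUTPUT, VIDEO_CMD, streams=3, cycles=100, dry_run=not live)
    print(f"{len(cmds)} xrandr commands, first: {' '.join(cmds[0])}")
```

Keeping the cycle count and delay fixed makes runs with and without a candidate patch directly comparable, which may help when the result has to be explained to people outside X.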
