On Mon, Jun 20, 2011 at 10:17:02AM +1000, Ben Skeggs wrote:
> On Mon, 2011-06-20 at 00:25 +0200, Marcin Slusarz wrote:
> > On Wed, Jun 15, 2011 at 09:27:22AM +0300, Maxim Levitsky wrote:
> > > On Tue, 2011-06-14 at 23:18 +0200, Marcin Slusarz wrote:
> > > > Hi
> > > >
> > > > I have a very rough patchset which adds support for GPU lockup
> > > > detection and fallback to (more or less) noaccel to
> > > > xf86-video-nouveau.
> > > >
> > > > As the patches are only a proof of concept and needs a lot of work, I
> > > > would like to know first if this is a desired feature - I don't want
> > > > to spend a couple of days on patches which will be ignored or
> > > > rejected with a reason "we don't need it".
> > > >
> > > > So, what do you think?
> > >
> > > Will love it! I have unexplained hangs here, so maybe I could debug them
> > > further with this.
> > >
> >
> > Thanks for encouragement. But...
> >
> > I was hoping for reponse from someone with commit access. I really really
> > hate wasting time, so I'm not going to finish it. Oh well, I guess it's
> > not that important as I thought.
> Hey,
>
> I'd be interested in seeing the approach you've taken at least. I'm not
> convinced this is something we want exactly, my fear is that a lot of
> bugs will end up covered over with people not noticing. But, lets
> see :)
>

The general idea is: detect nouveau_bo_map failures and disable acceleration.

libdrm:
Problem 1: the timeout in __nouveau_fence_wait never triggers, because xserver
uses signals (SIGIO for input and SIGALRM for some short timers), which
interrupt the fence loop and cause a syscall restart.
Solution: detect timeouts on the libdrm side.

Problem 2: nouveau_pushbuf_flush asserts when it can't allocate space for the
next push buffer.
Solution: handle it and return an error. As WAIT_RING and FIRE_RING use
nouveau_pushbuf_flush, they need to propagate the error further. BEGIN_RING
uses WAIT_RING, so it needs to propagate the error too.

xf86-video-nouveau:
Should handle all errors (nouveau_bo_map, BEGIN_RING, WAIT_RING, FIRE_RING)
and disable acceleration. This is tricky.

Problem 3: we can't disable exa in the middle of an accelerated operation
(which might consist of several exa ops), so we need to mark the channel with
AccelBroken and return false from any Check/Prepare funcs. The problem is: we
still need at least one operation - nouveau_exa_prepare_access. On NV50 it
means WrappedFB must be enabled. (I didn't investigate it yet, but maybe we
could untile the pixmap?) WFB has some performance overhead, so this whole
functionality would probably need a driver option (e.g. DetectGPULockups),
which would implicitly enable WFB :(.

Exa with only the PrepareAccess hook is EXTREMELY slow (~0.1 FPS, maybe even
less), so after one full accel operation, we need to disable exa entirely and
fall back to NoAccel - I didn't investigate how to do it yet.

Additionally, nouveau_exa_prepare_access needs to use NOUVEAU_BO_NOSYNC when
AccelIsBroken, because waiting for a locked up pgraph does not make any sense.

Completely unrelated to this madness is detecting GPU lockup at driver
initialization time.
It's nice and clean, and it allows restarting xserver automatically in
NoAccel mode after a lockup (however, it needs to work around a bug in
xserver; the bugfix was already sent to the xorg-devel list -
http://lists.x.org/archives/xorg-devel/2011-June/023075.html).

Mesa:
Should assert when any of nouveau_bo_map/BEGIN_RING/WAIT_RING/FIRE_RING fail.
At least for now.

Marcin
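
To illustrate the "detect timeouts on the libdrm side" idea from Problem 1, here is a minimal sketch. It is not the actual libdrm code: wait_with_deadline, fence_done_fn and the two example callbacks are hypothetical names made up for this illustration. The assumption is that the wait can be written as a polling loop; the point is only that an absolute CLOCK_MONOTONIC deadline keeps counting even if SIGIO/SIGALRM keep interrupting the sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the fence has signalled. */
typedef bool (*fence_done_fn)(void *ctx);

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll `done` until it reports completion or `timeout_ms` has elapsed.
 * Returns 0 on completion, -ETIMEDOUT otherwise.
 */
static int wait_with_deadline(fence_done_fn done, void *ctx,
                              long long timeout_ms)
{
    const long long deadline = now_ms() + timeout_ms;
    const struct timespec nap = { 0, 1000000 }; /* 1 ms */

    for (;;) {
        if (done(ctx))
            return 0;
        if (now_ms() >= deadline)
            return -ETIMEDOUT;
        /*
         * nanosleep may return early with EINTR when a signal arrives.
         * That is harmless here: the deadline is absolute, so an
         * interrupted sleep cannot reset the timeout.
         */
        nanosleep(&nap, NULL);
    }
}

/* Example callback simulating a locked up GPU: never completes. */
static bool never_done(void *ctx)
{
    (void)ctx;
    return false;
}

/* Example callback that completes once the counter in ctx reaches zero. */
static bool done_after_n_polls(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

With these callbacks, wait_with_deadline(never_done, NULL, 20) gives up with -ETIMEDOUT after roughly 20 ms, while a fence that completes after a few polls returns 0.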

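
The error propagation from Problem 2 and the sticky AccelBroken flag from Problem 3 could look roughly like the sketch below. Everything here is a stand-in: struct sketch_channel and the sketch_* functions are hypothetical and only mimic the roles of nouveau_pushbuf_flush, WAIT_RING, BEGIN_RING and an EXA Prepare hook; the slot counts are arbitrary.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a channel; not the libdrm structure. */
struct sketch_channel {
    int  free_slots;   /* space left in the current push buffer */
    bool gpu_hung;     /* simulates a lockup: flushing makes no progress */
    bool accel_broken; /* sticky flag, set on the first error */
};

/* Plays the role of nouveau_pushbuf_flush: returns an error, no assert. */
static int sketch_flush(struct sketch_channel *chan)
{
    if (chan->gpu_hung)
        return -EBUSY;
    chan->free_slots = 64; /* pretend a fresh push buffer was allocated */
    return 0;
}

/* Plays the role of WAIT_RING: make sure `size` slots are available. */
static int sketch_wait_ring(struct sketch_channel *chan, int size)
{
    int ret;

    if (chan->free_slots >= size)
        return 0;
    ret = sketch_flush(chan);
    if (ret)
        return ret; /* propagate the flush error */
    return chan->free_slots >= size ? 0 : -ENOSPC;
}

/* Plays the role of BEGIN_RING: reserve space, propagating any error. */
static int sketch_begin_ring(struct sketch_channel *chan, int size)
{
    int ret = sketch_wait_ring(chan, size);
    if (ret)
        return ret;
    chan->free_slots -= size;
    return 0;
}

/*
 * Plays the role of an EXA Prepare hook. Returning false makes EXA fall
 * back to software for this operation. Once the flag is set, every later
 * call refuses immediately without touching the channel.
 */
static bool sketch_prepare_solid(struct sketch_channel *chan)
{
    if (chan->accel_broken)
        return false;
    if (sketch_begin_ring(chan, 8) != 0) {
        chan->accel_broken = true;
        return false;
    }
    return true;
}
```

The flag is deliberately never cleared in this sketch: after the first failure every Prepare call returns false, even if the simulated hang goes away.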