Re: [Nouveau] gpu lockup detection and fallback to noaccel

Marcin Slusarz Mon, 20 Jun 2011 04:06:37 -0700

On Mon, Jun 20, 2011 at 10:17:02AM +1000, Ben Skeggs wrote:
> On Mon, 2011-06-20 at 00:25 +0200, Marcin Slusarz wrote:
> > On Wed, Jun 15, 2011 at 09:27:22AM +0300, Maxim Levitsky wrote:
> > > On Tue, 2011-06-14 at 23:18 +0200, Marcin Slusarz wrote: 
> > > > Hi
> > > > 
> > > > I have a very rough patchset which adds support for GPU lockup
> > > > detection and fallback to (more or less) noaccel to
> > > > xf86-video-nouveau.
> > > > 
> > > > As the patches are only a proof of concept and need a lot of work,
> > > > I would like to know first if this is a desired feature - I don't
> > > > want to spend a couple of days on patches which will be ignored or
> > > > rejected with a reason "we don't need it".
> > > > 
> > > > So, what do you think?
> > > 
> > > Will love it! I have unexplained hangs here, so maybe I could debug them
> > > further with this.
> > > 
> > 
> > Thanks for encouragement. But...
> > 
> > I was hoping for a response from someone with commit access. I really
> > really hate wasting time, so I'm not going to finish it. Oh well, I
> > guess it's not as important as I thought.
> Hey,
> 
> I'd be interested in seeing the approach you've taken at least.  I'm not
> convinced this is something we want exactly, my fear is that a lot of
> bugs will end up covered over with people not noticing.  But, let's
> see :)
>


The general idea is: detect nouveau_bo_map failures and disable acceleration.

libdrm:
Problem 1: the timeout in __nouveau_fence_wait never triggers, because the
X server uses signals (SIGIO for input and SIGALRM for some short timers),
which interrupt the fence loop and cause a syscall restart.
Solution: detect timeouts on the libdrm side.
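
To illustrate the solution to Problem 1, here is a minimal sketch of a wait
loop whose deadline is computed once, so that restarts caused by SIGIO or
SIGALRM cannot postpone the timeout forever. This is not the actual libdrm
code: fence_wait_deadline, wait_once_fn and the two stubs are hypothetical
names, and the "wait once" primitive is assumed to return -EINTR when a
signal interrupts it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in milliseconds; not affected by wall-clock changes. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical wait primitive: 0 when the fence has signalled, -EINTR when
 * a signal interrupted the wait, another negative errno on failure.
 */
typedef int (*wait_once_fn)(void *fence);

/*
 * Wait for a fence, giving up after timeout_ms of total elapsed time.
 * The deadline is fixed before the loop, so every restart after a signal
 * is measured against the same point in time.
 */
static int fence_wait_deadline(void *fence, wait_once_fn wait_once,
                               int64_t timeout_ms)
{
    assert(wait_once != NULL);
    const int64_t deadline = now_ms() + timeout_ms;

    for (;;) {
        int ret = wait_once(fence);
        if (ret != -EINTR)
            return ret;            /* signalled, or a real error */
        if (now_ms() >= deadline)
            return -ETIMEDOUT;     /* interrupted and out of time */
    }
}

/* Stub: a fence that has already signalled. */
static int stub_signalled(void *fence)
{
    (void)fence;
    return 0;
}

/* Stub: a wait that is always interrupted, as on a hung GPU under SIGIO. */
static int stub_always_interrupted(void *fence)
{
    (void)fence;
    return -EINTR;
}
```

With the second stub the loop returns -ETIMEDOUT once the deadline passes,
which is the condition the caller would treat as a lockup.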

Problem 2: nouveau_pushbuf_flush asserts when it can't allocate space for
the next push buffer.
Solution: handle it and return an error. As WAIT_RING and FIRE_RING use
nouveau_pushbuf_flush, they need to propagate the error further. BEGIN_RING
uses WAIT_RING, so it needs to propagate the error too.
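
The propagation chain for Problem 2 could look roughly like the sketch
below. All names here (struct sketch_chan, sketch_pushbuf_flush,
sketch_wait_ring, sketch_begin_ring, sketch_demo) are hypothetical
stand-ins, not the real libdrm functions or macros, and the push buffer is
reduced to a simple dword counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified channel state. */
struct sketch_chan {
    unsigned remaining;   /* dwords left in the current push buffer */
    unsigned buf_size;    /* dwords in a freshly allocated push buffer */
    int      alloc_fails; /* non-zero simulates a failed allocation */
};

/* Start the next push buffer; return an error instead of asserting. */
static int sketch_pushbuf_flush(struct sketch_chan *chan)
{
    assert(chan != NULL);
    if (chan->alloc_fails)
        return -ENOMEM;
    chan->remaining = chan->buf_size;
    return 0;
}

/* Make sure `size` dwords are available, flushing if necessary. */
static int sketch_wait_ring(struct sketch_chan *chan, unsigned size)
{
    if (chan->remaining >= size)
        return 0;

    int ret = sketch_pushbuf_flush(chan);
    if (ret)
        return ret;                       /* propagate the flush error */
    return chan->remaining >= size ? 0 : -ENOSPC;
}

/* Reserve one header dword plus `size` data dwords. */
static int sketch_begin_ring(struct sketch_chan *chan, unsigned size)
{
    int ret = sketch_wait_ring(chan, size + 1);
    if (ret)
        return ret;                       /* propagate the wait error */
    chan->remaining -= size + 1;
    return 0;
}

/* Usage: an exhausted buffer forces a flush, which may or may not fail. */
static int sketch_demo(int alloc_fails)
{
    struct sketch_chan chan = {
        .remaining = 0, .buf_size = 64, .alloc_fails = alloc_fails,
    };
    return sketch_begin_ring(&chan, 4);
}
```

The caller of the outermost function then sees the original error code and
can decide what to do with it, instead of the process dying in an assert.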

xf86-video-nouveau:
Should handle all errors (nouveau_bo_map, BEGIN_RING, WAIT_RING, FIRE_RING)
and disable acceleration. This is tricky.
Problem 3: we can't disable EXA in the middle of an accelerated operation
(which might consist of several EXA ops), so we need to mark the channel
with AccelBroken and return false from all Check/Prepare funcs. The problem
is that we still need at least one operation: nouveau_exa_prepare_access.
On NV50 this means WrappedFB must be enabled. (I haven't investigated it
yet, but maybe we could untile the pixmap?)
WFB has some performance overhead, so this whole functionality would
probably need a driver option (e.g. DetectGPULockups), which would
implicitly enable WFB :(. EXA with only the PrepareAccess hook is EXTREMELY
slow (~0.1 FPS, maybe even less), so after one full accel operation we need
to disable EXA entirely and fall back to NoAccel. I haven't investigated
how to do that yet.

Additionally, nouveau_exa_prepare_access needs to use NOUVEAU_BO_NOSYNC when
AccelBroken is set, because waiting for a locked-up PGRAPH does not make any
sense.
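
The AccelBroken gating and the NOSYNC choice described above can be
sketched as follows. Everything here is hypothetical: struct sketch_screen,
the sketch_* functions and the SKETCH_BO_* bits only stand in for the
driver's real state, EXA hooks and libdrm's NOUVEAU_BO_* flags, and their
values are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-screen state; the real driver keeps far more here. */
struct sketch_screen {
    bool accel_broken;   /* set once any map or ring call has failed */
};

/* Hypothetical flag bits standing in for libdrm's NOUVEAU_BO_* flags. */
enum {
    SKETCH_BO_RD     = 1 << 0,
    SKETCH_BO_WR     = 1 << 1,
    SKETCH_BO_NOSYNC = 1 << 2,
};

/* Called from any error path; the flag is never cleared afterwards. */
static void sketch_mark_broken(struct sketch_screen *s)
{
    assert(s != NULL);
    s->accel_broken = true;
}

/*
 * Every Check/Prepare hook starts with this test. Returning false makes
 * EXA fall back to software for that operation.
 */
static bool sketch_prepare_solid(const struct sketch_screen *s)
{
    if (s->accel_broken)
        return false;
    /* ... normal hardware setup would go here ... */
    return true;
}

/*
 * Map flags for the PrepareAccess path: once the GPU is known to be hung,
 * skip synchronisation, since the wait could never complete.
 */
static unsigned sketch_map_flags(const struct sketch_screen *s)
{
    unsigned flags = SKETCH_BO_RD | SKETCH_BO_WR;
    if (s->accel_broken)
        flags |= SKETCH_BO_NOSYNC;
    return flags;
}
```

Keeping the check in one flag means an operation that is already in flight
can finish its current hook, while every later Check/Prepare call is
refused.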

Completely unrelated to this madness is detecting a GPU lockup at driver
initialization time. It's nice and clean, and it allows the X server to be
restarted automatically in NoAccel mode after a lockup. (However, it needs
to work around a bug in the X server; a bugfix has already been sent to the
xorg-devel list:
http://lists.x.org/archives/xorg-devel/2011-June/023075.html)

Mesa:
Should assert when any of nouveau_bo_map/BEGIN_RING/WAIT_RING/FIRE_RING
fails. At least for now.

Marcin
