Hey all,

My desktop system has an NVIDIA graphics card that identifies as:

% lspci -v
# snip...
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd GK107 [GeForce GTX 650]
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau

From time to time, maybe once or twice a week, my system fails. The
symptoms are:

- The display freezes and the mouse stops moving, and neither recovers no
matter how long I wait
- Sound keeps working (Spotify keeps playing)
- Network connectivity works (I can ssh in)

When this happens, I ssh in and check dmesg, and I always see errors
like the following:

[11741.905192] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[11741.905202] nouveau 0000:01:00.0: fifo: gr engine fault on channel 10, recovering...

Sometimes I see many of these errors, sometimes just one. They never
appear while the system is running normally. I'm always able to ssh in
and reboot cleanly.
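To see how many of these errors a given freeze produced, and on which channels, a small filter over a saved dmesg log can help. This is only a sketch: the regular expressions are modeled on the two log lines quoted above and assume other occurrences use the same format. The file name nouveau_fifo_summary.py is hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns modeled on the two dmesg lines quoted above (assumption: other
# occurrences of these nouveau fifo messages use the same layout).
SCHED_RE = re.compile(
    r"^\[\s*([\d.]+)\] nouveau (\S+): fifo: SCHED_ERROR (\w+) \[(\w+)\]")
FAULT_RE = re.compile(
    r"^\[\s*([\d.]+)\] nouveau (\S+): fifo: (\w+) engine fault on channel (\d+)")


def summarize(lines):
    """Count SCHED_ERROR codes and engine faults per (engine, channel)."""
    sched = Counter()
    faults = Counter()
    for line in lines:
        m = SCHED_RE.match(line)
        if m:
            # Key by (hex code, symbolic name), e.g. ("0a", "CTXSW_TIMEOUT").
            sched[(m.group(3), m.group(4))] += 1
            continue
        m = FAULT_RE.match(line)
        if m:
            # Key by (engine name, channel number), e.g. ("gr", 10).
            faults[(m.group(3), int(m.group(4)))] += 1
    return sched, faults


def report(sched, faults):
    """Format the counters as human-readable lines."""
    out = []
    for (code, name), n in sorted(sched.items()):
        out.append(f"SCHED_ERROR {code} [{name}]: {n}")
    for (engine, chan), n in sorted(faults.items()):
        out.append(f"{engine} engine fault on channel {chan}: {n}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a path to a log saved with e.g. `dmesg > dmesg.txt`.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for line in report(*summarize(f)):
            print(line)
```

After ssh-ing in during a freeze, something like `dmesg > dmesg.txt` followed by `python3 nouveau_fifo_summary.py dmesg.txt` would show whether the faults always hit the same engine and channel.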

Does anyone have an idea where I should start digging to find out what's
happening? Are these fifo errors coming from logic that I can disable
with a kernel command line option?

Thanks,
Devrin
