Re: [Nouveau] [REGRESSION] nouveau: Crash in gk104_fifo_intr_runlist()

Ilia Mirkin Tue, 11 Aug 2015 21:01:20 -0700

I'm guessing that Optimus is the operative difference, not the
specific chip. Basically something that can be put to sleep via
ACPI...


On Tue, Aug 11, 2015 at 11:53 PM, Alexandre Courbot <gnu...@gmail.com> wrote:
> Sending the revert patch to Dave after receiving his green light for
> this, and will investigate the issue on my side. I should be able to find a
> gk107 somewhere...
>
> On Wed, Aug 12, 2015 at 12:35 PM, Alexandre Courbot <gnu...@gmail.com> wrote:
>> Mmm in that case it is probably best to revert that commit for the
>> time being. It was targeting GM20B (and maybe other Maxwells too) so
>> reverting it should not hurt anyone at the moment. I think Ben is on
>> holidays for now, is there anyone else who can send a pull request to
>> Dave Airlie for this? We don't want 4.2 to ship with a crash every
>> other reboot...
>>
>> On Wed, Aug 12, 2015 at 10:01 AM, Eric Biggers <ebigge...@gmail.com> wrote:
>>> Hi,
>>>
>>> I think I've done about 10 reboots with the commit reverted and I never
>>> experienced the crash.  But with 4.2.0-rc6 I get the crash on about every
>>> other reboot.
>>>
>>> Probably relevant: the computer on which the crash occurs has two GPUs (one
>>> Intel and one Nvidia).  The Intel one is actually being used, whereas I
>>> presume the Nvidia one is being automatically disabled shortly after boot,
>>> perhaps when the crash occurs...
>>>
>>> Eric
>>>
>>> On Mon, Aug 10, 2015 at 11:28 PM, Alexandre Courbot <gnu...@gmail.com> wrote:
>>>>
>>>> Indeed, and I am actually surprised to see one here. I will
>>>> double-check that patch.
>>>>
>>>> Eric, would you be able to give an estimate of the repro rate for this
>>>> issue? More testing with and without the patch would be welcome; it'd
>>>> be good to know whether it is actually the culprit or not.
>>>>
>>>> On Mon, Aug 10, 2015 at 2:28 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>> > Alexandre, could you take a look? 0xbad* generally comes from bad mmio
>>>> > reads.
>>>> >
>>>> > On Aug 9, 2015 1:08 PM, "Eric Biggers" <ebigge...@gmail.com> wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I am testing Linux v4.2-rc5 and I am sporadically getting crashes
>>>> >> shortly after startup in gk104_fifo_intr_runlist().  What I've found
>>>> >> is that the 'mask' value read from offset 0x2a00 comes back as
>>>> >> '0xbad0da00'.  This causes the 'engn' variable to be assigned the
>>>> >> value 9, which is invalid; then wake_up() is called on an
>>>> >> uninitialized waitqueue which causes the crash.
>>>> >>
>>>> >> Reverting commit 1addc12648521d ("drm/nouveau/fifo/gk104: kick
>>>> >> channels when deactivating them") seemed to make the problem go
>>>> >> away, although I can't be 100% sure because the problem is sporadic.
>>>> >>
>>>> >> Attached an example of the kernel log up to the crash.
>>>> >>
>>>> >> Eric
>>>> >>
>>>> >> _______________________________________________
>>>> >> Nouveau mailing list
>>>> >> Nouveau@lists.freedesktop.org
>>>> >> http://lists.freedesktop.org/mailman/listinfo/nouveau
>>>> >>
>>>> >
>>>
>>>
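The arithmetic in Eric's report can be checked with a small userspace sketch. It assumes, consistent with his observation that 'engn' became 9, that the runlist handler takes the lowest set bit of the 0x2a00 mask as the engine index: 0xbad0da00 ends in 0xa00, whose lowest set bit is bit 9. Everything else here is hypothetical and for illustration only. The helper names, the engine-table size SKETCH_NUM_ENGINES (7 is an arbitrary assumption, not the driver's real count), and the "0xbad" prefix check, which only encodes Ilia's remark that 0xbad* values generally come from bad MMIO reads, are not nouveau code. The sanitize step is a possible defensive guard, not the fix chosen in this thread (which was a revert).

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL engine-table size for this sketch; the real driver's count
 * may differ. Any engine index >= this value has no waitqueue behind it. */
#define SKETCH_NUM_ENGINES 7u

/* Index of the lowest set bit, mirroring what the kernel's __ffs() returns.
 * The caller must pass a nonzero mask. */
static unsigned lowest_set_bit(uint32_t mask)
{
    unsigned n = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Walk the set bits lowest-first (the assumed way the handler picks 'engn')
 * and return the first index with no engine behind it, or -1 if none. */
static int first_invalid_engine(uint32_t mask)
{
    while (mask) {
        unsigned engn = lowest_set_bit(mask);
        if (engn >= SKETCH_NUM_ENGINES)
            return (int)engn;
        mask &= ~(1u << engn);
    }
    return -1;
}

/* Heuristic only: treat a value whose top 12 bits are 0xbad as the result
 * of a failed MMIO read. */
static bool looks_like_bad_mmio(uint32_t val)
{
    return (val & 0xfff00000u) == 0xbad00000u;
}

/* HYPOTHETICAL guard: discard a read that looks like a failed MMIO access,
 * then clear any bits that have no corresponding engine before dispatching. */
static uint32_t sanitize_runlist_mask(uint32_t raw)
{
    if (looks_like_bad_mmio(raw))
        return 0;
    return raw & ((1u << SKETCH_NUM_ENGINES) - 1u);
}
```

With the value from the report, first_invalid_engine(0xbad0da00) yields 9, matching the out-of-range 'engn', while sanitize_runlist_mask(0xbad0da00) yields 0, so nothing would be woken. A mask such as 0x85 keeps only its in-range bits (0x05). A guard like this would only hide the symptom; the thread's working theory is that the register is being read while the GPU is powered down.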