Re: nouveau bug in linux/6.1.38-2

2023-08-31 Thread Linux regression tracking #update (Thorsten Leemhuis)
[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 04.08.23 14:02, Thorsten Leemhuis wrote:
> On 02.08.23 23:28, Olaf Skibbe wrote:
>> Dear Maintainers,
>>
>> Hereby I would like to report an apparent bug in the nouveau driver in
>> linux/6.1.38-2.
> 
> Thx for your report. Maybe your problem is caused by an incomplete
> backport. I Cced the maintainers for the drivers (and the regressions
> and the stable list), maybe one of them has an idea, as they know the
> driver.

#regzbot fix: 98e470dc73a9b3539e5a7a3c72f6b7c01c98
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.




Re: nouveau bug in linux/6.1.38-2

2023-08-06 Thread Olaf Skibbe

On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:


> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?

I will do this later today (takes some time, it is a slow machine).

> Would be weird if the other two commits are causing it. If that's the
> case, it's a bit worrying that reverting either of those causes
> issues, but maybe there is a good reason for it. Anyway, mind figuring
> out which of the two you need reverted to fix your issue? Thanks!


I can do this. But if I build two kernels anyway, isn't it faster to 
build each with only one of the patches applied? Or do you expect the 
patches to interact (so that the bug would only be present when both are 
applied)?


Cheers,
Olaf


Re: nouveau bug in linux/6.1.38-2

2023-08-06 Thread Olaf Skibbe

Dear all,

On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:


> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> 5a144bad3e75 nouveau: fix client work fence deletion race
>
> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
> be weird if the other two commits are causing it. If that's the case,
> it's a bit worrying that reverting either of those causes issues,
> but maybe there is a good reason for it. Anyway, mind figuring out
> which of the two you need reverted to fix your issue? Thanks!


The result is:

Patch with commit fb725beca62d reverted: Graphics works. I attached the 
respective patch again to this mail.


Patch with commit 62aecf23f3d1 reverted: Screen remains black, error 
message:


# dmesg | grep -A 36 "cut here"
[2.921358] [ cut here ]
[2.921361] WARNING: CPU: 1 PID: 176 at 
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 
[nouveau]
[2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) 
sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) 
mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) 
drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) 
crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) 
firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) psmouse(E) 
crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) i2c_smbus(E) 
firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) mmc_core(E) drm(E) 
pps_core(E) usb_common(E) battery(E) video(E) wmi(E) button(E)
[2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: GE  
6.1.0-0.a.test-amd64 #1  Debian 6.1.38-2a~test
[2.921701] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 
05/12/2017
[2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 
83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 
41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[2.922196] RSP: 0018:c077c04dfd60 EFLAGS: 00010246
[2.922201] RAX: 00041eb0 RBX: 9a8482624c00 RCX: 00041eb0
[2.922204] RDX: c0b47760 RSI:  RDI: c077c04dfcf0
[2.922206] RBP: 0001 R08: c077c04dfc64 R09: 5b76
[2.922209] R10: 000d R11: c077c04dfde0 R12: ffea
[2.922212] R13: 9a8517541e00 R14: 00044d45 R15: 
[2.922215] FS:  () GS:9a85a3c4() 
knlGS:
[2.922219] CS:  0010 DS:  ES:  CR0: 80050033
[2.92] CR2: 55f660bcb3a8 CR3: 00019761 CR4: 06e0
[2.96] Call Trace:
[2.922231]  
[2.922235]  ? __warn+0x7d/0xc0
[2.922244]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[2.922487]  ? report_bug+0xe6/0x170
[2.922494]  ? handle_bug+0x41/0x70
[2.922501]  ? exc_invalid_op+0x13/0x60
[2.922505]  ? asm_exc_invalid_op+0x16/0x20
[2.922512]  ? init_reset_begun+0x20/0x20 [nouveau]
[2.922708]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[2.922954]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
[2.923200]  nv50_disp_super+0x113/0x210 [nouveau]
[2.923445]  process_one_work+0x1c7/0x380
[2.923456]  worker_thread+0x4d/0x380
[2.923463]  ? rescuer_thread+0x3a0/0x3a0
[2.923469]  kthread+0xe9/0x110
[2.923476]  ? kthread_complete_and_exit+0x20/0x20
[2.923482]  ret_from_fork+0x22/0x30
[2.923493]  
[2.923494] ---[ end trace  ]---

(Maybe it's worth mentioning that the LED backlight is on, while the 
screen appears black.)


Cheers,
Olaf

P.S.: By the way: as a linux user for more than 20 years, I am very 
pleased to have the opportunity to contribute at least a little bit to 
the improvement. I'd like to use the chance to thank you all very much 
for building and developing this great operating system.

From 47c0e938beef7335ffa179f1006754f9664c6c4d Mon Sep 17 00:00:00 2001
From: Diederik de Haas 
Date: Mon, 31 Jul 2023 19:55:54 +0200
Subject: [PATCH 2/4] Revert "drm/nouveau/dp: check for NULL
 nv_connector->native_mode"

This reverts commit fb725beca62d175c02ca619c27037c14f7ab8e7c.
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index fd984733b8e6..1991bbb1d05c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -966,7 +966,7 @@ nouveau_connector_get_modes(struct drm_connector *connector)
 	/* Determine display colour depth for 

Re: nouveau bug in linux/6.1.38-2

2023-08-06 Thread Olaf Skibbe

On Fri, 4 Aug 2023 at 14:51, Karol Herbst wrote:

> How are you building the kernel? Because normally from git reverting
> one of those shouldn't take long, because it doesn't recompile the
> entire kernel. But yeah, you can potentially just revert one at a time
> for now and it should be fine.


I am using the `test-patches` script described here: 
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4 
This worked for my limited knowledge (first kernel I ever compiled).


(A maybe silly question while I am at it: am I right in assuming that the 
kernel has to be built on the machine we want to reproduce the bug on? 
Otherwise I could use much faster hardware (also running bookworm).)


Cheers,
Olaf


Re: nouveau bug in linux/6.1.38-2

2023-08-06 Thread Olaf Skibbe

On Sat, 5 Aug 2023 at 01:09, Karol Herbst wrote:


> Mind checking whether, instead of reverting the entire commit, this is
> enough to fix it as well?
>
> https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/f99ae069876f7ffeb6368da0381485e8c3adda43.patch


This patch does fix the problem as well: Screen works.

Cheers,
Olaf


Re: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Karol Herbst
On Fri, Aug 4, 2023 at 8:10 PM Olaf Skibbe  wrote:
>
> Dear all,
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> >>> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> >>> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> >>> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> >>> 5a144bad3e75 nouveau: fix client work fence deletion race
> >
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
> > be weird if the other two commits are causing it. If that's the case,
> > it's a bit worrying that reverting either of those causes issues,
> > but maybe there is a good reason for it. Anyway, mind figuring out
> > which of the two you need reverted to fix your issue? Thanks!
>
> The result is:
>
> Patch with commit fb725beca62d reverted: Graphics works. I attached the
> respective patch again to this mail.
>

Mind checking whether, instead of reverting the entire commit, this is
enough to fix it as well?

https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/f99ae069876f7ffeb6368da0381485e8c3adda43.patch


> Patch with commit 62aecf23f3d1 reverted: Screen remains black, error
> message:
>
> # dmesg | grep -A 36 "cut here"
> [2.921358] [ cut here ]
> [2.921361] WARNING: CPU: 1 PID: 176 at 
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 
> [nouveau]
> [2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) 
> sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) 
> mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) 
> drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) 
> crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) 
> firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) 
> psmouse(E) crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) 
> i2c_smbus(E) firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) 
> mmc_core(E) drm(E) pps_core(E) usb_common(E) battery(E) video(E) wmi(E) 
> button(E)
> [2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: GE
>   6.1.0-0.a.test-amd64 #1  Debian 6.1.38-2a~test
> [2.921701] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 
> 05/12/2017
> [2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 
> 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b 
> c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [2.922196] RSP: 0018:c077c04dfd60 EFLAGS: 00010246
> [2.922201] RAX: 00041eb0 RBX: 9a8482624c00 RCX: 
> 00041eb0
> [2.922204] RDX: c0b47760 RSI:  RDI: 
> c077c04dfcf0
> [2.922206] RBP: 0001 R08: c077c04dfc64 R09: 
> 5b76
> [2.922209] R10: 000d R11: c077c04dfde0 R12: 
> ffea
> [2.922212] R13: 9a8517541e00 R14: 00044d45 R15: 
> 
> [2.922215] FS:  () GS:9a85a3c4() 
> knlGS:
> [2.922219] CS:  0010 DS:  ES:  CR0: 80050033
> [2.92] CR2: 55f660bcb3a8 CR3: 00019761 CR4: 
> 06e0
> [2.96] Call Trace:
> [2.922231]  
> [2.922235]  ? __warn+0x7d/0xc0
> [2.922244]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [2.922487]  ? report_bug+0xe6/0x170
> [2.922494]  ? handle_bug+0x41/0x70
> [2.922501]  ? exc_invalid_op+0x13/0x60
> [2.922505]  ? asm_exc_invalid_op+0x16/0x20
> [2.922512]  ? init_reset_begun+0x20/0x20 [nouveau]
> [2.922708]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [2.922954]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [2.923200]  nv50_disp_super+0x113/0x210 [nouveau]
> [2.923445]  process_one_work+0x1c7/0x380
> [2.923456]  worker_thread+0x4d/0x380
> [2.923463]  ? rescuer_thread+0x3a0/0x3a0
> [2.923469]  kthread+0xe9/0x110
> [2.923476]  ? kthread_complete_and_exit+0x20/0x20
> [2.923482]  ret_from_fork+0x22/0x30
> [2.923493]  
> [2.923494] ---[ end trace  ]---
>
> (Maybe it's worth mentioning that the LED backlight is on, while the
> screen appears black.)
>
> Cheers,
> Olaf
>
> P.S.: By the way: as a linux user for more than 20 years, I am very
> pleased to have the opportunity to contribute at least a little bit to
> the improvement. I'd like to use the chance to thank you all very much
> for building and developing this great operating system.



Re: Bug#1042753: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Diederik de Haas
On Friday, 4 August 2023 15:11:46 CEST Olaf Skibbe wrote:
> (On the occasion a maybe silly question: am I right assuming that the
> kernel has to be built on the machine we want to reproduce the bug on?
> Otherwise it could use much faster hardware (running also bookworm).)

If that is also an amd64 machine running Debian kernel 6.1.38-2, it should be 
fine to build the kernel on the faster machine.



Re: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Karol Herbst
On Fri, Aug 4, 2023 at 2:48 PM Olaf Skibbe  wrote:
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?
>
> I will do this later today (takes some time, it is a slow machine).
>
> > Would be weird if the other two commits are causing it. If that's the
> > case, it's a bit worrying that reverting either of those causes
> > issues, but maybe there is a good reason for it. Anyway, mind figuring
> > out which of the two you need reverted to fix your issue? Thanks!
>
> I can do this. But if I build two kernels anyway, isn't it faster to
> build each with only one of the patches applied? Or do you expect the
> patches to interact (so that the bug would only be present when both are
> applied)?
>

How are you building the kernel? Because normally from git reverting
one of those shouldn't take long, because it doesn't recompile the
entire kernel. But yeah, you can potentially just revert one at a time
for now and it should be fine.

> Cheers,
> Olaf
>



Re: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Karol Herbst
On Fri, Aug 4, 2023 at 2:02 PM Thorsten Leemhuis
 wrote:
>
> Hi!
>
> On 02.08.23 23:28, Olaf Skibbe wrote:
> > Dear Maintainers,
> >
> > Hereby I would like to report an apparent bug in the nouveau driver in
> > linux/6.1.38-2.
>
> Thx for your report. Maybe your problem is caused by an incomplete
> backport. I Cced the maintainers for the drivers (and the regressions
> and the stable list), maybe one of them has an idea, as they know the
> driver.
>
> If they don't reply in the next few days, please check if the problem is
> also present in mainline. If not, check if the latest 6.1.y release
> already fixes this. If not, try to check which of the four patches you
> reverted to get things going is actually causing this (e.g. first only
> revert the one that was applied last; then the two last ones; ...).
>
> > Running a current debian stable on a Dell Latitude E6510 with a
> > "NVIDIA Corporation GT218M" graphics card, the monitor turns black
> > after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
> > just a black screen. Access via ssh is possible.
> >
> > ~# uname -r
> > 6.1.0-10-amd64
> >
> > dmesg shows the following error message:
> >
> > [3.560153] WARNING: CPU: 0 PID: 176 at
> > drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> > nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> > cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> > i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> > scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> > ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> > crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> > crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> > video wmi button
> > [3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> > 6.1.0-10-amd64 #1  Debian 6.1.38-2
> > [3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> > 05/12/2017
> > [3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> > [3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> > 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> > cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> > [3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246
> > [3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX:
> > 00041eb0
> > [3.560544] RDX: c079f760 RSI:  RDI:
> > 9899c048bcf0
> > [3.560545] RBP: 0001 R08: 9899c048bc64 R09:
> > 5b76
> > [3.560546] R10: 000d R11: 9899c048bde0 R12:
> > ffea
> > [3.560548] R13: 88e00b39e480 R14: 00044d45 R15:
> > 
> > [3.560549] FS:  () GS:88e123c0()
> > knlGS:
> > [3.560551] CS:  0010 DS:  ES:  CR0: 80050033
> > [3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4:
> > 06f0
> > [3.560554] Call Trace:
> > [3.560558]  
> > [3.560560]  ? __warn+0x7d/0xc0
> > [3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [3.560671]  ? report_bug+0xe6/0x170
> > [3.560675]  ? handle_bug+0x41/0x70
> > [3.560679]  ? exc_invalid_op+0x13/0x60
> > [3.560681]  ? asm_exc_invalid_op+0x16/0x20
> > [3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> > [3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> > [3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> > [3.561103]  process_one_work+0x1c7/0x380
> > [3.561109]  worker_thread+0x4d/0x380
> > [3.561113]  ? rescuer_thread+0x3a0/0x3a0
> > [3.561116]  kthread+0xe9/0x110
> > [3.561120]  ? kthread_complete_and_exit+0x20/0x20
> > [3.561122]  ret_from_fork+0x22/0x30
> > [3.561130]  
> >
> > Further information:
> >
> > $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> > 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> > (rev a2) (prog-if 00 [VGA controller])
> > Subsystem: Dell Latitude E6510
> > Flags: bus master, fast devsel, latency 0, IRQ 27
> > Memory at e200 (32-bit, non-prefetchable) [size=16M]
> > Memory at d000 (64-bit, prefetchable) [size=256M]
> > Memory at e000 (64-bit, prefetchable) [size=32M]
> > I/O ports at 7000 [size=128]
> > Expansion ROM at 000c [disabled] [size=128K]
> > Capabilities: 
> > Kernel driver in use: nouveau
> > Kernel modules: nouveau
> >
> > I reported this bug to debian already, see
> > https://bugs.debian.org/1042753 for context.
> >
> > With support (thanks Diederik!) I managed to figure out that the cause
> > was a regression between upstream kernel version 6.1.27 and 6.1.38.
> >
> 

Re: nouveau bug in linux/6.1.38-2

2023-08-04 Thread Thorsten Leemhuis
Hi!

On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
> 
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.

Thx for your report. Maybe your problem is caused by an incomplete
backport. I Cced the maintainers for the drivers (and the regressions
and the stable list), maybe one of them has an idea, as they know the
driver.

If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y release
already fixes this. If not, try to check which of the four patches you
reverted to get things going is actually causing this (e.g. first only
revert the one that was applied last; then the two last ones; ...).
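
A rough sketch of that stepwise approach, assuming a git checkout of the
stable tree. The helper name `revert_plan` is hypothetical, and the order
of the commits (last applied first) is an assumption, as the report does
not say in which order they landed. It only prints the `git revert`
command for each round; the kernel would still have to be rebuilt and
tested after every round.

```shell
#!/bin/sh
# Print a cumulative revert plan: round 1 reverts only the first commit
# given, round 2 the first two, and so on. Nothing is executed here.
revert_plan() {
    round=0
    reverted=""
    for commit in "$@"; do
        round=$((round + 1))
        reverted="$reverted $commit"
        echo "round $round: git revert --no-edit$reverted"
    done
}

# Short hashes taken from the report; the order is an assumption
# (last applied commit first).
revert_plan 5a144bad3e75 90748be0f4f3 fb725beca62d 62aecf23f3d1
```

Each round starts again from the unmodified 6.1.38 tree, so the first
round whose kernel works points at the commit added in that round.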

> Running a current debian stable on a Dell Latitude E6510 with a
> "NVIDIA Corporation GT218M" graphics card, the monitor turns black
> after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
> just a black screen. Access via ssh is possible.
> 
> ~# uname -r
> 6.1.0-10-amd64
> 
> dmesg shows the following error message:
> 
> [    3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1  Debian 6.1.38-2
> [    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [    3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246
> [    3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX:
> 00041eb0
> [    3.560544] RDX: c079f760 RSI:  RDI:
> 9899c048bcf0
> [    3.560545] RBP: 0001 R08: 9899c048bc64 R09:
> 5b76
> [    3.560546] R10: 000d R11: 9899c048bde0 R12:
> ffea
> [    3.560548] R13: 88e00b39e480 R14: 00044d45 R15:
> 
> [    3.560549] FS:  () GS:88e123c0()
> knlGS:
> [    3.560551] CS:  0010 DS:  ES:  CR0: 80050033
> [    3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4:
> 06f0
> [    3.560554] Call Trace:
> [    3.560558]  
> [    3.560560]  ? __warn+0x7d/0xc0
> [    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560671]  ? report_bug+0xe6/0x170
> [    3.560675]  ? handle_bug+0x41/0x70
> [    3.560679]  ? exc_invalid_op+0x13/0x60
> [    3.560681]  ? asm_exc_invalid_op+0x16/0x20
> [    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> [    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> [    3.561103]  process_one_work+0x1c7/0x380
> [    3.561109]  worker_thread+0x4d/0x380
> [    3.561113]  ? rescuer_thread+0x3a0/0x3a0
> [    3.561116]  kthread+0xe9/0x110
> [    3.561120]  ? kthread_complete_and_exit+0x20/0x20
> [    3.561122]  ret_from_fork+0x22/0x30
> [    3.561130]  
> 
> Further information:
> 
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
> Subsystem: Dell Latitude E6510
> Flags: bus master, fast devsel, latency 0, IRQ 27
> Memory at e200 (32-bit, non-prefetchable) [size=16M]
> Memory at d000 (64-bit, prefetchable) [size=256M]
> Memory at e000 (64-bit, prefetchable) [size=32M]
> I/O ports at 7000 [size=128]
> Expansion ROM at 000c [disabled] [size=128K]
> Capabilities: 
> Kernel driver in use: nouveau
> Kernel modules: nouveau
> 
> I reported this bug to debian already, see
> https://bugs.debian.org/1042753 for context.
> 
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
> 
> I built a new 6.1.38 kernel with these commits reverted:
> 
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA 

nouveau bug in linux/6.1.38-2

2023-08-03 Thread Olaf Skibbe

Dear Maintainers,

Hereby I would like to report an apparent bug in the nouveau driver in
linux/6.1.38-2.

Running a current debian stable on a Dell Latitude E6510 with a
"NVIDIA Corporation GT218M" graphics card, the monitor turns black
after the grub screen. Also switching to a console (Ctrl-Alt-F2) shows
just a black screen. Access via ssh is possible.

~# uname -r
6.1.0-10-amd64

dmesg shows the following error message:

[3.560153] WARNING: CPU: 0 PID: 176 at 
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 
[nouveau]
[3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft cdrom 
crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi i2c_algo_bit 
drm_display_helper libata cec rc_core drm_ttm_helper ttm scsi_mod e1000e 
drm_kms_helper ptp firewire_ohci sdhci_pci cqhci ehci_pci sdhci ehci_hcd 
firewire_core i2c_i801 crct10dif_pclmul crct10dif_common drm crc32_pclmul 
crc32c_intel psmouse usbcore mmc_core crc_itu_t pps_core scsi_common i2c_smbus 
lpc_ich usb_common battery video wmi button
[3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted 6.1.0-10-amd64 
#1  Debian 6.1.38-2
[3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 
05/12/2017
[3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 
83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 
41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[3.560541] RSP: 0018:9899c048bd60 EFLAGS: 00010246
[3.560542] RAX: 00041eb0 RBX: 88e0209d2600 RCX: 00041eb0
[3.560544] RDX: c079f760 RSI:  RDI: 9899c048bcf0
[3.560545] RBP: 0001 R08: 9899c048bc64 R09: 5b76
[3.560546] R10: 000d R11: 9899c048bde0 R12: ffea
[3.560548] R13: 88e00b39e480 R14: 00044d45 R15: 
[3.560549] FS:  () GS:88e123c0() 
knlGS:
[3.560551] CS:  0010 DS:  ES:  CR0: 80050033
[3.560552] CR2: 7f57f4e90451 CR3: 00018141 CR4: 06f0
[3.560554] Call Trace:
[3.560558]  
[3.560560]  ? __warn+0x7d/0xc0
[3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[3.560671]  ? report_bug+0xe6/0x170
[3.560675]  ? handle_bug+0x41/0x70
[3.560679]  ? exc_invalid_op+0x13/0x60
[3.560681]  ? asm_exc_invalid_op+0x16/0x20
[3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
[3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
[3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
[3.561103]  process_one_work+0x1c7/0x380
[3.561109]  worker_thread+0x4d/0x380
[3.561113]  ? rescuer_thread+0x3a0/0x3a0
[3.561116]  kthread+0xe9/0x110
[3.561120]  ? kthread_complete_and_exit+0x20/0x20
[3.561122]  ret_from_fork+0x22/0x30
[3.561130]  

Further information:

$ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev 
a2) (prog-if 00 [VGA controller])
Subsystem: Dell Latitude E6510
Flags: bus master, fast devsel, latency 0, IRQ 27
Memory at e200 (32-bit, non-prefetchable) [size=16M]
Memory at d000 (64-bit, prefetchable) [size=256M]
Memory at e000 (64-bit, prefetchable) [size=32M]
I/O ports at 7000 [size=128]
Expansion ROM at 000c [disabled] [size=128K]
Capabilities: 
Kernel driver in use: nouveau
Kernel modules: nouveau

I reported this bug to debian already, see
https://bugs.debian.org/1042753 for context.

With support (thanks Diederik!) I managed to figure out that the cause
was a regression between upstream kernel version 6.1.27 and 6.1.38.

I built a new 6.1.38 kernel with these commits reverted:

62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
5a144bad3e75 nouveau: fix client work fence deletion race

With that kernel the graphics work again.

Please inform me if further tests are required.

Cheers,
Olaf