Control: forwarded -1 https://lore.kernel.org/lkml/abE_QoS5DM-ZltaV@monoceros

#regzbot introduced: a60b990798eb17433d0283788280422b1bd94b18
#regzbot from: "Aaron D. Johnson" <[email protected]>
#regzbot monitor: https://bugs.debian.org/1127635

Hello,

On Sat, Dec 14, 2024 at 12:50:18PM +0100, Thomas Gleixner wrote:
> Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a
> RISCV platform which does not provide PCI/MSI support:
> 
>  WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32
>  __pci_enable_msix_range+0x30c/0x596
>  pci_msi_setup_msi_irqs+0x2c/0x32
>  pci_alloc_irq_vectors_affinity+0xb8/0xe2
> 
> RISCV uses hierarchical interrupt domains and correctly does not implement
> the legacy fallback. The warning triggers from the legacy fallback stub.
> 
> That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent
> domain is associated with the device or not. There is a check for MSI-X,
> which has a legacy assumption. But that legacy fallback assumption is only
> valid when legacy support is enabled, but otherwise the check should simply
> return -ENOTSUPP.
> 
> Loongarch tripped over the same problem and blindly enabled legacy support
> without implementing the legacy fallbacks. There are weak implementations
> which return an error, so the problem was papered over.
> 
> Correct pci_msi_domain_supports() to evaluate the legacy mode and add
> the missing supported check into the MSI enable path to complete it.
> 
> Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early")
> Reported-by: Alexandre Ghiti <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Tested-by: Alexandre Ghiti <[email protected]>
> Cc: [email protected]

This patch became a60b990798eb17433d0283788280422b1bd94b18 in v6.13-rc5
and was backported to 6.12.y and 6.6.y (as aed157301c65 and b1f7476e07b9,
respectively).

A Debian user (Aaron, on Cc:) on powerpc has boot problems and bisected
them to this commit. The relevant boot log of the failure is:

[    2.643879] BUG: Kernel NULL pointer dereference on read at 0x00000000
[    2.643891] Faulting instruction address: 0xc000000000a39514
[    2.643902] Oops: Kernel access of bad area, sig: 11 [#1]
[    2.643909] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[    2.643920] Modules linked in: ohci_pci(+) ehci_hcd nvme_fabrics ohci_hcd nvme_keyring nvme_core usbcore nvme_auth scsi_transport_fc ipr configfs ehea(+) usb_common
[    2.643965] CPU: 5 UID: 0 PID: 250 Comm: (udev-worker) Not tainted 6.12.17-powerpc64 #1  Debian 6.12.17-1
[    2.643976] Hardware name: IBM,8204-E8A POWER6 (architected) 0x3e0302 0xf000002 of:IBM,EL350_118 hv:phyp pSeries
[    2.643986] NIP:  c000000000a39514 LR: c000000000a36ed8 CTR: c000000000a35820
[    2.643995] REGS: c0000000351f6f60 TRAP: 0300   Not tainted  (6.12.17-powerpc64 Debian 6.12.17-1)
[    2.644004] MSR:  8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24222288  XER: 00000000
[    2.644031] CFAR: c00000000000cfc4 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
[    2.644031] GPR00: c000000000a36ed8 c0000000351f7200 c00000000182e200 c0000003df294000
[    2.644031] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.644031] GPR08: 0000000000000001 0000000000000000 c00000000228fcc0 0000000044222288
[    2.644031] GPR12: c000000000a35820 c00000000eeacb00 0000000000000020 0000010037fcab20
[    2.644031] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80
[    2.644031] GPR20: 0000000000000000 c00000000204db60 c00000000204dd60 c00000000b1ae780
[    2.644031] GPR24: 0000000000000000 00003fff8c9ac758 0000000000000000 c0000003df294000
[    2.644031] GPR28: 0000000000000001 0000000000000000 c0000003df294000 0000000000000001
[    2.644164] NIP [c000000000a39514] pci_msi_domain_supports (drivers/pci/msi/irqdomain.c:366)
[    2.644181] LR [c000000000a36ed8] __pci_enable_msi_range (drivers/pci/msi/msi.c:437)
[    2.644192] Call Trace:
[    2.644197] [c0000000351f7200] [c0000000351f7304] 0xc0000000351f7304 (unreliable)
[    2.644211] [c0000000351f7340] [c000000000a3578c] pci_alloc_irq_vectors_affinity (drivers/pci/msi/api.c:277)
[    2.644225] [c0000000351f73d0] [c0003d0007d2f4d4] usb_hcd_pci_probe (drivers/usb/core/hcd-pci.c:192) usbcore
[    2.644246] [c0000000351f7470] [c0003d00084e6030] ohci_pci_probe (drivers/usb/host/ohci-pci.c:285) ohci_pci
[    2.644260] [c0000000351f7490] [c000000000a260e8] local_pci_probe (drivers/pci/pci-driver.c:324)
[    2.644274] [c0000000351f7510] [c000000000a26218] pci_call_probe (drivers/pci/pci-driver.c:392 (discriminator 1))
[    2.644287] [c0000000351f7670] [c000000000a27348] pci_device_probe (drivers/pci/pci-driver.c:452)
[    2.644300] [c0000000351f76b0] [c000000000b2e658] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
[    2.644314] [c0000000351f7740] [c000000000b2eb24] __driver_probe_device (drivers/base/dd.c:800)
[    2.644327] [c0000000351f77c0] [c000000000b2edc4] driver_probe_device (drivers/base/dd.c:831)
[    2.644340] [c0000000351f7800] [c000000000b2f188] __driver_attach (drivers/base/dd.c:1217)
[    2.644352] [c0000000351f7880] [c000000000b2ac64] bus_for_each_dev (drivers/base/bus.c:370)
[    2.644365] [c0000000351f78e0] [c000000000b2dac4] driver_attach (drivers/base/dd.c:1234)
[    2.644377] [c0000000351f7900] [c000000000b2cd98] bus_add_driver (drivers/base/bus.c:675)
[    2.644389] [c0000000351f7990] [c000000000b30ae4] driver_register (drivers/base/driver.c:246)
[    2.644402] [c0000000351f7a00] [c000000000a24f88] __pci_register_driver (drivers/pci/pci-driver.c:1450)
[    2.644415] [c0000000351f7a20] [c0003d00084e6800] ohci_pci_init (drivers/usb/host/ohci-pci.c:308) ohci_pci
[    2.644429] [c0000000351f7a50] [c00000000000fd60] do_one_initcall (init/main.c:1269)
[    2.644444] [c0000000351f7b30] [c0000000002760f8] do_init_module (kernel/module/main.c:2543)
[    2.644460] [c0000000351f7bb0] [c000000000278fe4] init_module_from_file (kernel/module/main.c:3199)
[    2.644473] [c0000000351f7c90] [c0000000002793e0] sys_finit_module (kernel/module/main.c:3211 kernel/module/main.c:3238 kernel/module/main.c:3221)
[    2.644487] [c0000000351f7da0] [c00000000002c084] system_call_exception (arch/powerpc/kernel/syscall.c:171)
[    2.644500] [c0000000351f7e50] [c00000000000cb54] system_call_common (arch/powerpc/kernel/interrupt_64.S:292)
[    2.644515] --- interrupt: c00 at 0x3fff8d653d8c
[    2.644522] NIP:  00003fff8d653d8c LR: 00003fff8c9a4680 CTR: 0000000000000000
[    2.644531] REGS: c0000000351f7e80 TRAP: 0c00   Not tainted  (6.12.17-powerpc64 Debian 6.12.17-1)
[    2.644541] MSR:  800000000200f032 <SF,VEC,EE,PR,FP,ME,IR,DR,RI>  CR: 22222222  XER: 00000000
[    2.644573] IRQMASK: 0
[    2.644573] GPR00: 0000000000000161 00003fffebe8b640 00003fff8d757100 0000000000000052
[    2.644573] GPR04: 00003fff8c9ac758 0000000000000004 0000000000000058 000000000000005a
[    2.644573] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.644573] GPR12: 0000000000000000 00003fff8de947c0 0000000000000020 0000010037fcab20
[    2.644573] GPR16: 0000000022222248 0000000000020000 0000000000000000 00003fffebe8bb80
[    2.644573] GPR20: 0000000000000000 00003fffebe8bb70 0000000000000007 0000010037fca210
[    2.644573] GPR24: 0000000000000000 0000000000000000 0000010037f6be40 0000000000000004
[    2.644573] GPR28: 00003fff8c9ac758 0000000000020000 0000000000000004 0000010037fca210
[    2.644698] NIP [00003fff8d653d8c] 0x3fff8d653d8c
[    2.644705] LR [00003fff8c9a4680] 0x3fff8c9a4680
[    2.644713] --- interrupt: c00
[ 2.644719] Code: 4182002c e92a0088 80690000 7c632038 7c632278 7c630034 5463d97e 786307e0 4e800020 60000000 60000000 e92a0020 <80690000> 4bffffd8 60000000 7ca50034
All code
========
   0:*  41 82 00 2c     beq     0x2c            <-- trapping instruction
   4:   e9 2a 00 88     ld      r9,136(r10)
   8:   80 69 00 00     lwz     r3,0(r9)
   c:   7c 63 20 38     and     r3,r3,r4
  10:   7c 63 22 78     xor     r3,r3,r4
  14:   7c 63 00 34     cntlzw  r3,r3
  18:   54 63 d9 7e     srwi    r3,r3,5
  1c:   78 63 07 e0     clrldi  r3,r3,63
  20:   4e 80 00 20     blr
  24:   60 00 00 00     nop
  28:   60 00 00 00     nop
  2c:   e9 2a 00 20     ld      r9,32(r10)
  30:   80 69 00 00     lwz     r3,0(r9)
  34:   4b ff ff d8     b       0xc
  38:   60 00 00 00     nop
  3c:   7c a5 00 34     cntlzw  r5,r5

Code starting with the faulting instruction
===========================================
   0:   80 69 00 00     lwz     r3,0(r9)
   4:   4b ff ff d8     b       0xffffffffffffffdc
   8:   60 00 00 00     nop
   c:   7c a5 00 34     cntlzw  r5,r5
[    2.644769] ---[ end trace 0000000000000000 ]---


(That's the bug splat from the bug report, piped through
scripts/decode_stacktrace.sh.)

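The kernel has CONFIG_PCI_MSI_ARCH_FALLBACKS=y, so the first hunk
(pci_msi_domain_supports() evaluating the legacy mode) shouldn't change
anything. The crash is in the MSI enable path (the LR points into
__pci_enable_msi_range(), drivers/pci/msi/msi.c:437), which is where the
patch adds the missing supported check.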

The disassembly of pci_msi_domain_supports in the kernel looks as
follows:

        c000000000a394c0 <pci_msi_domain_supports>:
        pci_msi_domain_supports():
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:334
        c000000000a394c0:       60 00 00 00     nop
        c000000000a394c4:       60 00 00 00     nop
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353
        c000000000a394c8:       e9 43 02 e8     ld      r10,744(r3)
        c000000000a394cc:       2c 2a 00 00     cmpdi   r10,0
        c000000000a394d0:       41 82 00 50     beq     c000000000a39520 <pci_msi_domain_supports+0x60>
        irq_domain_is_hierarchy():
        debian/build/build_powerpc_none_powerpc64/include/linux/irqdomain.h:661
        c000000000a394d4:       81 2a 00 28     lwz     r9,40(r10)
        pci_msi_domain_supports():
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:353 (discriminator 1)
        c000000000a394d8:       71 28 00 01     andi.   r8,r9,1
        c000000000a394dc:       41 82 00 44     beq     c000000000a39520 <pci_msi_domain_supports+0x60>
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:359 (discriminator 1)
        c000000000a394e0:       71 29 01 00     andi.   r9,r9,256
        c000000000a394e4:       41 82 00 2c     beq     c000000000a39510 <pci_msi_domain_supports+0x50>
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:375
        c000000000a394e8:       e9 2a 00 88     ld      r9,136(r10)
        c000000000a394ec:       80 69 00 00     lwz     r3,0(r9)
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:378
        c000000000a394f0:       7c 63 20 38     and     r3,r3,r4
        c000000000a394f4:       7c 63 22 78     xor     r3,r3,r4
        c000000000a394f8:       7c 63 00 34     cntlzw  r3,r3
        c000000000a394fc:       54 63 d9 7e     srwi    r3,r3,5
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379
        c000000000a39500:       78 63 07 e0     clrldi  r3,r3,63
        c000000000a39504:       4e 80 00 20     blr
        c000000000a39508:       60 00 00 00     nop
        c000000000a3950c:       60 00 00 00     nop
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:366
        c000000000a39510:       e9 2a 00 20     ld      r9,32(r10)
        c000000000a39514:       80 69 00 00     lwz     r3,0(r9)
        c000000000a39518:       4b ff ff d8     b       c000000000a394f0 <pci_msi_domain_supports+0x30>
        c000000000a3951c:       60 00 00 00     nop
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:355
        c000000000a39520:       7c a5 00 34     cntlzw  r5,r5
        c000000000a39524:       54 a3 d9 7e     srwi    r3,r5,5
        debian/build/build_powerpc_none_powerpc64/drivers/pci/msi/irqdomain.c:379
        c000000000a39528:       78 63 07 e0     clrldi  r3,r3,63
        c000000000a3952c:       4e 80 00 20     blr


so the fault happens at drivers/pci/msi/irqdomain.c:366, which is:

365                     info = domain->host_data;
366                     supported = info->flags;

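The register dump is consistent with a fault at line 366. Each GPRnn row
lists four registers, so the GPR08 row gives r8 = 1, r9 = 0 and
r10 = c00000000228fcc0. So domain (r10) is a valid pointer (with
domain == NULL the beq at c000000000a394d0 would have skipped this code
entirely), the ld at c000000000a39510 loaded domain->host_data == NULL
into r9, and the lwz at c000000000a39514 faulted dereferencing it
(DAR: 0). So it looks like .host_data is NULL for this device's MSI domain.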

The offsets match: .host_data is at offset 32 of struct
irq_domain and .flags is at offset 0 of struct msi_domain_info.

For more details see
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1127635 .

Does someone spot the issue?

Best regards
Uwe
