Hi all, I am currently debugging an issue on an x86 machine running the latest Linux kernel, involving a PCIe device whose memory is mapped via BAR0. I am encountering unexpected behavior when reading its PCI configuration space using lspci, and I am seeking guidance on whether mmiotrace can help diagnose the problem.

Issue Summary:

Expected Behavior After Boot: lspci -xxx -s 01:00.0 correctly displays valid PCI configuration space values, including a properly mapped BAR0.

    $ sudo lspci -xxx -s 01:00.0 | grep "10:"
    10: 00 00 40 b0 00 00 00 00 00 00 00 00 00 00 00 00

Unexpected Behavior After Uptime: After a few days, reading the PCI configuration space (lspci -xxx -s 01:00.0) sometimes returns all 0xffs for the entire config space. dmesg does not log any relevant errors.

    $ sudo lspci -xxx -s 01:00.0
    01:00.0 RAM memory: PLDA Device 5555 (rev ff)
    00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

After Subsequent Reads: Re-running lspci -xxx -s 01:00.0 restores non-0xff values, but BAR0 gets reset to zero.

    $ sudo lspci -xxx -s 01:00.0
    01:00.0 RAM memory: PLDA Device 5555
    00: 56 15 55 55 00 00 10 00 00 00 00 05 00 00 00 00
    10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    20: 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
    30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
    40: 01 48 03 00 08 00 00 00 05 60 00 00 00 00 00 00
    50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    60: 10 00 02 00 c2 8f 00 00 10 28 01 00 21 f4 03 00
    70: 00 00 21 00 00 00 00 00 00 00 00 00 00 00 00 00
    80: 00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
    90: 20 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This suggests that some function or driver is resetting BAR0 during or after a failed config space read.

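For anyone comparing the dumps, here is a minimal Python sketch that decodes pasted `lspci -xxx` rows into a few standard header fields (vendor ID at offset 0x00, device ID at 0x02, command register at 0x04, BAR0 at 0x10) and flags an all-0xff read. The helper names (`parse_dump`, `read_le`, `summarize`) are hypothetical, made up for this illustration; it only interprets text that lspci already printed and does not touch the hardware.

```python
import re

# Matches one hex row of `lspci -xxx` output, e.g. "10: 00 00 40 b0 ...".
# The device header line ("01:00.0 RAM memory: ...") does not match.
ROW = re.compile(r"^\s*([0-9a-f]{2,3}):((?:\s+[0-9a-f]{2}){1,16})\s*$")


def parse_dump(text):
    """Return {offset: byte_value} for every hex row found in the text."""
    cfg = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header line or unrelated text
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            cfg[base + i] = int(tok, 16)
    return cfg


def read_le(cfg, offset, size):
    """Little-endian field, or None if any byte is missing from the dump."""
    if any(offset + i not in cfg for i in range(size)):
        return None
    return sum(cfg[offset + i] << (8 * i) for i in range(size))


def summarize(cfg):
    """Pick out the fields that matter for the before/after comparison."""
    return {
        "all_ff": bool(cfg) and all(b == 0xFF for b in cfg.values()),
        "vendor": read_le(cfg, 0x00, 2),
        "device": read_le(cfg, 0x02, 2),
        "command": read_le(cfg, 0x04, 2),
        "bar0": read_le(cfg, 0x10, 4),
    }


if __name__ == "__main__":
    import sys
    # Usage: sudo lspci -xxx -s 01:00.0 | python3 decode.py
    for key, value in summarize(parse_dump(sys.stdin.read())).items():
        print(key, value if not isinstance(value, int) or isinstance(value, bool) else hex(value))
```

Saving each dump to a file and running it through this makes it easier to diff the three states (good, all 0xff, recovered with BAR0 cleared) field by field instead of by eye.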
mmiotrace Setup & Results:

I have enabled mmiotrace and verified it is active:

    # cat /sys/kernel/tracing/available_tracers
    hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
    # cat current_tracer
    mmiotrace

However, trace_pipe and trace logs remain empty even after reproducing the issue:

    # cat trace_pipe
    VERSION 20070824
    PCIDEV 0000 80860f00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 iosf_mbi_pci
    PCIDEV 0010 80860f31 61 b0000000 0 a0000008 0 e081 0 c0002 400000 0 10000000 0 8 0 20000 i915
    PCIDEV 0098 80860f23 5b e071 e061 e051 e041 e021 b0b17000 0 8 4 8 4 20 800 0 ahci
    PCIDEV 00a0 80860f35 5a b0b00004 0 0 0 0 0 0 10000 0 0 0 0 0 0 xhci_hcd
    PCIDEV 00b8 80860f50 17 b0b16000 b0b15000 0 0 0 0 0 1000 1000 0 0 0 0 0 sdhci-pci
    PCIDEV 00d0 80860f18 62 b0900000 b0800000 0 0 0 0 0 100000 100000 0 0 0 0 0 mei_txe
    PCIDEV 00d8 80860f04 16 b0b10004 0 0 0 0 0 0 4000 0 0 0 0 0 0 snd_hda_intel
    PCIDEV 00e0 80860f48 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
    PCIDEV 00e2 80860f4c 58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
    PCIDEV 00e3 80860f4e 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 pcieport
    PCIDEV 00f8 80860f1c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 lpc_ich
    PCIDEV 00fb 80860f12 12 b0b14000 0 0 0 e001 0 0 20 0 0 0 20 0 0 i801_smbus
    PCIDEV 0100 15565555 b b0400000 0 0 0 0 0 0 400000 0 0 0 0 0 0
    PCIDEV 0300 80861533 13 b0a00000 0 d001 b0a80000 0 0 0 80000 0 20 4000 0 0 0 igb

    # cat trace
    # tracer: mmiotrace
    #
    # entries-in-buffer/entries-written: 0/0   #P:1
    #
    #                                _-----=> irqs-off/BH-disabled
    #                               / _----=> need-resched
    #                              | / _---=> hardirq/softirq
    #                              || / _--=> preempt-depth
    #                              ||| / _-=> migrate-disable
    #                              |||| /     delay
    #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
    #              | |         |   |||||     |         |

Request for Assistance:

1. Can mmiotrace help determine the root cause of why reading the PCI configuration space results in all 0xffs?
2. Is there a way to determine what function or driver is clearing BAR0 when the values are restored?
3. If mmiotrace is suitable for this, how can I properly capture the relevant trace data to analyze this issue?

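Independently of mmiotrace, one way to pin down when the state changes would be to poll the device's config space through sysfs and log each transition with a timestamp, so it can be lined up against dmesg and other logs. The following is only a sketch under assumptions: the PCI domain is assumed to be 0000 (giving the path /sys/bus/pci/devices/0000:01:00.0/config), and the function names are hypothetical. Note that every poll issues config reads itself, so it may perturb the very behavior being observed.

```python
import time


def read_header(path, length=64):
    """Read the first `length` bytes of a PCI config file (the standard header)."""
    with open(path, "rb") as f:
        return f.read(length)


def state_of(header):
    """Classify a config header as 'all-ff', 'bar0-cleared', 'ok' or 'short-read'."""
    if len(header) < 0x14:
        return "short-read"  # not enough bytes to reach BAR0 at offset 0x10
    if all(b == 0xFF for b in header):
        return "all-ff"
    # The low 4 bits of a memory BAR are flag bits, not address bits.
    bar0 = int.from_bytes(header[0x10:0x14], "little") & ~0xF
    return "bar0-cleared" if bar0 == 0 else "ok"


def watch(path, interval=1.0, rounds=None):
    """Poll `path` and print a timestamped line whenever the state changes.

    Runs forever when rounds is None; returns the last observed state otherwise.
    """
    last = None
    n = 0
    while rounds is None or n < rounds:
        state = state_of(read_header(path))
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {path}: {last} -> {state}")
            last = state
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return last


# Example (assumed path, run as root on the affected machine):
# watch("/sys/bus/pci/devices/0000:01:00.0/config", interval=5.0)
```

A longer polling interval keeps the number of extra config reads low; the timestamps only need to be precise enough to correlate with other system events.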
Any insights or suggestions would be greatly appreciated. Please let me know if you need more details.

Best regards,
Naveen
