https://bugzilla.kernel.org/show_bug.cgi?id=220705

--- Comment #22 from Lukas Wunner ([email protected]) ---
Thanks Adrià! I've analyzed the dmesg output you've provided and noticed a bug:
The kernel overwrites ASPM register bits which it should preserve instead. I
have just attached a patch to fix that. Maybe you could give it a spin.
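The comment does not say which register bits get clobbered, so the following is only a generic illustration of this class of bug, not the attached patch. It uses the ASPM Control field (bits 1:0) of the PCIe Link Control register as a stand-in and contrasts a whole-register overwrite with a masked read-modify-write that preserves the other bits.

```c
#include <stdint.h>

/* ASPM Control field, bits 1:0 of the PCIe Link Control register
 * (same value as PCI_EXP_LNKCTL_ASPMC in Linux's pci_regs.h):
 * 0x1 = L0s entry enabled, 0x2 = L1 entry enabled. */
#define LNKCTL_ASPMC 0x0003u

/* Buggy pattern (illustrative): the new ASPM setting becomes the whole
 * register value, so other bits set by firmware, e.g. Common Clock
 * Configuration (bit 6) or Enable Clock Power Management (bit 8),
 * are lost. */
static uint16_t aspm_update_overwrite(uint16_t old, uint16_t aspm)
{
    (void)old; /* previous contents ignored: this is the bug */
    return (uint16_t)(aspm & LNKCTL_ASPMC);
}

/* Correct pattern: read-modify-write that changes only the ASPM
 * Control field and leaves every other bit as it was. */
static uint16_t aspm_update_preserve(uint16_t old, uint16_t aspm)
{
    return (uint16_t)((old & ~LNKCTL_ASPMC) | (aspm & LNKCTL_ASPMC));
}
```

With a made-up firmware value of 0x0142 (Common Clock Configuration, ClkPM and L1 enabled), enabling L0s and L1 (0x3) yields 0x0003 with the first function, dropping bits 6 and 8, and 0x0143 with the second.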

However, while this is definitely a bug and the patch does need to be submitted
upstream, somehow I doubt that this is the root cause for the ASPM issues
you're seeing. In other words, I would be grateful if you could test the patch,
but I expect testing it will only help confirm the hypothesis that this isn't
the root cause.

I dug around and it turns out that the Sunrise Point-LP PCH built into the
Pixelbook Eve suffers from an erratum which wasn't publicly documented until
2018, i.e. a year after the machine was introduced. Here's the link to the
errata sheet:
https://www.intel.de/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf

Erratum 47 at the end of the document explains that a certain clock latency
exceeds the PCIe spec-defined maximum and that L1 exit instabilities may occur
as a result. The recommended workaround is to "disable the associated PCH
SRCCLKREQ# signal to keep the PCIe clock active during L1".

Now this is really something that needs to be done by the BIOS, not by the
kernel. I think the corresponding fix for coreboot will look like this:

diff --git a/src/mainboard/google/eve/devicetree.cb b/src/mainboard/google/eve/devicetree.cb
index 41c5cb0377..71f548664e 100644
--- a/src/mainboard/google/eve/devicetree.cb
+++ b/src/mainboard/google/eve/devicetree.cb
@@ -341,7 +341,10 @@ chip soc/intel/skylake
                        register "PcieRpAdvancedErrorReporting[0]" = "1"
                        register "PcieRpLtrEnable[0]" = "true"
                        register "PcieRpHotPlug[0]" = "1"
-                       register "PcieRpClkSrcNumber[0]" = "1"
+                       # Disable PCH SRCCLKREQ# signal to keep PCIe clock
+                       # active during L1, as recommended by erratum 47
+                       # of Intel 100 Series Chipset Family Spec Update
+                       register "PcieRpClkSrcNumber[0]" = "0x1f"
                        chip drivers/wifi/generic
                                register "wake" = "GPE0_PCI_EXP"
                                device pci 00.0 on end

So PcieRpClkSrcNumber needs to be set to 0x1f instead of 1, which should
disable the SRCCLKREQ# signal for that root port.

I'll turn that into a proper commit, but I've never worked on coreboot, so I
don't really know what I'm doing here. There are scripts in this MrChromebox
repo to install custom firmware:
https://github.com/MrChromebox/scripts

I'm not sure if you're comfortable compiling and installing a custom coreboot.
I'm worried it might brick your machine.

So I've also prepared another test patch for the kernel. It disables the Enable
Clock Power Management bit on the wifi card. This seems to be an alternative
workaround for the erratum. The test patch also emits all the debug output of
the previous debug patch I provided. Disabling ClkPM is the only difference.
Maybe this patch helps. I'm not sure whether disabling ClkPM at device
enumeration time is too late, since coreboot already enables it before passing
control to the kernel. But maybe this patch lets us verify whether I'm on the
right track and the erratum really is the root cause.
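For reference, Enable Clock Power Management is bit 8 of the PCIe Link Control register; when it is set, the device may use CLKREQ# to let the reference clock be stopped, so clearing it keeps the clock running, which is why it can stand in for the BIOS-side workaround. The following is a minimal sketch of that bit operation on a plain register value, not the test patch itself. Inside the kernel the equivalent would presumably be a call like `pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_CLKREQ_EN)`, where `pdev` is a hypothetical name for the wifi card's `struct pci_dev`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Enable Clock Power Management, bit 8 of the PCIe Link Control
 * register (same value as PCI_EXP_LNKCTL_CLKREQ_EN in Linux's
 * pci_regs.h). */
#define LNKCTL_CLKREQ_EN 0x0100u

/* Reports whether ClkPM is enabled in a Link Control value. */
static bool clkpm_enabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_CLKREQ_EN) != 0;
}

/* Returns the Link Control value with only the ClkPM bit cleared;
 * the ASPM Control field and all other bits pass through unchanged. */
static uint16_t clkpm_disable(uint16_t lnkctl)
{
    return (uint16_t)(lnkctl & ~LNKCTL_CLKREQ_EN);
}
```

For example, a value of 0x0143 (ClkPM plus L0s and L1) becomes 0x0043: ASPM stays enabled and only clock power management is turned off, which matches the "Disabling ClkPM is the only difference" description of the test patch.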


_______________________________________________
acpi-bugzilla mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/acpi-bugzilla
