On Sat, Aug 29, 2020 at 10:46:27PM -0400, James Hastings wrote:
> On Mon, Aug 10, 2020 at 08:34:36PM +0000, Mikolaj Kucharski wrote:
> > >Synopsis: LTE mini-PCIe modem not showing up after reboot
> > >Category: kernel
> > >Environment:
> > System : OpenBSD 6.7
> > Details : OpenBSD 6.7-current (GENERIC.MP) #15: Sun Aug 9 17:48:40
> > MDT 2020
> >
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > I have PC Engines APU2 board with Sierra Wireless MC7455 LTE modem on
> > mini-PCIe.
> > When I cold boot the system, device is properly detected:
> >
> > uhub2 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices
> > Hub" rev 2.00/0.18 addr 2
> > umsm0 at uhub2 port 4 configuration 1 interface 0 "Sierra Wireless,
> > Incorporated Sierra Wireless MC7455 Qualcomm\M-. Snapdragon? X7 LTE-A" rev
> > 2.10/0.06 addr 3
> > ucom0 at umsm0
> > umsm1 at uhub2 port 4 configuration 1 interface 2 "Sierra Wireless,
> > Incorporated Sierra Wireless MC7455 Qualcomm\M-. Snapdragon? X7 LTE-A" rev
> > 2.10/0.06 addr 3
> > ucom1 at umsm1
> > umsm2 at uhub2 port 4 configuration 1 interface 3 "Sierra Wireless,
> > Incorporated Sierra Wireless MC7455 Qualcomm\M-. Snapdragon? X7 LTE-A" rev
> > 2.10/0.06 addr 3
> > ucom2 at umsm2
> > umsm3 at uhub2 port 4 configuration 1 interface 8 "Sierra Wireless,
> > Incorporated Sierra Wireless MC7455 Qualcomm\M-. Snapdragon? X7 LTE-A" rev
> > 2.10/0.06 addr 3
> > ucom3 at umsm3
> > umsm4 at uhub2 port 4 configuration 1 interface 10 "Sierra Wireless,
> > Incorporated Sierra Wireless MC7455 Qualcomm\M-. Snapdragon? X7 LTE-A" rev
> > 2.10/0.06 addr 3
> > ucom4 at umsm4
> >
> > However when I reboot OpenBSD Sierra Wireless card is no longer detected by
> > the OS and
> > is not visible any more from the system:
> >
> > uhub2 at uhub1 port 1 configuration 1 interface 0 "Advanced Micro Devices
> > Hub" rev 2.00/0.18 addr 2
> >
> > >How-To-Repeat:
> > Cold boot APU2 with 1199:9071 Sierra Wireless card, then reboot the OS
> > and card is missing.
> >
> > >Fix:
> > Unknown. I've looked around and found following resources:
> >
> > - OpenBSD misc, broken EHCI USB on AMD chipset?
> > https://marc.info/?t=151192838800001&r=1&w=2
> >
> > - PCI: Workaround AMD EHCI controller PME bug
> > https://patchwork.kernel.org/patch/9797105/
> >
> > - usb: host: ehci: workaround PME bug on AMD EHCI controller
> > https://patchwork.kernel.org/patch/9783041/
> >
> > - Linux kernel, pci_fixup_amd_ehci_pme() function
> > https://github.com/torvalds/linux/blob/fc80c51fd4b23ec007e88d4c688f2cac1b8648e7/arch/x86/pci/fixup.c#L580-L593
> >
> > - SB700 Family Product Errata
> > https://www.amd.com/system/files/TechDocs/46837.pdf
> >
> > - AMD SB700/710/750 Register Programming Requirements
> > http://ftp.loongnix.org/doc/02data%20sheet/loongson3a/SB/42413_sb7xx_rpr_pub_1.00.pdf
>
> after reading errata #11: Enabling EHCI Dynamic Clock Gating May Cause Bug
> Code 0xFE System Error
>
> Description
> A system error has been observed during extended S4 Hibernation or Reboot
> cycling using the MS PWRTST
> or other similar utility. The arbiter in the Southbridge that controls the
> down stream memory traffic to the USB
> controller does not fully support the EHCI clock gating feature. If the clock
> gating feature in the EHCI
> controller is enabled, the arbiter may transfer incorrect memory data to the
> EHCI controller and cause the
> controller to not respond back correctly to the USB driver or the device. In
> such cases, the USB driver may
> timeout and cause the operating system to report the system error.
>
> Potential Effect on System
> The problem may present itself as a system halt with an operating system stop
> error message with bug check
> code related to a USB driver failure. The typical operating system error
> message is BUGCODE_USB_DRIVER
> bug check value of 0x000000FE. The system error occurs mostly if there are
> USB devices connected to the
> system. The failure is intermittent and the failure rate may vary from one
> system to another. On most systems
> the failure has been observed to occur after a very large number of reboot
> cycles (typically more than 1000
> cycles). On a small number of systems the issue may be seen within two
> hundred reboot cycles.
>
> Suggested Workaround
> A BIOS workaround is described in section 6.17.1 of the SB7xx Register
> Programming Requirements
> document (PID # 42413). The workaround involves disabling the EHCI Dynamic
> Clock gating Power
> Management feature in the USB EHCI controller. The feature, when disabled,
> impacts the total Southbridge
> power consumption by less than 10 mW.
>
> 6.17 EHCI Dynamic Clock Gating Feature
>
> ASIC Rev: All Revs SB7x0
> Register Settings: EHCI_BAR 0xBC Bit[12] = 0
> Function/Comment: For normal operation, the clock gating feature must be
> disabled. At system reset, this bit is set to “1”. So, BIOS needs to program
> this bit to “0”.
> EHCI clock gating setting must be programmed in both the EHCI host
> controllers.
> Bus-0, dev-18 fun 2 and Bus 0 dev-19 fun-2
> A BIOS workaround is required to disable the EHCI Dynamic clock gating on
> resume from S5/S4.
>
>
> does this diff help?
OK, I only have a short answer, as I haven't wrapped my head around this yet.
Unfortunately it doesn't make any difference. After a reboot the LTE modem is
still not visible from within the operating system.
I've modified your diff a bit to print the register value before and after
the write, using the following code:
/* disable dynamic clock gating */
value = EREAD4(&sc->sc, EHCI_HUDSON2_CLKGATE_REG);
printf(" XXX value1=0x%08x", value);
value &= ~EHCI_HUDSON2_CLKGATE_ENABLE;
EWRITE4(&sc->sc, EHCI_HUDSON2_CLKGATE_REG, value);
value = EREAD4(&sc->sc, EHCI_HUDSON2_CLKGATE_REG);
printf(" value2=0x%08x XXX", value);
I see the following line in dmesg:
ehci0 at pci0 dev 19 function 0 "AMD Hudson-2 USB2" rev 0x39 XXX
value1=0x00005040 value2=0x00004040 XXX: apic 4 int 18
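As a quick sanity check on those two values, here is a minimal standalone sketch of the same bit arithmetic. It is not part of the diff: the macro name is copied from it, but the two helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit as in the diff: bit 12 of the register at EHCI_BAR offset 0xbc. */
#define EHCI_HUDSON2_CLKGATE_ENABLE (1u << 12)

/* Hypothetical helper: return the register value with bit 12 cleared. */
static uint32_t
clkgate_disable(uint32_t value)
{
	return value & ~EHCI_HUDSON2_CLKGATE_ENABLE;
}

/* Hypothetical helper: nonzero if the clock gating bit is still set. */
static int
clkgate_enabled(uint32_t value)
{
	return (value & EHCI_HUDSON2_CLKGATE_ENABLE) != 0;
}
```

Feeding value1 (0x00005040) through clkgate_disable() gives 0x00004040, which matches value2, so the read-back suggests the write itself took effect and bit 12 is now clear.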
However, I'm not sure: should that be reflected somehow in the lspci output?
I'm not attaching the lspci output, as I'm not sure what exactly to attach and
I cannot find a correlation between lspci -vvv -xxxx and the quirk from your
diff.
Nonetheless, thanks James for looking into this, as at present I'm stuck
without any progress on this issue.
> Index: dev/pci/ehci_pci.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/ehci_pci.c,v
> retrieving revision 1.31
> diff -u -p -u -r1.31 ehci_pci.c
> --- dev/pci/ehci_pci.c 2 May 2019 20:28:46 -0000 1.31
> +++ dev/pci/ehci_pci.c 30 Aug 2020 00:22:00 -0000
> @@ -67,6 +67,8 @@ struct ehci_pci_softc {
>
> int ehci_sb700_match(struct pci_attach_args *pa);
>
> +#define EHCI_HUDSON2_CLKGATE_REG 0xbc
> +#define EHCI_HUDSON2_CLKGATE_ENABLE (1 << 12)
> #define EHCI_SBx00_WORKAROUND_REG 0x50
> #define EHCI_SBx00_WORKAROUND_ENABLE (1 << 3)
> #define EHCI_VT6202_WORKAROUND_REG 0x48
> @@ -131,6 +133,17 @@ ehci_pci_attach(struct device *parent, s
>
> /* Handle quirks */
> switch (PCI_VENDOR(pa->pa_id)) {
> + case PCI_VENDOR_AMD:
> + if (PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_AMD_HUDSON2_EHCI) {
> + pcireg_t value;
> +
> + /* disable dynamic clock gating */
> + value = EREAD4(&sc->sc, EHCI_HUDSON2_CLKGATE_REG);
> + value &= ~EHCI_HUDSON2_CLKGATE_ENABLE;
> + EWRITE4(&sc->sc, EHCI_HUDSON2_CLKGATE_REG, value);
> + }
> + break;
> +
> case PCI_VENDOR_ATI:
> if (PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_ATI_SB600_EHCI ||
> (PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_ATI_SB700_EHCI &&
>
--
Regards,
Mikolaj