On 2020-05-27 15:41, Justin Hibbits wrote:
On Wed, 27 May 2020 06:27:16 -0700
John Baldwin <j...@freebsd.org> wrote:

On 5/27/20 2:39 AM, Andriy Gapon wrote:
On 27/05/2020 11:13, Andriy Gapon wrote:
I added more diagnostics and it seems to support the idea that the
problem is related to I/O cycles and bridges.

The ACPI timer suddenly starts returning 0xffffffff, and that lasts for
tens of microseconds before the timer goes back to returning
normal values with the expected increase.
AMD provides a proprietary way to access ACPI registers via MMIO
(0xfed808xx).  That mechanism is unaffected: the ACPI timer register
read through it always returns good values.

The problem seems to happen when restoring configuration of a
particular PCI bridge.  What's interesting is that the bridge
decodes one memory range and one I/O range.

Looking at pci_cfg_restore(), I wonder if it is wise to restore
PCIR_COMMAND so early.  Could it be that after the resume the
bridge is configured with a wrong I/O range (e.g., too wide), and
that by writing PCIR_COMMAND we enable that decoding?  Then the
bridge would steal I/O cycles destined for the ACPI support hardware.
If there is nothing behind the bridge to handle those ports, we get
those bad readings.  Once the bridge configuration is fully restored,
I/O handling goes back to normal.

From what I see, this looks like a BIOS bug.
Upon resume, the BIOS swaps the window configurations of pcib1 and
pcib2 (until FreeBSD restores them).  pcib1 originally does not have
an I/O window, so the BIOS programs both the base and the limit of
pcib2's I/O window to zero.  A zero limit still reaches 0xFFF, because
the low 12 bits of an I/O limit are implied to be all ones.  So when
FreeBSD writes pcib2's command register to enable I/O decoding, the
bridge starts claiming the 0x0-0xFFF I/O port range, which covers the
ACPI ports at 0x8xx.

Some printf-s.
From (verbose) boot time:
pcib1:   domain            0
pcib1:   secondary bus     1
pcib1:   subordinate bus   1
pcib1:   memory decode     0xfea00000-0xfeafffff
pcib2:   domain            0
pcib2:   secondary bus     2
pcib2:   subordinate bus   2
pcib2:   I/O decode        0xf000-0xffff
pcib2:   memory decode     0xfe900000-0xfe9fffff

My printf-s from resume time:
pcib1: old I/O base (low): 0xf1
pcib1: old I/O base (high): 0x0
pcib1: old I/O limit (low): 0x1
pcib1: old I/O limit (high): 0x0
pcib2: old I/O base (low): 0x1
pcib2: old I/O base (high): 0x0
pcib2: old I/O limit (low): 0x1
pcib2: old I/O limit (high): 0x0
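The raw values above can be decoded with the standard PCI-to-PCI bridge
register layout: bits 7:4 of the 8-bit I/O Base/Limit registers hold address
bits 15:12, bits 3:0 report the addressing capability (0x1 means 32-bit, in
which case the "high" registers supply bits 31:16), and the limit is implicitly
extended with 0xfff.  A minimal sketch follows; decode_io_window() and
window_claims() are hypothetical helpers, and 0x808 stands in for an ACPI port
in the 0x8xx range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a decoded bridge I/O window. */
struct io_window {
	uint32_t base;
	uint32_t limit;
};

/*
 * Combine an 8-bit I/O Base/Limit register with its "upper 16 bits"
 * register.  Bits 7:4 of 'lo' are address bits 15:12; bits 3:0 give the
 * addressing capability (0x0 = 16-bit, 0x1 = 32-bit).
 */
static uint32_t
io_addr(uint8_t lo, uint16_t hi, bool is_limit)
{
	uint32_t addr = (uint32_t)(lo & 0xf0) << 8;

	if ((lo & 0x0f) == 0x01)	/* 32-bit decode: upper 16 bits are valid */
		addr |= (uint32_t)hi << 16;
	if (is_limit)
		addr |= 0xfff;		/* I/O windows have 4 KB granularity */
	return (addr);
}

static struct io_window
decode_io_window(uint8_t base_lo, uint16_t base_hi, uint8_t limit_lo,
    uint16_t limit_hi)
{
	struct io_window w;

	w.base = io_addr(base_lo, base_hi, false);
	w.limit = io_addr(limit_lo, limit_hi, true);
	return (w);
}

/*
 * True if the window would claim 'port' once I/O decoding is enabled in
 * the command register.  A base above the limit means the window is off.
 */
static bool
window_claims(struct io_window w, uint32_t port)
{
	return (w.base <= w.limit && port >= w.base && port <= w.limit);
}
```

Under that decoding, pcib2's resume values (0x1/0x0 for both base and limit)
give 0x0-0xfff, which contains 0x808, whereas pcib1's (base 0xf1, limit 0x1)
give a base of 0xf000 above a limit of 0xfff, i.e. a disabled window.  The
boot-time pcib2 window 0xf000-0xffff would correspond to 0xf1 in both low
registers.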

The "solution", I think, is to have resume be multi-pass and to resume
all the bridges first before trying to resume leaf devices (including
timers), but that's a fair bit of work.  It might be that we just
need to resume timer interrupts after the new-bus resume (I think we
currently do it before?), though the reason for resuming them first
was to allow resume methods in devices to sleep (I'm not sure if any do).
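The two-pass ordering suggested above can be sketched as follows.  This is not
new-bus code: struct dev, enum dev_kind and two_pass_resume() are hypothetical,
and a real implementation would walk the device tree and call each driver's
resume method instead of recording a step number.  Keeping the list in
parent-first (tree) order within each pass means a parent bridge is still
resumed before the bridges below it.

```c
#include <stddef.h>

/* Hypothetical device classes for illustrating a two-pass resume. */
enum dev_kind { DEV_BRIDGE, DEV_LEAF };

struct dev {
	const char *name;
	enum dev_kind kind;
	int resumed_at;		/* step at which "resume" ran, -1 if not yet */
};

/*
 * Resume every bridge before any leaf device, preserving the original
 * relative order within each class.  Returns the number of devices resumed.
 */
static int
two_pass_resume(struct dev *devs, size_t n)
{
	int step = 0;

	for (int pass = 0; pass < 2; pass++) {
		enum dev_kind want = (pass == 0) ? DEV_BRIDGE : DEV_LEAF;

		for (size_t i = 0; i < n; i++) {
			if (devs[i].kind == want)
				devs[i].resumed_at = step++;	/* stand-in for the resume call */
		}
	}
	return (step);
}
```

For a list such as pcib1, a timer, pcib2 and a NIC, the two bridges get steps 0
and 1 and the leaf devices follow, so the timer would only be resumed after both
bridge windows were restored.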


That sounds like a good fit for https://reviews.freebsd.org/D203 .
Someone (TM) just needs to take it over the finish line... 6 years
later.

Is this perhaps related to:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

--HPS
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
