> From mkb Sun Feb 25 14:19:33 2018
> From: [email protected]
> To: [email protected]
> Subject: intermittent sluggish behavior; seems to be acpi related
>
> >Synopsis: intermittent sluggish behavior; seems to be acpi related
> >Category: amd64
> >Environment:
> System : OpenBSD 6.2
> Details : OpenBSD 6.2-current (GENERIC.MP) #10: Wed Feb 21 21:26:27
> MST 2018
>
> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> This is a Lenovo ThinkPad T480s.
>
> Sometimes, when I boot my system, it runs great. Other times, it runs
> very sluggishly. I'd say it's good about half the time. I've
> narrowed this down to ACPI on the following evidence.
>
> A good boot:
> $ uptime && ps auxwwk | grep acpi0
> 5:53PM up 18 mins, 1 user, load averages: 0.00, 0.00, 0.00
> root 45527 0.0 0.0 0 0 ?? DK 5:35PM 0:00.22
> (acpi0)
>
> A bad boot:
> $ uptime && ps auxwwk | grep acpi0
> 4:45PM up 18 mins, 1 user, load averages: 1.03, 1.00, 0.75
> root 97711 87.0 0.0 0 0 ?? DK 4:27PM 15:43.95
> (acpi0)
>
> The system runs very sluggishly on a bad boot. Starting an xterm
> should be and is, on a good boot, instant. On a bad boot, it takes
> 10 seconds or so. Clearly something is wrong, but I haven't been able
> to pinpoint what exactly is wrong.
>
> Here's the acpidump output:
>
> http://www.martinbrandenburg.com/2018/acpi.tar.gz
>
> In an effort to find the problem, I enabled ACPI_DEBUG. I couldn't
> make any sense, and I'm afraid too much has scrolled off the top, but
> in case any of it is useful, here it is:
>
> http://www.martinbrandenburg.com/2018/bad.dmesg
> http://www.martinbrandenburg.com/2018/good.dmesg
>
> This seems to be related to another problem. Sometimes when I boot
> the BIOS outputs "Configuration changed -- restart the system" and
> does so. I admit to not recording every instance, but it seems that
> when that occurs, the system is fine. When the system boots without
> restarting, the system is sluggish and I have the problems described.
>
> I've booted a Linux live USB quite a few times, and never had this
> kind of trouble there. As long as I don't boot OpenBSD, I never see
> the "Configuration changed -- restart the system" message. But I'd
> much prefer to actually use OpenBSD.
>
> I can supply more information or run tests to gather more data if
> needed.
> >How-To-Repeat:
> Boot OpenBSD on a ThinkPad T480s and possibly other newer ThinkPads
> until the problem occurs.
> >Fix:
> Unknown.
>
>
I have some more information.

I have since noticed that the problem always shows up after suspending
my system.

Prior to suspend, vmstat -i shows

  irq144/acpi0        318        0

After suspend, the system gets sluggish and acpi0's CPU time explodes as
described. Running vmstat -i periodically over the course of about
10 minutes shows the ACPI interrupt total (second column) climbing
steadily:

  irq144/acpi0     282494      152
  irq144/acpi0     385191      197
  irq144/acpi0     436550      218
  irq144/acpi0     517509      250
  irq144/acpi0     600715      280
  irq144/acpi0     737721      325
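
To put a rough number on the storm, the samples above can be turned into an
average interrupt rate. This is only a sketch: it assumes that the third
column of vmstat -i is the total divided by the uptime in seconds (rounded
down), so the uptime at each sample can be estimated as total / rate. The
struct and function names are made up for illustration.

```c
#include <assert.h>

/* One "irq144/acpi0" line from vmstat -i: total count and rate column. */
struct sample {
	unsigned long total;
	unsigned long rate;
};

/* Interrupts accumulated between an earlier sample a and a later one b. */
static unsigned long
delta_total(struct sample a, struct sample b)
{
	assert(b.total >= a.total);
	return b.total - a.total;
}

/*
 * Uptime in seconds implied by a sample.
 * ASSUMPTION: rate == total / uptime, so uptime ~= total / rate.
 */
static unsigned long
implied_uptime(struct sample s)
{
	assert(s.rate > 0);
	return s.total / s.rate;
}

/* Average interrupts per second between two samples (integer estimate). */
static unsigned long
rate_between(struct sample a, struct sample b)
{
	unsigned long dt = implied_uptime(b) - implied_uptime(a);

	assert(dt > 0);
	return delta_total(a, b) / dt;
}
```

Under that assumption, the first and last samples are about 411 seconds
apart and 455227 interrupts were taken in between, which is on the order of
1100 interrupts per second on irq144.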

Putting a printf in acpi_gpe revealed that, except for one at boot, no
GPE events occur until after I suspend the system, at which point a
deluge of _L6F events shows up.
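
For readers matching the _L6F name against the number used in the patch
further down: ACPI names GPE handler methods _Lxx (level-triggered) or _Exx
(edge-triggered), where xx is the GPE number in hex, so _L6F is GPE 0x6F =
111. The sketch below shows that mapping, together with the register byte
(gpe>>3, as in acpi_gpe) and the bit within that byte. The bit mask formula
is an assumption based on the usual one-bit-per-GPE layout, and the helper
names are hypothetical, not functions from acpi.c.

```c
#include <assert.h>
#include <stdio.h>

/* Index of the status/enable register byte that holds this GPE. */
static unsigned
gpe_reg_index(unsigned gpe)
{
	return gpe >> 3;
}

/* Bit mask of this GPE inside its register byte (assumed layout). */
static unsigned
gpe_bit_mask(unsigned gpe)
{
	return 1u << (gpe & 7);
}

/*
 * Build the AML handler name for a GPE: "_E" for edge-triggered or
 * "_L" for level-triggered, followed by the GPE number as two
 * uppercase hex digits. buf must hold at least 5 bytes.
 */
static void
gpe_method_name(unsigned gpe, int edge, char buf[5])
{
	assert(gpe <= 0xff);
	snprintf(buf, 5, "_%c%02X", edge ? 'E' : 'L', gpe);
}
```

For GPE 111 this gives the name _L6F, register byte 13 and bit mask 0x80.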
Decompiling the AML revealed this had something to do with Thunderbolt.
I don't think OpenBSD supports Thunderbolt, and I don't care to use it
anyway. I went to the BIOS to disable it, but found an option "Enable
Thunderbolt BIOS Assist Mode" which purported to be necessary for older
versions of Windows and Linux. I enabled it.
This seems to stop the problem after suspend.

However, I still occasionally see the _L6F events when I first boot,
before attempting to suspend. The printf output starts before /etc/rc
even starts running.

I have a USB-C to VGA adapter

  uhidev2 at uhub1 port 1 configuration 1 interface 1 "Lenovo Lenovo USB-C to VGA Adapter" rev 2.01/0.00 addr 2
  uhidev2: iclass 3/0, 237 report ids
  uhid0 at uhidev2 reportid 237: input=0, output=0, feature=80
  ugen2 at uhub1 port 1 configuration 1 "Lenovo Lenovo USB-C to VGA Adapter" rev 2.01/0.00 addr 2

which also sometimes triggers it regardless of whether the BIOS option
is on or off. However, I have a similar USB-C to DisplayPort adapter
which does not.
I am now running the following patch, which at least makes the machine
usable and lets me see when the first bad interrupt has happened.
Obviously it isn't a real fix.
I'll update with more information if I find any.

Index: acpi.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.340
diff -u -p -r1.340 acpi.c
--- acpi.c 19 Feb 2018 08:59:52 -0000 1.340
+++ acpi.c 5 Mar 2018 03:57:59 -0000
@@ -2179,6 +2179,15 @@ acpi_gpe(struct acpi_softc *sc, int gpe,
struct aml_node *node = arg;
uint8_t mask, en;
+ if (!sc->gpe_table[gpe].edge && gpe == 111) {
+ static unsigned short i;
+ if (i == 0) {
+ i++;
+ printf("acpi_gpe %d %s IGNORING\n", gpe, node->name);
+ }
+ } else {
+ printf("acpi_gpe %d %s\n", gpe, node->name);
+
dnprintf(10, "handling GPE %.2x\n", gpe);
aml_evalnode(sc, node, 0, NULL, NULL);
@@ -2187,6 +2196,7 @@ acpi_gpe(struct acpi_softc *sc, int gpe,
acpi_write_pmreg(sc, ACPIREG_GPE_STS, gpe>>3, mask);
en = acpi_read_pmreg(sc, ACPIREG_GPE_EN, gpe>>3);
acpi_write_pmreg(sc, ACPIREG_GPE_EN, gpe>>3, en | mask);
+ }
return (0);
}
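
The control flow of the patch above can be modeled in user space as follows.
This is only an illustrative sketch, not kernel code: the enum, the
STORM_GPE macro and classify_gpe() are hypothetical names, and the caller
passes in the "already warned" flag that the patch keeps in a static
variable. It shows that a level-triggered GPE 111 is logged once and then
silently skipped, while every other GPE takes the normal handling path.

```c
#include <assert.h>
#include <stdbool.h>

#define STORM_GPE 111	/* 0x6F, i.e. the _L6F handler */

enum gpe_action {
	GPE_HANDLE,		/* run the AML handler, then re-enable */
	GPE_IGNORE_LOGGED,	/* first occurrence: print once, skip */
	GPE_IGNORE_SILENT	/* later occurrences: skip quietly */
};

/*
 * Decide what the patched acpi_gpe() would do for one event.
 * *warned plays the role of the patch's static counter.
 */
static enum gpe_action
classify_gpe(int gpe, bool edge, bool *warned)
{
	if (!edge && gpe == STORM_GPE) {
		if (!*warned) {
			*warned = true;
			return GPE_IGNORE_LOGGED;
		}
		return GPE_IGNORE_SILENT;
	}
	return GPE_HANDLE;
}
```

Note that in the ignore branches neither the AML handler nor the
ACPIREG_GPE_EN write at the end of acpi_gpe is reached.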