On Thu, Oct 13, 2022 at 6:51 PM Mike Larkin <[email protected]> wrote:
>
> On Wed, Oct 12, 2022 at 11:37:05AM +0100, Igor Petruk wrote:
> > On Wed, Oct 12, 2022 at 11:33 AM Mike Larkin <[email protected]> wrote:
> > >
> > > On Wed, Sep 28, 2022 at 10:38:43AM +0200, Sebastian Oswald wrote:
> > > > On Tue, 27 Sep 2022 08:03:59 -0700
> > > > Mike Larkin <[email protected]> wrote:
> > > >
> > > > >On Tue, Sep 27, 2022 at 11:02:50AM +0200, Sebastian Oswald wrote:
> > > > >> On Mon, 26 Sep 2022 17:57:23 -0700
> > > > >> Mike Larkin <[email protected]> wrote:
> > > > >>
> > > > >> >On Mon, Sep 26, 2022 at 05:40:04PM +0200, Sebastian Oswald wrote:
> > > > >> >> >Synopsis:      High interrupt load from acpi0 on Intel N5105 
> > > > >> >> >platform
> > > > >> >> >Category:      system
> > > > >> >> >Environment:
> > > > >> >>         System      : OpenBSD 7.1
> > > > >> >>         Details     : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11
> > > > >> >> 18:03:57 MDT 2022
> > > > >> >> [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > >> >>
> > > > >> >>         Architecture: OpenBSD.amd64
> > > > >> >>         Machine     : amd64
> > > > >> >>
> > > > >> >> >Description:
> > > > >> >>         On multiple (3), freshly installed systems based on 
> > > > >> >> Jasper Lake
> > > > >> >>         Celeron N5105 platform, CPU0 has high interrupt rate at 
> > > > >> >> idle.
> > > > >> >>
> > > > >> >> >How-To-Repeat:
> > > > >> >>         Installed 7.1 from current usb image, reboot.
> > > > >> >>
> > > > >> >> # top | head -n6
> > > > >> >> load averages:  0.99,  0.97,  0.92    a-vpn1.gassner.lan 17:38:58
> > > > >> >> 26 processes: 25 idle, 1 on processor  up  8:01
> > > > >> >> CPU0 states:  0.0% user,  0.0% nice, 14.3% sys,  0.5% spin, 77.0% 
> > > > >> >> intr,  8.3% idle
> > > > >> >> CPU1 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% 
> > > > >> >> intr, 99.9% idle
> > > > >> >> CPU2 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% 
> > > > >> >> intr, 99.9% idle
> > > > >> >> CPU3 states:  0.1% user,  0.0% nice,  0.0% sys,  0.1% spin,  0.0% 
> > > > >> >> intr, 99.9% idle
> > > > >> >>
> > > > >> >>
> > > > >> >> This output is from a freshly rebooted system; rates for 
> > > > >> >> irq96/acpi are
> > > > >> >> always way above 8000:
> > > > >> >>
> > > > >> >> # vmstat -i
> > > > >> >> interrupt                       total     rate
> > > > >> >> irq0/clock                      20105      394
> > > > >> >> irq0/ipi                         8656      169
> > > > >> >> irq144/com0                        86        1
> > > > >> >> irq96/acpi0                    445306     8731
> > > > >> >> irq145/inteldrm0                 1137       22
> > > > >> >> irq100/nvme0                    33913      664
> > > > >> >> irq114/igc0:0                      74        1
> > > > >> >> irq115/igc0:1                     222        4
> > > > >> >> irq116/igc0:2                      41        0
> > > > >> >> irq117/igc0:3                      34        0
> > > > >> >> irq118/igc0                         2        0
> > > > >> >> Total                          509576     9991
> > > > >> >>
> > > > >> >
> > > > >> >Could be stuck GPE.
> > > > >> >
> > > > >> >In acpi.c, around line 2273:
> > > > >> >
> > > > >> >        dnprintf(10, "handling GPE %.2x\n", gpe);
> > > > >> >
> > > > >> >change that to
> > > > >> >
> > > > >> >        printf("handling GPE %.2x\n", gpe);
> > > > >> >
> > > > >> >And see which GPE keeps firing. It's likely gonna make the system 
> > > > >> >somewhat
> > > > >> >slower since you'll be spamming dmesg like crazy.
> > > > >> >
> > > > >> >then report back what GPE you found firing.
> > > > >>
> > > > >> Thank you for the quick reply.
> > > > >>
> > > > >> With that patch applied, immediately during boot stdout gets spammed
> > > > >> with "handling GPE 6f".
> > > > >>
> > > > >> From doing a quick search, this seems to be usually caused by a 
> > > > >> broken
> > > > >> ACPI implementation on the BIOS side?
> > > > >> I already contacted the vendor to check for a newer BIOS version.
> > > > >>
> > > > >> In the meantime or if there isn't any patched BIOS available, is 
> > > > >> there
> > > > >> a way to find out what event '6f' correlates to and disable/ignore
> > > > >> handling of that interrupt?
> > > > >>
> > > > >
> > > > ><snip>
> > > > >
> > > > >Seems to be a common problem with this machine, not only on OpenBSD. 
> > > > >Google
> > > > >_L6F GPE AL6F and you'll see that everyone else with the issue needed 
> > > > >to
> > > > >hack their AML or get a BIOS update. Looks like shoddy AML from 
> > > > >AMIbios.
> > > > >
> > > > >If you want to disable it, you'll need to do that in the GPE handler in
> > > > >acpi.c.
> > > > >
> > > > >-ml
> > > >
> > > >
> > > > Yes, I also found a bunch on this topic, usually for other cheap
> > > > Mainboards (mostly asrock). I don't have high hopes to get a
> > > > patched BIOS from the vendor of those appliances, so I started looking
> > > > into ways of 'fixing' (ignoring) that GPE on the OS side.
> > > > Apparently most OSes have some way to override the DSDT; e.g.
> > > > FreeBSD can override the AML at boot pretty easily:
> > > > https://docs.freebsd.org/en/books/handbook/config/#_overriding_the_default_aml
> > > > Is there any such mechanism in OpenBSD?
> > >
> > > no
> > >
> > > > /var/db/acpi/DSDT.2 on these systems actually contains the same code as
> > > > mentioned here:
> > > > https://forums.freebsd.org/threads/disabling-gpe6-gpe-flooding-prevention.56963/#post-324358
> > > > (interestingly, FreeBSD doesn't show the same behavior; total interrupt
> > > > rate according to 'vmstat -i' is <100 at idle)
> > > >
> > > > Otherwise, how could disabling that GPE in acpi.c look like?
> > > > Sorry to bother you with that, I'm merely a sysadmin with some very
> > > > rudimentary coding skills (i.e. I can roughly follow what some code
> > > > might be doing as long as it isn't too complex).
> > > >
> > >
> > > I'm not sure this is a beginner task. But you could write a function like
> > > acpi_enable_onegpe and instead make it clear the gpe and then call that
> > > from acpi_attach.
> > >
> > > Generally, it's not worth the effort trying to fix broken hardware like 
> > > this.
> > > Because how do you know there isn't other brokenness elsewhere?
> > >
> > > > Thanks,
> > > > Sebastian
> > >
> > >
> >
> > Hi Mike,
> >
> > When you say acpi_attach, do you mean this one:
> > https://github.com/openbsd/src/blob/9f172165b574c19186ae3a65383c7fa8c8839f78/sys/arch/amd64/amd64/acpi_machdep.c#L90
> > ?
> >
> > I am thinking of adding this mitigation to my kernel temporarily.
> > Assuming I write `acpi_disable_onegpe`, I am thinking where to call it
> > for 0x6F.
> >
> > Thanks, Igor.
> >
>
> Yeah, you could try and put that call at the end of acpi_attach until we can
> figure out a better solution. LMK if it works.

Hi,

I have added acpi_disable_onegpe.

It did not quite work in acpi_attach. I assume this is either because
all GPEs are enabled and disabled several times later on, or because
the struct is passed to another thread almost at the end of acpi_attach.

When I added it after this line instead, it worked:
https://github.com/openbsd/src/blob/62d244ed99f17c1263ee095bc7d8fa1f61df02fd/sys/dev/acpi/acpi.c#L2665

The OS now runs smoothly, as expected. I don't know if that is the
optimal place; I just added the call somewhere it roughly made sense.
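
For anyone following along, here is a rough user-space sketch of the idea
only, not the actual kernel change. It assumes the GPE enable bits are
packed eight per byte-wide register, so GPE 0x6f is bit 7 of byte 13.
The array gpe_en and the sim_* function names are made up for this
illustration; the real code would go through the ACPI register accessors
in acpi.c instead of a plain array.

```c
#include <stdint.h>

#define NUM_GPES 128	/* assumed block size, for this sketch only */

/* Stand-in for the hardware GPE enable block: one bit per GPE. */
static uint8_t gpe_en[NUM_GPES / 8];

/* Set the enable bit for a single GPE. */
static void
sim_enable_onegpe(int gpe)
{
	uint8_t mask = (uint8_t)(1u << (gpe & 7));

	gpe_en[gpe >> 3] |= mask;
}

/* Clear the enable bit for a single GPE, leaving its neighbours alone. */
static void
sim_disable_onegpe(int gpe)
{
	uint8_t mask = (uint8_t)(1u << (gpe & 7));

	gpe_en[gpe >> 3] &= (uint8_t)~mask;
}

/* Report whether a GPE is currently enabled (1) or not (0). */
static int
sim_gpe_enabled(int gpe)
{
	return (gpe_en[gpe >> 3] >> (gpe & 7)) & 1;
}
```

Calling sim_disable_onegpe(0x6f) after everything has been enabled leaves
the other GPEs in the same byte untouched, which is the behaviour I was
after for the stuck GPE.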

Thanks,
Igor.
