On Wed, Oct 12, 2022 at 11:37:05AM +0100, Igor Petruk wrote:
> On Wed, Oct 12, 2022 at 11:33 AM Mike Larkin <[email protected]> wrote:
> >
> > On Wed, Sep 28, 2022 at 10:38:43AM +0200, Sebastian Oswald wrote:
> > > On Tue, 27 Sep 2022 08:03:59 -0700
> > > Mike Larkin <[email protected]> wrote:
> > >
> > > >On Tue, Sep 27, 2022 at 11:02:50AM +0200, Sebastian Oswald wrote:
> > > >> On Mon, 26 Sep 2022 17:57:23 -0700
> > > >> Mike Larkin <[email protected]> wrote:
> > > >>
> > > >> >On Mon, Sep 26, 2022 at 05:40:04PM +0200, Sebastian Oswald wrote:
> > > >> >> >Synopsis:      High interrupt load from acpi0 on Intel N5105 platform
> > > >> >> >Category:      system
> > > >> >> >Environment:
> > > >> >>         System      : OpenBSD 7.1
> > > >> >>         Details     : OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
> > > >> >>                       [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >> >>
> > > >> >>         Architecture: OpenBSD.amd64
> > > >> >>         Machine     : amd64
> > > >> >>
> > > >> >> >Description:
> > > >> >>         On multiple (3), freshly installed systems based on Jasper Lake
> > > >> >>         Celeron N5105 platform, CPU0 has high interrupt rate at idle.
> > > >> >>
> > > >> >> >How-To-Repeat:
> > > >> >>         Installed 7.1 from current usb image, reboot.
> > > >> >>
> > > >> >> # top | head -n6
> > > >> >> load averages:  0.99,  0.97,  0.92    a-vpn1.gassner.lan 17:38:58
> > > >> >> 26 processes: 25 idle, 1 on processor  up  8:01
> > > >> >> CPU0 states:  0.0% user,  0.0% nice, 14.3% sys,  0.5% spin, 77.0% intr,  8.3% idle
> > > >> >> CPU1 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr, 99.9% idle
> > > >> >> CPU2 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr, 99.9% idle
> > > >> >> CPU3 states:  0.1% user,  0.0% nice,  0.0% sys,  0.1% spin,  0.0% intr, 99.9% idle
> > > >> >>
> > > >> >>
> > > >> >> This output is from a freshly rebooted system; rates for irq96/acpi are
> > > >> >> always way above 8000:
> > > >> >>
> > > >> >> # vmstat -i
> > > >> >> interrupt                       total     rate
> > > >> >> irq0/clock                      20105      394
> > > >> >> irq0/ipi                         8656      169
> > > >> >> irq144/com0                        86        1
> > > >> >> irq96/acpi0                    445306     8731
> > > >> >> irq145/inteldrm0                 1137       22
> > > >> >> irq100/nvme0                    33913      664
> > > >> >> irq114/igc0:0                      74        1
> > > >> >> irq115/igc0:1                     222        4
> > > >> >> irq116/igc0:2                      41        0
> > > >> >> irq117/igc0:3                      34        0
> > > >> >> irq118/igc0                         2        0
> > > >> >> Total                          509576     9991
> > > >> >>
> > > >> >
> > > >> >Could be stuck GPE.
> > > >> >
> > > >> >In acpi.c, around line 2273:
> > > >> >
> > > >> >        dnprintf(10, "handling GPE %.2x\n", gpe);
> > > >> >
> > > >> >change that to
> > > >> >
> > > >> >        printf("handling GPE %.2x\n", gpe);
> > > >> >
> > > >> >And see which GPE keeps firing. It's likely gonna make the system
> > > >> >somewhat slower since you'll be spamming dmesg like crazy.
> > > >> >
> > > >> >then report back what GPE you found firing.
> > > >>
> > > >> Thank you for the quick reply.
> > > >>
> > > >> With that patch applied, immediately during boot stdout gets spammed
> > > >> with "handling GPE 6f".
> > > >>
> > > >> From doing a quick search, this seems to be usually caused by a broken
> > > >> ACPI implementation on the BIOS side?
> > > >> I already contacted the vendor to check for a newer BIOS version.
> > > >>
> > > >> In the meantime or if there isn't any patched BIOS available, is there
> > > >> a way to find out what event '6f' correlates to and disable/ignore
> > > >> handling of that interrupt?
> > > >>
> > > >
> > > ><snip>
> > > >
> > > >Seems to be a common problem with this machine, not only on OpenBSD.
> > > >Google _L6F GPE AL6F and you'll see that everyone else with the issue
> > > >needed to hack their AML or get a BIOS update. Looks like shoddy AML
> > > >from AMIbios.
> > > >
> > > >If you want to disable it, you'll need to do that in the GPE handler in
> > > >acpi.c.
> > > >
> > > >-ml
> > >
> > >
> > > Yes, I also found a bunch on this topic, usually for other cheap
> > > mainboards (mostly ASRock). I don't have high hopes of getting a
> > > patched BIOS from the vendor of those appliances, so I started looking
> > > into ways of 'fixing' (ignoring) that GPE on the OS side.
> > > Apparently most OSes have some way to override the DSDT; e.g.
> > > FreeBSD can override the AML at boot pretty easily:
> > > https://docs.freebsd.org/en/books/handbook/config/#_overriding_the_default_aml
> > > Is there any such mechanism in OpenBSD?
> >
> > no
> >
> > > /var/db/acpi/DSDT.2 on these systems actually contains the same code as
> > > mentioned here:
> > > https://forums.freebsd.org/threads/disabling-gpe6-gpe-flooding-prevention.56963/#post-324358
> > > (interestingly, FreeBSD doesn't show the same behavior; total interrupt
> > > rate according to 'vmstat -i' is <100 at idle)
> > >
> > > Otherwise, what could disabling that GPE in acpi.c look like?
> > > Sorry to bother you with that, I'm merely a sysadmin with some very
> > > rudimentary coding skills (i.e. I can roughly follow what some code
> > > might be doing as long as it isn't too complex).
> > >
> >
> > I'm not sure this is a beginner task. But you could write a function
> > modeled on acpi_enable_onegpe that clears the gpe instead, and then call
> > that from acpi_attach.
> >
> > Generally, it's not worth the effort trying to fix broken hardware like
> > this. Because how do you know there isn't other brokenness elsewhere?
> >
> > > Thanks,
> > > Sebastian
> >
> >
>
> Hi Mike,
>
> When you say acpi_attach, do you mean this one:
> https://github.com/openbsd/src/blob/9f172165b574c19186ae3a65383c7fa8c8839f78/sys/arch/amd64/amd64/acpi_machdep.c#L90
> ?
>
> I am thinking of adding this mitigation to my kernel temporarily.
> Assuming I write `acpi_disable_onegpe`, I am wondering where to call it
> for 0x6F.
>
> Thanks, Igor.
>

Yeah, you could try and put that call at the end of acpi_attach until we can
figure out a better solution. LMK if it works.
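
For anyone trying this, a rough sketch of the bit arithmetic involved may help. GPE enable registers are byte-wide with one bit per GPE, so GPE 0x6f should land in byte 13, bit 7. The code below is a userland model only, not kernel code: the struct, the gpe_model_* names and the 16-register size are all made up for illustration. A real acpi_disable_onegpe would have to read and write the hardware GPE enable register through the kernel's own ACPI register accessors, and whether clearing the enable bit is enough to quiet this particular GPE is untested here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userland model only. 16 enable bytes cover GPE 0x00..0x7f; the size is
 * an assumption for illustration, real hardware reports its own GPE
 * block length.
 */
#define GPE_MODEL_NREGS 16

struct gpe_model {
	uint8_t en[GPE_MODEL_NREGS];	/* stand-in for GPE enable registers */
};

/* Index of the enable byte that holds this GPE's bit. */
static int
gpe_model_reg(int gpe)
{
	return gpe >> 3;
}

/* Bit mask of this GPE within its enable byte. */
static uint8_t
gpe_model_mask(int gpe)
{
	return (uint8_t)(1U << (gpe & 7));
}

/*
 * Hypothetical "disable one GPE": clear a single enable bit and leave
 * the other seven GPEs that share the byte untouched.
 */
static void
gpe_model_disable_one(struct gpe_model *m, int gpe)
{
	assert(gpe >= 0 && gpe < GPE_MODEL_NREGS * 8);
	m->en[gpe_model_reg(gpe)] &= (uint8_t)~gpe_model_mask(gpe);
}

/* Nonzero if the GPE's enable bit is currently set in the model. */
static int
gpe_model_is_enabled(const struct gpe_model *m, int gpe)
{
	assert(gpe >= 0 && gpe < GPE_MODEL_NREGS * 8);
	return (m->en[gpe_model_reg(gpe)] & gpe_model_mask(gpe)) != 0;
}
```

The read-modify-write is the point of the sketch: writing a whole byte of zeros would also disable the seven neighbouring GPEs, which is probably not what you want on a machine that still needs its other ACPI events.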
