On Sat, Jan 11, 2020 at 08:16:23PM -0500, James Hastings wrote:
> > On Sat, Jan 11, 2020 at 12:42:50 +0100, Mark Kettenis wrote:
> > > Date: Fri, 10 Jan 2020 13:37:38 -0500
> > > From: Bryan Steele <[email protected]>
> > >
> > > On Fri, Jan 10, 2020 at 02:59:10AM -0500, James Hastings wrote:
> > > > > On Sat, Jan 04, 2020 at 06:23:44PM +0100, Mark Kettenis wrote:
> > > > > > Date: Sat, 4 Jan 2020 12:03:14 -0500
> > > > > > From: Bryan Steele <[email protected]>
> > > > > >
> > > > > > On Sat, Jan 04, 2020 at 11:30:56AM -0500, Bryan Steele wrote:
> > > > > > > On Sat, Jan 04, 2020 at 05:07:04PM +0100, Mark Kettenis wrote:
> > > > > > > > > Date: Sat, 4 Jan 2020 10:52:24 -0500
> > > > > > > > > From: Bryan Steele <[email protected]>
> > > > > > > > >
> > > > > > > > > I noticed an unusually high interrupt rate for amdgpio0 on my
> > > > > > > > > Huawei
> > > > > > > > > Matebook D laptop. I'm suspecting this may be partially why
> > > > > > > > > it apmd -A
> > > > > > > > > has been struggling, as the CPU is constantly busy so it
> > > > > > > > > never has a
> > > > > > > > > chance to scale down.
> > > > > > > > >
> > > > > > > > > Any ideas?
> > > > > > > >
> > > > > > > > Please send acpidump output (all files in /var/db/acpi).
> > > > > > > >
> > > > > > > > Try to figure out which GPIO pin is causing the interrupt.
> > > > > > > > That may
> > > > > > > > be tricky since the interrupt fires again and again, so if you
> > > > > > > > add a
> > > > > > > > printf in amdgpio_intr() your machine will become unusable.
> > > > > > > > Maybe
> > > > > > > > just print the pin every 10000 times:
> > > > > > > >
> > > > > > > > static count = 0;
> > > > > > > >
> > > > > > > > ....
> > > > > > > >
> > > > > > > > if ((count++ % 10000) == 0)
> > > > > > > > printf("st %llx\n", status)
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > > Mark
> > > > > > >
> > > > > > > Thanks, for some reason it vanished with on a cold boot. I'm not
> > > > > > > sure
> > > > > > > what it was that caused it. In case it helps, sending the acpidump
> > > > > > > anyway.
> > > > > > >
> > > > > > > If I can figure out what caused it, I'll try your other
> > > > > > > suggestion.
> > > > > > >
> > > > > > > Sigh.. :-)
> > > > > >
> > > > > > Aha! I must of accidentally bumped the Touchscreen at some point,
> > > > > > doing
> > > > > > that causes the amdgpio0 rate to spike.
> > > > > >
> > > > > > I had sent a diff to add AMD controller support to dwiic(4) months
> > > > > > ago,
> > > > > > I could never get interrupts to work, only polling mode. Perhaps
> > > > > > this
> > > > > > issue explains some of that. I don't have this diff in my tree at
> > > > > > the
> > > > > > moment, had to restore from backup.
> > > > > >
> > > > > > Managed to login and type shutdown, lol.
> > > > > >
> > > > > > ..
> > > > > > st 1f00000000000008
> > > > >
> > > > > Very likely. The AML defines an I2C device with:
> > > > >
> > > > > Name (_DDN, "Raydium Touchscreen") // _DDN: DOS Device
> > > > > Name
> > > > >
> > > > > that uses a GPIO interrupt that matches the lowest bit set in the
> > > > > status register.
> > > > >
> > > > > This suggest we may need to be a little bit more careful and mask
> > > > > interrupts for which we don't have an interrupt handler.
> > > > That is my fault. Does this diff stop your interrupt storm?
> > >
> > > Yes. This works, thanks!
> > >
> > > ok brynet@ (if Mark agrees)
> >
> > I fear that this isn't the right approach. Some of the GPIO pins
> > might be used for SMIs. And it isn't clear to me whether disabling
> > interrupts will also stop SMIs from being generated.
> >
> > A better strategy would be to have the interrupt handler disable pins
> > for which it sees an interrupt pending when there is no interrupt
> > handler registered. I believe that is what Linux does.
> >
> > James, is that something you'd like to work on?
> >
> > Also, I don't think that this will fix the issue that claudo@ and I
> > are seeing on the m715q. There I'm starting to suspect that the
> > problem is that the interrupt is shared with a quirky PCI device.
> > There a BIOS update (which stops amdgpio(4) from attaching) may
> > actually be the only reasonable fix.
> >
> > Cheers,
> >
> > Mark
>
> I changed the interrupt routine to mask pins that do not have an
> interrupt handler registered.
no problems on t495 with
amdgpio0 at acpi0: GPIO uid 0 addr 0xfed81500/0x400 irq 7, 184 pins
>
> Index: dev/acpi/amdgpio.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/acpi/amdgpio.c,v
> retrieving revision 1.1
> diff -u -p -u -r1.1 amdgpio.c
> --- dev/acpi/amdgpio.c 23 Dec 2019 08:05:42 -0000 1.1
> +++ dev/acpi/amdgpio.c 11 Jan 2020 23:56:02 -0000
> @@ -260,20 +260,28 @@ int
> amdgpio_pin_intr(struct amdgpio_softc *sc, int pin)
> {
> uint32_t reg;
> + int rc = 0;
>
> reg = bus_space_read_4(sc->sc_memt, sc->sc_memh, pin * 4);
> - if (!(reg & AMDGPIO_CONF_INT_STS) ||
> - !(reg & AMDGPIO_CONF_INT_MASK))
> - return 0;
> + if (reg & AMDGPIO_CONF_INT_STS) {
> + if (sc->sc_pin_ih[pin].ih_func) {
> + sc->sc_pin_ih[pin].ih_func(sc->sc_pin_ih[pin].ih_arg);
>
> - if (sc->sc_pin_ih[pin].ih_func)
> - sc->sc_pin_ih[pin].ih_func(sc->sc_pin_ih[pin].ih_arg);
> + /* Clear interrupt */
> + reg = bus_space_read_4(sc->sc_memt, sc->sc_memh,
> + pin * 4);
> + bus_space_write_4(sc->sc_memt, sc->sc_memh,
> + pin * 4, reg);
> + rc = 1;
> + } else {
> + /* Mask unhandled interrupt */
> + reg &= ~(AMDGPIO_CONF_INT_MASK | AMDGPIO_CONF_INT_EN);
> + bus_space_write_4(sc->sc_memt, sc->sc_memh,
> + pin * 4, reg);
> + }
> + }
>
> - /* Clear interrupt */
> - reg = bus_space_read_4(sc->sc_memt, sc->sc_memh, pin * 4);
> - bus_space_write_4(sc->sc_memt, sc->sc_memh, pin * 4, reg);
> -
> - return 1;
> + return rc;
> }
>
> int
>
>