[Cc: +Benjamin]

Dear Michal,


Thank you for your reply.

On 12.05.26 at 10:22, Michal Pecio wrote:
On Tue, 12 May 2026 08:17:08 +0200, Paul Menzel wrote:
I honestly don't know what to do with this. I think I would start with
looking whether xhci_shutdown() in the old kernel manages to halt it
successfully or if it also fails, and what's the USBSTS there.

It seems that you can get such information by enabling dynamic debug

    echo 'module xhci_hcd +p' >/proc/dynamic_debug/control

and capturing old kernel's log up to kexec() through a serial cable.

Unfortunately, nothing is logged over the serial console (BMC SOL) after
running `sudo kexec -e` or `sudo systemctl reboot`. I just see:

      [69530.180531343,5] OPAL: Switch to big-endian OS
      [69538.407292205,5] OPAL: Switch to little-endian OS

These lines come from the OPAL firmware, so might it be involved? No
idea whether it touches the xHCI controller.

So some FW involvement is potentially possible.

BTW, another method of doing kexec is to set up a crash kernel and
then trigger a panic with /proc/sysrq-trigger.

This probably won't run xhci_shutdown(). Not sure about OPAL FW.
Is the outcome any different?
Is the motivation to avoid the OPAL message, in order to rule out any firmware involvement?

I have to check how to set up the crash kernel.

But strangely, no xHCI messages are there, even after booting with
Petitboot and an initialized xHCI controller. No idea whether that
indicates that nothing is powered off during kexec or shutdown.

With `sudo systemctl reboot` only the lines below are logged:

      [  121.811384] libvirt-guests.sh[3366]: Running guests on default URI:
      [  121.811988] libvirt-guests.sh[3376]: no running guests.
      [ … (systemd service stop notifications)]
      [  136.254846] systemd-shutdown[1]: Waiting for process: watch_ldconfig
      [  218.549684] reboot: Restarting system
      [69760.484679183,5] OPAL: Reboot request...
        3.55778|Ignoring boot flags, incorrect version 0x0
        3.59881|ISTEP  6. 3

Only "reboot: Restarting system" looks like it's kernel. Maybe you need
to tweak loglevel before rebooting or kexecing? Try to get more kernel
messages showing over serial during operation, then kexec.

I actually did set the log level by adding `debug` to the Linux kernel command line, and with

    $ echo 9 | sudo tee /proc/sysrq-trigger
    9

and it was confirmed:

    sysrq: Changing Loglevel
    sysrq: Loglevel set to 9

Unfortunately, no more messages.

As a further data point, adding `ppc_pci_reset_phbs` to the command line also gets xhci_hcd to initialize the TI xHCI host controller:

    $ lspci -nn -s 0021:0d:00.0
    0021:0d:00.0 USB controller [0c03]: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller [104c:8241] (rev 02)


    [   14.050249]   Issue PHB reset ...
    […]
    [   19.339822] ehci_hcd: block sizes: qh 144 qtd 96 itd 192 sitd 96
    [   19.339919] ohci_hcd: block sizes: ed 112 td 96
    [   19.340538] xhci_hcd 0021:0d:00.0: xHCI Host Controller

The message `xHCI HW did not halt within 32000 usec status = 0x0` (or 0x10 with the other patch) is not logged. In `arch/powerpc/platforms/powernv/pci-ioda.c`, the comment in `pnv_pci_init_ioda_phb()` suggests that the PHB should also be reset in the kexec case:

        /*
         * If we're running in kdump kernel, the previous kernel never
         * shutdown PCI devices correctly. We already got IODA table
         * cleaned out. So we have to issue PHB reset to stop all PCI
         * transactions from previous kernel. The ppc_pci_reset_phbs
         * kernel parameter will force this reset too. Additionally,
         * if the IODA reset above failed then use a bigger hammer.
         * This can happen if we get a PHB fatal error in very early
         * boot.
         */
        if (is_kdump_kernel() || pci_reset_phbs || rc) {
                pr_info("  Issue PHB reset ...\n");
                pnv_eeh_phb_reset(hose, EEH_RESET_FUNDAMENTAL);
                pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
        }

At least, I’d assume that kdump and kexec are similar in that neither shuts down PCI devices? (Commit 361f2a2a1536 (powrpc/powernv: Reset PHB in kdump kernel) from 2024 adds some of the code above.)


Kind regards,

Paul
