On 11/12/25 9:15 PM, Timothy Pearson wrote:

----- Original Message -----
From: "Narayana Murty N" <[email protected]>
To: "mahesh" <[email protected]>, "Oliver" <[email protected]>, "Madhavan Srinivasan" <[email protected]>, "Michael Ellerman" <[email protected]>, "npiggin" <[email protected]>, "christophe leroy" <[email protected]>
Cc: "Bjorn Helgaas" <[email protected]>, "Timothy Pearson" <[email protected]>, "linuxppc-dev" <[email protected]>, "linux-kernel" <[email protected]>, "vaibhav" <[email protected]>, "Shivaprasad G Bhat" <[email protected]>, [email protected]
Sent: Wednesday, December 10, 2025 8:25:59 AM
Subject: [PATCH v2 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling

The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
hotplug safe") restructured the EEH driver to improve synchronization
with the PCI hotplug layer.

However, it inadvertently moved pci_lock_rescan_remove() outside its
intended scope in eeh_handle_normal_event(), leading to broken PCI
error reporting and improper EEH event triggering. Specifically,
eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
acquire the same lock internally, causing nested locking and disrupting
normal EEH event handling paths.
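
The failure mode described above is a nested acquisition of a non-recursive lock. The following is a rough userspace analogy, not kernel code: a POSIX error-checking mutex stands in for the lock behind pci_lock_rescan_remove(), and the names bus_get_locking() and handle_event_nested() are hypothetical. With PTHREAD_MUTEX_ERRORCHECK, a relock by the owning thread returns EDEADLK instead of hanging, which makes the nesting observable. A kernel struct mutex would instead block forever, and lockdep, when enabled, reports this as possible recursive locking.

```c
#define _POSIX_C_SOURCE 200809L	/* for PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Error-checking mutex: a relock by the owner returns EDEADLK. */
static void errorcheck_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Hypothetical getter that takes the lock itself, as eeh_pe_bus_get() is
 * described as doing. Returns the result of its own lock attempt. */
static int bus_get_locking(pthread_mutex_t *rescan_lock)
{
	int rc = pthread_mutex_lock(rescan_lock);

	if (rc == 0)
		pthread_mutex_unlock(rescan_lock);
	return rc;
}

/* Hypothetical handler with the problematic ordering: it takes the lock
 * first and then calls the locking getter, nesting the acquisition. */
static int handle_event_nested(void)
{
	pthread_mutex_t rescan_lock;
	int rc;

	errorcheck_mutex_init(&rescan_lock);
	pthread_mutex_lock(&rescan_lock);
	rc = bus_get_locking(&rescan_lock);	/* EDEADLK: already owned */
	pthread_mutex_unlock(&rescan_lock);	/* release the outer hold */
	pthread_mutex_destroy(&rescan_lock);
	return rc;
}
```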

This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
with two public wrappers:
    eeh_pe_bus_get() with locking enabled.
    eeh_pe_bus_get_nolock() that skips locking.

Callers that already hold pci_lock_rescan_remove() now use
eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.
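
The do_lock parameter and its two wrappers follow a common kernel pattern, sketched here in userspace C under stated assumptions. This is not the patch itself: the struct pe / struct bus types, the pthread mutex standing in for pci_lock_rescan_remove(), and the names _pe_bus_get(), pe_bus_get(), pe_bus_get_nolock() and caller_holding_lock() are all hypothetical. They only mirror the shape of _eeh_pe_bus_get() and its wrappers.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a PCI bus and the PE that maps to it. */
struct bus { int number; };
struct pe { struct bus *bus; };

/* Hypothetical stand-in for the rescan/remove lock. */
static pthread_mutex_t rescan_lock = PTHREAD_MUTEX_INITIALIZER;

/* Internal helper: takes the lock only when the caller asks for it. */
static struct bus *_pe_bus_get(struct pe *pe, bool do_lock)
{
	struct bus *bus;

	if (do_lock)
		pthread_mutex_lock(&rescan_lock);
	bus = pe ? pe->bus : NULL;
	if (do_lock)
		pthread_mutex_unlock(&rescan_lock);
	return bus;
}

/* Wrapper for callers that do not hold the lock. */
static struct bus *pe_bus_get(struct pe *pe)
{
	return _pe_bus_get(pe, true);
}

/* Wrapper for callers that already hold the lock. */
static struct bus *pe_bus_get_nolock(struct pe *pe)
{
	return _pe_bus_get(pe, false);
}

/* Example caller already inside the locked region: using the _nolock
 * variant avoids a second acquisition of the same mutex. */
static struct bus *caller_holding_lock(struct pe *pe)
{
	struct bus *bus;

	pthread_mutex_lock(&rescan_lock);
	bus = pe_bus_get_nolock(pe);
	pthread_mutex_unlock(&rescan_lock);
	return bus;
}
```

Keeping a single internal helper means the lookup logic exists in one place; the wrappers only encode the caller's locking context in the function name, which keeps call sites self-documenting.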

Additionally, pci_lock_rescan_remove() calls are restored to the correct
position—after eeh_pe_bus_get() and immediately before iterating affected
PEs and devices. This ensures EEH-triggered PCI removes occur under proper
bus rescan locking without recursive lock contention.
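
The restored ordering (look up the bus first, then take the lock only around the loop over affected PEs and devices) can be sketched the same way. This is a userspace analogy with hypothetical names: bus_get_locking() stands in for eeh_pe_bus_get(), an error-checking pthread mutex stands in for the rescan/remove lock, and "removing" a device is reduced to counting it. The error-checking mutex would surface any nested acquisition as EDEADLK.

```c
#define _POSIX_C_SOURCE 200809L	/* for PTHREAD_MUTEX_ERRORCHECK */
#include <pthread.h>

/* Error-checking mutex: a nested acquisition would return EDEADLK. */
static void errorcheck_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Hypothetical locking getter: takes and releases the lock itself. */
static int bus_get_locking(pthread_mutex_t *lock)
{
	int rc = pthread_mutex_lock(lock);

	if (rc == 0)
		pthread_mutex_unlock(lock);
	return rc;
}

/* Hypothetical handler with the corrected ordering. Returns the number of
 * devices "removed", or a negative errno if any lock call failed. */
static int handle_event_fixed(int nr_affected)
{
	pthread_mutex_t rescan_lock;
	int removed = 0;
	int rc;

	errorcheck_mutex_init(&rescan_lock);

	rc = bus_get_locking(&rescan_lock);	/* lock not held yet */
	if (rc == 0)
		rc = pthread_mutex_lock(&rescan_lock);
	if (rc == 0) {
		for (int i = 0; i < nr_affected; i++)
			removed++;	/* stand-in for removing one device */
		pthread_mutex_unlock(&rescan_lock);
	}
	pthread_mutex_destroy(&rescan_lock);
	return rc ? -rc : removed;
}
```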

The eeh_pe_loc_get() function has been split into two functions:
    eeh_pe_loc_get(struct eeh_pe *pe) which retrieves the location code for
    a given PE.
    eeh_pe_loc_get_bus(struct pci_bus *bus) which retrieves the location
    code for a given bus.
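
The split of eeh_pe_loc_get() can be sketched as a PE-based function that resolves the bus and then delegates to a bus-based one. One plausible reading, not confirmed by the patch text, is that this lets a caller that already has the bus (for example, one that obtained it under the lock) look up the location code without going back through the locking bus getter. The types, the names pe_loc_get() and pe_loc_get_bus(), and the "N/A" fallback are assumptions made for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a PCI bus and the PE that maps to it. */
struct bus {
	const char *loc_code;	/* location code, or NULL if unknown */
};

struct pe {
	struct bus *bus;
};

/* Bus-based variant: usable by any caller that already has the bus. */
static const char *pe_loc_get_bus(const struct bus *bus)
{
	if (!bus || !bus->loc_code)
		return "N/A";	/* assumed fallback for this sketch */
	return bus->loc_code;
}

/* PE-based variant: resolve the PE to its bus, then reuse the bus variant. */
static const char *pe_loc_get(const struct pe *pe)
{
	return pe_loc_get_bus(pe ? pe->bus : NULL);
}
```

Implementing the PE-based lookup in terms of the bus-based one keeps a single source of truth for how a location code is found.
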

Conceptually the patch sounds OK, but given the complexity of these subsystems
it's difficult to foresee all interactions.  Was the patch verified not to break
NVMe hotplug on PowerNV systems using actual hardware?  If not, I will need to
do so before sending an ack.  Thanks!

It has not been specifically tested for NVMe hotplug on PowerNV hardware.

However, this change does not remove or relax any of the existing locking
around EEH handling, so the NVMe hotplug paths should continue to see
the same serialization as before.

If you have a convenient setup for NVMe hotplug on PowerNV, additional testing
there would definitely be helpful before merging.

Thanks,
Narayana Murty
