On 9/28/20 4:17 AM, Sinan Kaya wrote:
On 9/27/2020 10:43 PM, Kuppuswamy, Sathyanarayanan wrote:FATAL + no-hotplug - In this case, link will still be reseted. But currently driver state is not properly restored. So I attempted to restore it using pci_reset_bus(). status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + if (status == PCI_ERS_RESULT_RECOVERED) { + status = PCI_ERS_RESULT_NEED_RESET; ... if (status == PCI_ERS_RESULT_NEED_RESET) { /* - * TODO: Should call platform-specific - * functions to reset slot before calling - * drivers' slot_reset callbacks? + * TODO: Optimize the call to pci_reset_bus() + * + * There are two components to pci_reset_bus(). + * + * 1. Do platform specific slot/bus reset. + * 2. Save/Restore all devices in the bus. + * + * For hotplug capable devices and fatal errors, + * device is already in reset state due to link + * reset. So repeating platform specific slot/bus + * reset via pci_reset_bus() call is redundant. So + * can optimize this logic and conditionally call + * pci_reset_bus(). */ + pci_reset_bus(dev);I think we have to go to remove/rescan for this case as you also mentioned above. There is no state to save. All BAR assignments are gone. Entire device programming is also lost. I don't think pci_reset_bus() can recover from this situation safely. It will make things worse by saving/restoring the hardware default state. This should remove/rescan logic should be inside DPC's slot_reset() function BTW. Not here.
Since there is no state restoration for FATAL errors, I am wondering whether calls to ->error_detected(), ->mmio_enabled() and ->slot_reset() are required? Let me know your comments about following pseudo code. if (fatal error & hotplug_supported) do nothing // if fatal triggered by DPC, clear DPC state. if (fatal error & no-hotplug) perform slot_reset and renumerate affected devices.
-- Sathyanarayanan Kuppuswamy Linux Kernel Developer

