在 2025/7/26 15:51, Lukas Wunner 写道:
On Fri, Jul 25, 2025 at 04:09:21PM -0500, Bjorn Helgaas wrote:
@@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
return -1;
}
- pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status,
linksta2);
+ pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
It kind of bugs me that the hot-add flow reads LNKSTA three times and
generates both pci_hp_event LINK_UP and link_event tracepoints:
pciehp_handle_presence_or_link_change
link_active = pciehp_check_link_active()
pcie_capability_read_word(PCI_EXP_LNKSTA)
if (link_active)
ctrl_info(ctrl, "Slot(%s): Link Up\n")
This LNKSTA read decides whether to bring up the slot.
It can't be eliminated.
trace_pci_hp_event(PCI_HOTPLUG_LINK_UP)
pciehp_enable_slot
__pciehp_enable_slot
board_added
pciehp_check_link_status
pcie_capability_read_word(PCI_EXP_LNKSTA)
This is sort of a final check whether the link is (still) active
before commencing device enumeration. Doesn't look like it can
safely be eliminated either.
pcie_update_link_speed
pcie_capability_read_word(PCI_EXP_LNKSTA)
pcie_capability_read_word(PCI_EXP_LNKSTA2)
trace_pcie_link_event(<REASON>)
This third register read is introduced by the present patch and is
indeed somewhat a step back, given that pciehp_check_link_status()
currently deliberately calls __pcie_update_link_speed() to pass
the already read LNKSTA value.
Hi Lukas, and Bjorn:
Thanks for the excellent technical analysis!
You're absolutely right. I introduced an unnecessary regression by
adding a third LNKSTA read when pciehp_check_link_status() already has
the LNKSTA value and was specifically designed to pass it to
__pcie_update_link_speed().
I'm wondering if the tracepoint can be moved down to
__pcie_update_link_speed()?
Yes, that's a much better approach. Will fix it in next version.
And maybe we need both a bare LINK_UP event and a link_event with all
the details, but again it seems a little weird to me that there are
two tracepoints when there's really only one event and we know all the
link_event information from the very first LNKSTA read.
One of the reasons is that a "Link Down" event would have to
contain dummy values for link speed etc, so it seemed cleaner
to separate the hotplug event from the link speed event.
Thanks,
Lukas
I agree with Lukas and I completely agree with this separation. The two
tracepoints serve different purposes:
- pci_hp_event: Pure hotplug state changes (LINK_UP/LINK_DOWN,
CARD_PRESENT/CARD_NOT_PRESENT)
- pcie_link_event: Actual link parameter information when meaningful
values exist
For LINK_DOWN events, we don't have meaningful speed/width values, so
forcing them into a single tracepoint would indeed require dummy/invalid
values, making the interface confusing.
Thanks for the clear technical guidance and for catching my regression!
Best regards,
Shuai