On Tue, Jun 24, 2014 at 2:34 PM, Rajat Jain <[email protected]> wrote:
> Hello,
>
> Sorry, I missed this email.
>
> Please see below.
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-pci-
>> [email protected]] On Behalf Of Guenter Roeck
>> Sent: Tuesday, June 17, 2014 3:56 PM
>> To: Bjorn Helgaas; Myron Stowe
>> Cc: [email protected]; Kenji Kaneshige; linux-
>> [email protected]; Rajat Jain
>> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed
>> bit when clearing the Slot Status register's event bits
>>
>> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote:
>> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe
>> <[email protected]> wrote:
>> >> During PCIe hot-plug initialization - pciehp_probe - data structures
>> >> related to slot capabilities are set up.  As part of this set up,
>> >> ISRs are put in place to handle slot events and all event bits are cleared
>> out.
>> >>
>> >> This patch adds the Data Link Layer State Changed
>> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are
>> >> cleared out during initialization.
>> >
>> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change
>> > notifications for hot-plug and removal").  Prior to that, pcie_isr()
>> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit.
>> >
>> > Apparently there's a non-public report of spurious messages like this
>> > at boot-time, with no actual hotplug events:
>> >
>> >    pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
>> >    pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at
>> > 0000:83:00, cannot hot-add
>> >    pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
>> >
>>
>> I think I have seen that message once in a while in our systems.
>> Rajat, didn't we talk about this a while ago ?
>
> Essentially my hiccup was that I was not sure whether the driver should or 
> should not take care of the link change events that have happened BEFORE the 
> driver gets loaded. This has more impact if the pciehp is built as a kernel 
> module.
>
> As an example:
>
> It is common for the platform init code built into the kernel, to take the 
> PCI devices on the board out of reset. And that can happen after the PCI 
> enumeration but before the pciehp driver gets loaded. Thus in this condition, 
> with this patch, the pciehp will ignore the linkup event, thus device will 
> not be visible. The only way (other than a rescan) to do hot-add the device 
> would be to apply-and-remove-reset-signal to the device again. At which point 
> pciehp may give a warning about about an attempt to remove a non-existent 
> card, and then will proceed fine with hot-add.

When you saw these problems, was pciehp a module?  We changed it
(c10cc483bf3f in v3.11) so it can't be a module any more, but if there
can still be a significant window between enumeration and pciehp init,
I'd like to understand more about it.

> The patch looks good, only wanted to make sure that we understand and agree 
> that the pciehp should NOT process and link events that have happened before 
> the pciehp was loaded.

Currently (before this patch), we clear the Attention Button Pressed,
Power Fault Detected, MRL Sensor Changed, Presence Detect Changed, and
Command Completed bits, but we *don't* clear Data Link Layer State
Changed.

That asymmetry seems hard to justify.  For example, if a card were
inserted after enumeration but before pciehp is initialized, we'd miss
the PDC indication, so I think we would fail to notice the new device.
That seems basically the same as missing the linkup event in your
example.  In both cases, I think we *should* notice the new device, so
the fact that we don't is a problem.

But since pciehp can't be a module any more, the window between PCI
enumeration and pciehp initialization should be relatively small.  I
think the best way to fix this would be to integrate pciehp more
closely into PCI enumeration, e.g., by doing pcie_init() at the point
where we discover the bridge, before we enumerate any devices below
the bridge.  This is somewhat tangled up with acpihp, so I don't know
how complicated it would be to do this.

So I guess my argument is:

  - Ignoring events that happen before pciehp init decreases our
dependency on BIOS
  - Handling all events consistently is very important
  - We currently have a problem with missing pre-pciehp events, but
there is a way to fix this

> Acked-by: Rajat Jain <[email protected]>

Thanks for taking a look at this!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to