On 02/22/2018 04:50 AM, Bjorn Helgaas wrote:
On Wed, Feb 21, 2018 at 04:25:08PM +0530, George Cherian wrote:
On 02/21/2018 03:24 PM, Lukas Wunner wrote:
On Wed, Feb 21, 2018 at 02:58:13PM +0530, George Cherian wrote:
I will explain the setup used.
The following PLX device is connected to the Cavium ThunderX RC:
PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s)
There is no device connected downstream of the PLX switch.
AFAIU the pcie_port driver probes the PLX switch and autosuspends it after
100 ms, since pci_bridge_d3_possible() returns true.
Later, pci_sysfs_init() ends up doing a config access to the PLX, which fails
with a "synchronous external abort".
Thanks for the details!
This one *should* be fixed by this patch:
Any chance you could try that out?
I did try your patch and it works fine on the above failing setup.
Then you're missing a pci_config_pm_runtime_get() in pci_sysfs_init() or
further down in the call stack, rather than a quirk which just papers
over the issue.
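To make the suggested fix concrete, here is a minimal user-space sketch (not kernel code) of why bracketing a config access with a runtime-PM get/put pair helps. All names (`struct model_bridge`, `model_runtime_get()` and the other `model_*` functions) are hypothetical. The model assumes that a config read routed through the bridge fails unless the bridge is in D0, returning all ones as most platforms would; on ThunderX the same access aborts instead. It also simplifies autosuspend: the bridge suspends as soon as its usage count drops to zero, rather than after the 100 ms delay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified power states; not the kernel's pci_power_t. */
enum model_state { MODEL_D0, MODEL_D3HOT, MODEL_D3COLD };

/* Hypothetical model of a bridge managed by runtime PM. */
struct model_bridge {
	enum model_state state;
	int usage_count;        /* stands in for the runtime-PM usage counter */
	uint32_t id;            /* arbitrary value a successful read returns */
};

/* Model of autosuspend: an idle bridge drops to D3hot (immediately here,
 * instead of after the autosuspend delay). */
static void model_autosuspend(struct model_bridge *br)
{
	if (br->usage_count == 0)
		br->state = MODEL_D3HOT;
}

/* "get": take a reference and make sure the bridge is in D0. */
static void model_runtime_get(struct model_bridge *br)
{
	br->usage_count++;
	br->state = MODEL_D0;
}

/* "put": drop the reference; an idle bridge may suspend again. */
static void model_runtime_put(struct model_bridge *br)
{
	assert(br->usage_count > 0);
	br->usage_count--;
	model_autosuspend(br);
}

/* Modeling assumption: a config read routed through @br only succeeds
 * while the bridge is in D0; otherwise it reads back all ones. */
static uint32_t model_config_read(const struct model_bridge *br)
{
	return br->state == MODEL_D0 ? br->id : 0xffffffffu;
}

/* The bracket suggested above: hold a reference across the access. */
static uint32_t model_config_read_bracketed(struct model_bridge *br)
{
	uint32_t val;

	model_runtime_get(br);
	val = model_config_read(br);
	model_runtime_put(br);
	return val;
}
```

Because the reference is held only for the duration of the access, the bridge can still return to D3hot afterwards. That is the trade-off behind preferring a get/put pair at the access site over a quirk that keeps the bridge out of D3 altogether.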
I have found another configuration where this fails.
Here is the configuration:
1) Connected a PCIe Intel i40 card under the root port.
2) Unbound the i40 driver and bound the vfio-pci driver instead.
3) Ran lspci in a loop: "lspci -s xx:xx.xx -vvv"
I get the same synchronous external abort.
In this case, the vfio-pci driver's probe moves the device (i40) to
D3hot unless disable_idle_d3 is set. lspci then tries to do a
config access, which fails with a synchronous external abort once
the root port has transitioned to D3hot.
This one sounds like we're missing something in this path:
It *looks* like rpm_resume() should resume parent devices, i.e., the
root port, but I don't know that code at all. Maybe Rafael or Lukas
could confirm that?
pci_config_pm_runtime_get() knows that config space is always
accessible unless the device is in D3cold, so if the target device is
in D3hot, it will leave it there. I assume that if/when rpm_resume()
resumes the parent bridges, it will resume them all the way to D0.
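The D3hot/D3cold reasoning above can be captured in a small user-space model (not kernel code), using the vfio-pci scenario: the i40 card sits in D3hot below a root port that has also gone to D3hot. Everything here is hypothetical and simplified: `struct model_dev` and the `model_*` functions are illustrative names, `model_resume()` encodes the *assumption* that resuming a device resumes its ancestors all the way to D0, and there is no usage counting. Config space is treated as reachable when the target is not in D3cold and every bridge above it is in D0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified power states; not the kernel's pci_power_t. */
enum model_state { MODEL_D0, MODEL_D3HOT, MODEL_D3COLD };

/* Hypothetical device node; each device points at its upstream bridge. */
struct model_dev {
	enum model_state state;
	struct model_dev *parent;   /* NULL at the top of the hierarchy */
};

/* Model assumption: config space of @dev is reachable if @dev is not in
 * D3cold and every bridge above it is in D0. */
static bool model_config_accessible(const struct model_dev *dev)
{
	const struct model_dev *p;

	if (dev->state == MODEL_D3COLD)
		return false;
	for (p = dev->parent; p != NULL; p = p->parent)
		if (p->state != MODEL_D0)
			return false;
	return true;
}

/* Assumed resume behaviour: ancestors first, all the way to D0. */
static void model_resume(struct model_dev *dev)
{
	if (dev->parent != NULL)
		model_resume(dev->parent);
	dev->state = MODEL_D0;
}

/* Policy described for pci_config_pm_runtime_get(): resume the parent
 * chain, leave a D3hot target alone, and resume a D3cold target. */
static void model_config_pm_runtime_get(struct model_dev *dev)
{
	if (dev->parent != NULL)
		model_resume(dev->parent);
	if (dev->state == MODEL_D3COLD)
		model_resume(dev);
}
```

Under these assumptions, leaving the target in D3hot is harmless; what matters is that the root port above it is back in D0 before lspci touches config space.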
The stack trace for this issue looks like this:
I have tried adding a pci_config_pm_runtime_get/put pair inside
pci_vpd_read(), which I guess might be needed in case the device goes
to D3cold. Having said that, it didn't fix the problem on our platform.
I'm *really* glad you're finding these issues, because on most
platforms we would just silently read invalid data (all ones) and the
caller would have no idea what's going wrong.