* Jan Kiszka <[email protected]> [2018-01-12 13:13:25 +0100]:
> On 2018-01-12 01:22, Gustavo Lima Chaves wrote:
> > Try our best to follow more closely what the specification mandates for
> > function-level reset (section 6.6.2 of PCI Express Base Specification,
> > Revision 3.0). The previous approach was writing blindly on the command
> > register (setting int. disable bit), but that would lead to an
> > unsupported request on functions that do not expect that right away.
> >
> > We now do checks for FLR support, proceed to the previous command
> > register write, wait for any pending transactions for some cycles (used
> > as a placeholder for the suggested 100ms timeout), raise the 'initiate
> > FLR' bit on the right register, wait on the same kind of loop again and,
> > finally, clear the transactions pending bit forcefully, as mandated.
> >
> > This should leave us with better support for PCI devices out there.
> >
> > Signed-off-by: Gustavo Lima Chaves <[email protected]>
> > ---
> > hypervisor/include/jailhouse/pci.h | 5 +++
> > hypervisor/pci.c | 82
> > +++++++++++++++++++++++++++++++++++---
> > 2 files changed, 82 insertions(+), 5 deletions(-)
> >
> > diff --git a/hypervisor/include/jailhouse/pci.h
> > b/hypervisor/include/jailhouse/pci.h
> > index 6d256613..00d52099 100644
> > --- a/hypervisor/include/jailhouse/pci.h
> > +++ b/hypervisor/include/jailhouse/pci.h
> > @@ -40,7 +40,12 @@
> > #define PCI_CAP_EXPRESS 0x10
> >
> > #define PCI_CAP_PCIE (0x10 | JAILHOUSE_PCI_EXT_CAP)
> > +#define PCIE_DEV_CAPS_REG 0x04
> > +# define PCI_DEV_CAPS_FLR (1 << 28)
> > #define PCIE_CONTROL_REG 0x08
> > +# define PCI_CONTROL_INIT_FLR (1 << 15)
> > +#define PCIE_STATUS_REG 0x0a
> > +# define PCI_STATUS_PENDING (1 << 5)
> >
> > #define PCIE_AER_FLAGS (PCIE_DEVCTL_CERE | PCIE_DEVCTL_NFERE | \
> > PCIE_DEVCTL_FERE | PCIE_DEVCTL_URRE)
> > diff --git a/hypervisor/pci.c b/hypervisor/pci.c
> > index 30b733c9..f74f6399 100644
> > --- a/hypervisor/pci.c
> > +++ b/hypervisor/pci.c
> > @@ -21,6 +21,8 @@
> >
> > #define MSIX_VECTOR_CTRL_DWORD 3
> >
> > +#define PCI_RESET_TIMEOUT 250
>
> Where was this value derived from? What is its unit (it's apparently
> "loops" or "cycles")?
The spec suggests:
"After an FLR has been initiated by writing a 1b to the Initiate
Function Level Reset bit, the Function must complete the FLR within
100 ms. If software initiates an FLR when the Transactions Pending
bit is 1b, then software must not initialize the Function until
allowing adequate time for any associated Completions to arrive, or
to achieve reasonable certainty that any remaining Completions will
never arrive. For this purpose, it is recommended that software
allow as much time as provided by the pre-FLR value for Completion
Timeout on the device. If Completion Timeouts were disabled on the
Function when FLR was issued, then the delay is system dependent but
must be no less than 100 ms."
Also:
"1. Software that's performing the FLR synchronizes with other
software that might potentially access the Function directly, and
ensures such accesses do not occur during this algorithm.
2. Software clears the entire Command register, disabling the
Function from issuing any new Requests.
3. Software polls the Transactions Pending bit in the Device Status
register either until it is clear or until it has been long enough
that software is reasonably certain that Completions associated with
any remaining outstanding Transactions will never arrive. On many
platforms, the Transactions Pending bit will usually clear within a
few milliseconds, so software might choose to poll during this
initial period using a tight software loop. On rare cases when the
Transactions Pending bit does not clear by this time, software will
need to poll for a much longer platform-specific period (potentially
seconds), so software might choose to conduct this polling using a
timer-based interrupt polling mechanism.
4. Software initiates the FLR.
5. Software waits 100 ms.
6. Software reconfigures the Function and enables it for normal
operation."
Since we don't have timers at hypervisor level, I though it would be
reasonable to start with a cycles count on some arbitrary but fair
amount. Suggestions?
>
> Style: indention.
>
> > +
> > #define for_each_configured_pci_device(dev, cell) \
> > for ((dev) = (cell)->pci_devices; \
> > (dev) - (cell)->pci_devices < (cell)->config->num_pci_devices; \
> > @@ -618,10 +620,13 @@ void pci_prepare_handover(void)
> > }
> > }
> >
> > -void pci_reset_device(struct pci_device *device)
> > +static void pci_reset_device_internal(struct pci_device *device)
> > {
> > const struct jailhouse_pci_capability *cap;
> > + unsigned timeout = PCI_RESET_TIMEOUT;
> > + bool found = false;
> > unsigned int n;
> > + u32 val;
> >
> > memset(&device->msi_registers, 0, sizeof(device->msi_registers));
> > for (n = 0; n < device->info->num_msix_vectors; n++) {
> > @@ -635,13 +640,81 @@ void pci_reset_device(struct pci_device *device)
> > return;
> > }
> >
> > + for_each_pci_cap(cap, device, n) {
> > + if (cap->id == PCI_CAP_PCIE) {
> > + found = true;
> > + break;
> > + }
> > + }
> > +
> > + if (!found)
> > + return;
>
> Another use case for pci_get_cap().
Done.
>
> > +
> > + val = pci_read_config(device->info->bdf,
> > + cap->start + PCIE_DEV_CAPS_REG, 4);
> > +
> > + if ((val & PCI_DEV_CAPS_FLR) != PCI_DEV_CAPS_FLR) {
> > + printk("Skipping function level reset for "
> > + "device %02x:%02x.%x: no support\n",
> > + PCI_BDF_PARAMS(device->info->bdf));
> > + return;
> > + }
> > +
> > /*
> > - * Silence INTx of the physical device by setting the mask bit.
> > - * This is a deviation from the specified reset state.
> > + * "When a 0 is written to this register, the device is
> > + * logically disconnected from the PCI bus for all accesses
> > + * except configuration accesses."
> > + *
> > + * The bit that we set on that mask will silence INTx of the
> > + * physical device.
> > */
> > pci_write_config(device->info->bdf, PCI_CFG_COMMAND,
> > PCI_CMD_INTX_OFF, 2);
> >
> > + /*
> > + * Wait for any pending transaction to end. Ideally we should
> > + * count for 100ms, but we count arbitrary loops for now
> > + * instead, by lack of timer notion in the hypervisor.
> > + */
> > + do {
> > + val = pci_read_config(device->info->bdf,
> > + cap->start + PCIE_STATUS_REG, 2);
> > + cpu_relax();
> > + timeout--;
> > + } while ((val & PCI_STATUS_PENDING) == PCI_STATUS_PENDING
> > + || timeout);
>
> while (val & PCI_STATUS_PENDING || timeout > 0)
>
> But what if timeout reaches 0? Just ignore STATUS_PENDING??
Just like the spec suggests, yes, we may abort if the devices takes
too long (that should not happen in first place).
>
> > +
> > + /*
> > + * Set "initiate function level reset" bit and wait a little
> > + * bit more.
> > + */
> > + val = pci_read_config(device->info->bdf,
> > + cap->start + PCIE_CONTROL_REG, 2);
>
> Redundant, we have the current value already.
>From control reg? Where?
>
> > + pci_write_config(device->info->bdf, cap->start + PCIE_CONTROL_REG,
> > + val & PCI_CONTROL_INIT_FLR, 2);
> > + timeout = PCI_RESET_TIMEOUT;
> > + while (timeout) {
> > + cpu_relax();
> > + timeout--;
> > + }
>
> These two timeout loops have totally different time spans because one is
> throttled by PCI accesses while the other is just a trivial counter. I
> suspect we can either drop that second loop completely or have to
> properly throttle it as well.
>
> What does the spec say here?
You're right about the timing differences. What do you suggest to make
them more equal? Non used cap reads inside the second one? Maybe I'll
read the status reg inside it already.
>
> > +
> > + /*
> > + * "The Transactions Pending bit must be cleared upon completion
> > + * of the FLR."
> > + */
> > + val = pci_read_config(device->info->bdf,
> > + cap->start + PCIE_STATUS_REG, 2);
>
> Again: don't we have the current value already?
The status could have changed. I'll do as stated in the last comment,
for now.
>
> > + pci_write_config(device->info->bdf, cap->start + PCIE_STATUS_REG,
> > + val & ~PCI_STATUS_PENDING, 2);
> > +}
> > +
> > +void pci_reset_device(struct pci_device *device)
> > +{
> > + const struct jailhouse_pci_capability *cap;
> > + unsigned int n;
> > +
> > + pci_reset_device_internal(device);
> > +
> > for_each_pci_cap(cap, device, n) {
> > if (cap->id == PCI_CAP_MSI || cap->id == PCI_CAP_MSIX)
> > /* Disable MSI/MSI-X by clearing the control word. */
> > @@ -937,7 +1010,6 @@ void pci_shutdown(void)
> > pci_restore_msix(device, cap);
> >
> > if (device->cell != &root_cell)
> > - pci_write_config(device->info->bdf, PCI_CFG_COMMAND,
> > - PCI_CMD_INTX_OFF, 2);
> > + pci_reset_device_internal(device);
> > }
> > }
> >
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux
>
> --
> You received this message because you are subscribed to the Google Groups
> "Jailhouse" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
--
Gustavo Lima Chaves
Intel - Open Source Technology Center
--
You received this message because you are subscribed to the Google Groups
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.