Re: [PATCH 4/5] pci: reset: follow more closely PCIe's spec for function-level reset

Gustavo Lima Chaves Fri, 12 Jan 2018 13:07:47 -0800

* Jan Kiszka <[email protected]> [2018-01-12 13:13:25 +0100]:

> On 2018-01-12 01:22, Gustavo Lima Chaves wrote:
> > Try our best to follow more closely what the specification mandates for
> > function-level reset (section 6.6.2 of PCI Express Base Specification,
> > Revision 3.0). The previous approach was writing blindly on the command
> > register (setting int. disable bit), but that would lead to an
> > unsupported request on functions that do not expect that right away.
> >
> > We now do checks for FLR support, proceed to the previous command
> > register write, wait for any pending transactions for some cycles (used
> > as a placeholder for the suggested 100ms timeout), raise the 'initiate
> > FLR' bit on the right register, wait on the same kind of loop again and,
> > finally, clear the transactions pending bit forcefully, as mandated.
> >
> > This should leave us with better support for PCI devices out there.
> >
> > Signed-off-by: Gustavo Lima Chaves <[email protected]>
> > ---
> >  hypervisor/include/jailhouse/pci.h |  5 +++
> >  hypervisor/pci.c                   | 82 
> > +++++++++++++++++++++++++++++++++++---
> >  2 files changed, 82 insertions(+), 5 deletions(-)
> >
> > diff --git a/hypervisor/include/jailhouse/pci.h 
> > b/hypervisor/include/jailhouse/pci.h
> > index 6d256613..00d52099 100644
> > --- a/hypervisor/include/jailhouse/pci.h
> > +++ b/hypervisor/include/jailhouse/pci.h
> > @@ -40,7 +40,12 @@
> >  #define PCI_CAP_EXPRESS    0x10
> >
> >  #define PCI_CAP_PCIE               (0x10 | JAILHOUSE_PCI_EXT_CAP)
> > +#define PCIE_DEV_CAPS_REG  0x04
> > +# define PCI_DEV_CAPS_FLR  (1 << 28)
> >  #define PCIE_CONTROL_REG   0x08
> > +# define PCI_CONTROL_INIT_FLR      (1 << 15)
> > +#define PCIE_STATUS_REG            0x0a
> > +# define PCI_STATUS_PENDING        (1 << 5)
> >
> >  #define PCIE_AER_FLAGS     (PCIE_DEVCTL_CERE | PCIE_DEVCTL_NFERE | \
> >                      PCIE_DEVCTL_FERE | PCIE_DEVCTL_URRE)
> > diff --git a/hypervisor/pci.c b/hypervisor/pci.c
> > index 30b733c9..f74f6399 100644
> > --- a/hypervisor/pci.c
> > +++ b/hypervisor/pci.c
> > @@ -21,6 +21,8 @@
> >
> >  #define MSIX_VECTOR_CTRL_DWORD             3
> >
> > +#define PCI_RESET_TIMEOUT 250
>
> Where was this value derived from? What is its unit (it's apparently
> "loops" or "cycles")?


The spec suggests:

  "After an FLR has been initiated by writing a 1b to the Initiate
  Function Level Reset bit, the Function must complete the FLR within
  100 ms. If software initiates an FLR when the Transactions Pending
  bit is 1b, then software must not initialize the Function until
  allowing adequate time for any associated Completions to arrive, or
  to achieve reasonable certainty that any remaining Completions will
  never arrive. For this purpose, it is recommended that software
  allow as much time as provided by the pre-FLR value for Completion
  Timeout on the device. If Completion Timeouts were disabled on the
  Function when FLR was issued, then the delay is system dependent but
  must be no less than 100 ms."

Also:

  "1. Software that's performing the FLR synchronizes with other
  software that might potentially access the Function directly, and
  ensures such accesses do not occur during this algorithm.

  2. Software clears the entire Command register, disabling the
  Function from issuing any new Requests.

  3. Software polls the Transactions Pending bit in the Device Status
  register either until it is clear or until it has been long enough
  that software is reasonably certain that Completions associated with
  any remaining outstanding Transactions will never arrive. On many
  platforms, the Transactions Pending bit will usually clear within a
  few milliseconds, so software might choose to poll during this
  initial period using a tight software loop. On rare cases when the
  Transactions Pending bit does not clear by this time, software will
  need to poll for a much longer platform-specific period (potentially
  seconds), so software might choose to conduct this polling using a
  timer-based interrupt polling mechanism.

  4. Software initiates the FLR.

  5. Software waits 100 ms.

  6. Software reconfigures the Function and enables it for normal
  operation."

Since we don't have timers at hypervisor level, I though it would be
reasonable to start with a cycles count on some arbitrary but fair
amount. Suggestions?

>
> Style: indention.
>
> > +
> >  #define for_each_configured_pci_device(dev, cell)                  \
> >     for ((dev) = (cell)->pci_devices;                               \
> >          (dev) - (cell)->pci_devices < (cell)->config->num_pci_devices; \
> > @@ -618,10 +620,13 @@ void pci_prepare_handover(void)
> >     }
> >  }
> >
> > -void pci_reset_device(struct pci_device *device)
> > +static void pci_reset_device_internal(struct pci_device *device)
> >  {
> >     const struct jailhouse_pci_capability *cap;
> > +   unsigned timeout = PCI_RESET_TIMEOUT;
> > +   bool found = false;
> >     unsigned int n;
> > +   u32 val;
> >
> >     memset(&device->msi_registers, 0, sizeof(device->msi_registers));
> >     for (n = 0; n < device->info->num_msix_vectors; n++) {
> > @@ -635,13 +640,81 @@ void pci_reset_device(struct pci_device *device)
> >             return;
> >     }
> >
> > +   for_each_pci_cap(cap, device, n) {
> > +         if (cap->id == PCI_CAP_PCIE) {
> > +               found = true;
> > +               break;
> > +         }
> > +   }
> > +
> > +   if (!found)
> > +           return;
>
> Another use case for pci_get_cap().

Done.

>
> > +
> > +   val = pci_read_config(device->info->bdf,
> > +                         cap->start + PCIE_DEV_CAPS_REG, 4);
> > +
> > +   if ((val & PCI_DEV_CAPS_FLR) != PCI_DEV_CAPS_FLR) {
> > +           printk("Skipping function level reset for "
> > +                  "device %02x:%02x.%x: no support\n",
> > +                  PCI_BDF_PARAMS(device->info->bdf));
> > +           return;
> > +   }
> > +
> >     /*
> > -    * Silence INTx of the physical device by setting the mask bit.
> > -    * This is a deviation from the specified reset state.
> > +    * "When a 0 is written to this register, the device is
> > +    * logically disconnected from the PCI bus for all accesses
> > +    * except configuration accesses."
> > +    *
> > +    * The bit that we set on that mask will silence INTx of the
> > +    * physical device.
> >      */
> >     pci_write_config(device->info->bdf, PCI_CFG_COMMAND,
> >                      PCI_CMD_INTX_OFF, 2);
> >
> > +   /*
> > +    * Wait for any pending transaction to end. Ideally we should
> > +    * count for 100ms, but we count arbitrary loops for now
> > +    * instead, by lack of timer notion in the hypervisor.
> > +    */
> > +   do {
> > +           val = pci_read_config(device->info->bdf,
> > +                                 cap->start + PCIE_STATUS_REG, 2);
> > +           cpu_relax();
> > +           timeout--;
> > +   } while ((val & PCI_STATUS_PENDING) == PCI_STATUS_PENDING
> > +            || timeout);
>
> while (val & PCI_STATUS_PENDING || timeout > 0)
>
> But what if timeout reaches 0? Just ignore STATUS_PENDING??

Just like the spec suggests, yes, we may abort if the devices takes
too long (that should not happen in first place).

>
> > +
> > +   /*
> > +    * Set "initiate function level reset" bit and wait a little
> > +    * bit more.
> > +    */
> > +   val = pci_read_config(device->info->bdf,
> > +                         cap->start + PCIE_CONTROL_REG, 2);
>
> Redundant, we have the current value already.

>From control reg? Where?

>
> > +   pci_write_config(device->info->bdf, cap->start + PCIE_CONTROL_REG,
> > +                    val & PCI_CONTROL_INIT_FLR, 2);
> > +   timeout = PCI_RESET_TIMEOUT;
> > +   while (timeout) {
> > +           cpu_relax();
> > +           timeout--;
> > +   }
>
> These two timeout loops have totally different time spans because one is
> throttled by PCI accesses while the other is just a trivial counter. I
> suspect we can either drop that second loop completely or have to
> properly throttle it as well.
>
> What does the spec say here?

You're right about the timing differences. What do you suggest to make
them more equal? Non used cap reads inside the second one? Maybe I'll
read the status reg inside it already.

>
> > +
> > +   /*
> > +    * "The Transactions Pending bit must be cleared upon completion
> > +    * of the FLR."
> > +    */
> > +   val = pci_read_config(device->info->bdf,
> > +                         cap->start + PCIE_STATUS_REG, 2);
>
> Again: don't we have the current value already?

The status could have changed. I'll do as stated in the last comment,
for now.

>
> > +   pci_write_config(device->info->bdf, cap->start + PCIE_STATUS_REG,
> > +                    val & ~PCI_STATUS_PENDING, 2);
> > +}
> > +
> > +void pci_reset_device(struct pci_device *device)
> > +{
> > +   const struct jailhouse_pci_capability *cap;
> > +   unsigned int n;
> > +
> > +   pci_reset_device_internal(device);
> > +
> >     for_each_pci_cap(cap, device, n) {
> >             if (cap->id == PCI_CAP_MSI || cap->id == PCI_CAP_MSIX)
> >                     /* Disable MSI/MSI-X by clearing the control word. */
> > @@ -937,7 +1010,6 @@ void pci_shutdown(void)
> >                             pci_restore_msix(device, cap);
> >
> >             if (device->cell != &root_cell)
> > -                   pci_write_config(device->info->bdf, PCI_CFG_COMMAND,
> > -                                    PCI_CMD_INTX_OFF, 2);
> > +                   pci_reset_device_internal(device);
> >     }
> >  }
> >
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Jailhouse" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

--
Gustavo Lima Chaves
Intel - Open Source Technology Center

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [PATCH 4/5] pci: reset: follow more closely PCIe's spec for function-level reset

Reply via email to