On 3/26/19 10:00 PM, Bjorn Helgaas wrote:
> [+cc Srinath, Marta, LKML]
> 
> On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>>  CPU0                                      CPU1
>>
>>  pci_enable_device_mem()                   pci_enable_device_mem()
>>    pci_enable_bridge()                       pci_enable_bridge()
>>      pci_is_enabled()
>>        return false;
>>      atomic_inc_return(enable_cnt)
>>      Start actual enabling the bridge
>>      ...                                       pci_is_enabled()
>>      ...                                         return true;
>>      ...                                   Start memory requests <-- FAIL
>>      ...
>>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
>> with mutexes.
> 
> This is a subtle issue that we've tried to fix before, but we've never
> had a satisfactory solution, so I hope you've figured out the right
> fix.
> 
> I'll include some links to previous discussion.  This patch is very
> similar to [2], which we didn't actually apply.  We did apply the
> patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
> upstream bridges"), but it caused the regressions reported in [4,5],
> so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
> enabling upstream bridges"").
> 

Thanks for the links, I wasn't aware of these discussions and patches!

On PowerNV this issue is partially hidden by db2173198b95 
("powerpc/powernv/pci: Work
around races in PCI bridge enabling"), and on x86 BIOS pre-initializes all the 
bridges, so
it doesn't reproduce until hotplugging in a hotplugged bridge.

This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling 
upstream
bridges"), but instead of a single static mutex it adds per-device mutexes and 
prevents
the dev->enable_cnt from incrementing too early. So it's not needed anymore to 
carefully
select a moment safe enough to enable the device.

Serge

> I think the underlying design problem is that we have a driver for
> device B calling pci_enable_device(), and it is changing the state of
> device A (an upstream bridge).  The model generally is that a driver
> should only touch the device it is bound to.
> 
> It's tricky to get the locking right when several children of device A
> all need to operate on A.
> 
> That's all to say I'll have to think carefully about this particular
> patch, so I'll go on to the others and come back to this one.
> 
> Bjorn
> 
> [1] 
> https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.man...@broadcom.com/T/#u
>     [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [2] 
> https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.man...@broadcom.com/T/#u
>     [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [3] 
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.man...@broadcom.com/T/#u
>     [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [4] 
> https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>     [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream 
> bridges concurrently
> 
> [5] 
> https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231...@kernel.dk/T/#u
>     iwlwifi firmware load broken in current -git
> 
> [6] 
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.javamail.zim...@kalray.eu/T/#u
>     [RFC PATCH] nvme: avoid race-conditions when enabling devices
> 
>> Signed-off-by: Sergey Miroshnichenko <s.miroshniche...@yadro.com>
>> ---
>>  drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
>>  drivers/pci/probe.c |  1 +
>>  include/linux/pci.h |  1 +
>>  3 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index f006068be209..895201d4c9e6 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>      struct pci_dev *bridge;
>>      int retval;
>>  
>> +    mutex_lock(&dev->enable_mutex);
>> +
>>      bridge = pci_upstream_bridge(dev);
>>      if (bridge)
>>              pci_enable_bridge(bridge);
>> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>      if (pci_is_enabled(dev)) {
>>              if (!dev->is_busmaster)
>>                      pci_set_master(dev);
>> +            mutex_unlock(&dev->enable_mutex);
>>              return;
>>      }
>>  
>> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>              pci_err(dev, "Error enabling bridge (%d), continuing\n",
>>                      retval);
>>      pci_set_master(dev);
>> +    mutex_unlock(&dev->enable_mutex);
>>  }
>>  
>>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  {
>>      struct pci_dev *bridge;
>> +    /* Enable-locking of bridges is performed within the 
>> pci_enable_bridge() */
>> +    bool need_lock = !dev->subordinate;
>>      int err;
>>      int i, bars = 0;
>>  
>> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev 
>> *dev, unsigned long flags)
>>              dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>>      }
>>  
>> -    if (atomic_inc_return(&dev->enable_cnt) > 1)
>> +    if (need_lock)
>> +            mutex_lock(&dev->enable_mutex);
>> +    if (pci_is_enabled(dev)) {
>> +            if (need_lock)
>> +                    mutex_unlock(&dev->enable_mutex);
>>              return 0;               /* already enabled */
>> +    }
>>  
>>      bridge = pci_upstream_bridge(dev);
>>      if (bridge)
>> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev 
>> *dev, unsigned long flags)
>>                      bars |= (1 << i);
>>  
>>      err = do_pci_enable_device(dev, bars);
>> -    if (err < 0)
>> -            atomic_dec(&dev->enable_cnt);
>> +    if (err >= 0)
>> +            atomic_inc(&dev->enable_cnt);
>> +    if (need_lock)
>> +            mutex_unlock(&dev->enable_mutex);
>>      return err;
>>  }
>>  
>> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
>>      if (dr)
>>              dr->enabled = 0;
>>  
>> +    mutex_lock(&dev->enable_mutex);
>>      dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
>>                    "disabling already-disabled device");
>>  
>> -    if (atomic_dec_return(&dev->enable_cnt) != 0)
>> +    if (atomic_dec_return(&dev->enable_cnt) != 0) {
>> +            mutex_unlock(&dev->enable_mutex);
>>              return;
>> +    }
>>  
>>      do_pci_disable_device(dev);
>>  
>>      dev->is_busmaster = 0;
>> +
>> +    mutex_unlock(&dev->enable_mutex);
>>  }
>>  EXPORT_SYMBOL(pci_disable_device);
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2ec0df04e0dc..977a127ce791 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>>      INIT_LIST_HEAD(&dev->bus_list);
>>      dev->dev.type = &pci_dev_type;
>>      dev->bus = pci_bus_get(bus);
>> +    mutex_init(&dev->enable_mutex);
>>  
>>      return dev;
>>  }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 77448215ef5b..cb2760a31fe2 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -419,6 +419,7 @@ struct pci_dev {
>>      unsigned int    no_vf_scan:1;           /* Don't scan for VFs after IOV 
>> enablement */
>>      pci_dev_flags_t dev_flags;
>>      atomic_t        enable_cnt;     /* pci_enable_device has been called */
>> +    struct mutex    enable_mutex;
>>  
>>      u32             saved_config_space[16]; /* Config space saved at 
>> suspend time */
>>      struct hlist_head saved_cap_space;
>> -- 
>> 2.20.1
>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to