On Fri, 2014-03-28 at 12:58 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 26, 2014 at 04:09:21PM -0600, Alex Williamson wrote:
> > On Wed, 2014-03-26 at 10:21 -0600, Alex Williamson wrote:
> > > On Wed, 2014-03-26 at 23:06 +0800, Alexander Graf wrote:
> > > > 
> > > > > On 26.03.2014 at 22:40, Konrad Rzeszutek Wilk <konrad.w...@oracle.com> wrote:
> > > > > 
> > > > >> On Wed, Mar 26, 2014 at 01:40:32AM +0000, Stuart Yoder wrote:
> > > > >> Hi Greg,
> > > > >> 
> > > > >> We (Linaro, Freescale, Virtual Open Systems) are trying to get an issue
> > > > >> closed that has been percolating for a while around creating a
> > > > >> mechanism that will allow kernel drivers like vfio to bind to devices
> > > > >> of any type.
> > > > >> 
> > > > >> This thread with you:
> > > > >> http://www.spinics.net/lists/kvm-arm/msg08370.html
> > > > >> ...seems to have died out, so I am trying to get your response
> > > > >> and will summarize again.  Vfio drivers in the kernel (regardless of
> > > > >> bus type) need to bind to devices of any type.  The driver's function
> > > > >> is to simply export hardware resources of any type to user space.
> > > > >> 
> > > > >> There are several approaches that have been proposed:
> > > > > 
> > > > > You seem to have missed the one I proposed.
> > > > >> 
> > > > >>   1.  new_id -- (current approach) the user explicitly registers
> > > > >>       each new device type with the vfio driver using the new_id
> > > > >>       mechanism.
> > > > >> 
> > > > >>       Problem: multiple drivers will be resident that handle the
> > > > >>       same device type...and there is nothing user space hotplug
> > > > >>       infrastructure can do to help.
> > > > >> 
> > > > >>   2.  "any id" -- the vfio driver could specify a wildcard match
> > > > >>       of some kind in its ID match table which would allow it to
> > > > >>       match and bind to any possible device id.  However,
> > > > >>       we don't want the vfio driver grabbing _all_ devices...just
> > > > >>       the ones we explicitly want to pass to user space.
> > > > >> 
> > > > >>       The proposed patch to support this was to create a new flag
> > > > >>       "sysfs_bind_only" in struct device_driver.  When this flag
> > > > >>       is set, the driver can only bind to devices via the sysfs
> > > > >>       bind file.  This would allow the wildcard match to work.
> > > > >> 
> > > > >>       Patch is here:
> > > > >>       https://lkml.org/lkml/2013/12/3/253
> > > > >> 
> > > > >>   3.  "Driver initiated explicit bind" -- with this approach the
> > > > >>       vfio driver would create a private 'bind' sysfs object
> > > > >>       and the user would echo the requested device into it:
> > > > >> 
> > > > >>       echo 0001:03:00.0 > /sys/bus/pci/drivers/vfio-pci/vfio_bind
> > > > >> 
> > > > >>       In order to make that work, the driver would need to call
> > > > >>       driver_probe_device() and thus we need this patch:
> > > > >>       https://lkml.org/lkml/2014/2/8/175
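As an illustration of option 2, here is a small userspace model (not kernel code) of how the proposed "sysfs_bind_only" flag would interact with a wildcard ID entry. The struct and function names are hypothetical, and the sketch assumes the flag simply vetoes every match that does not come from the sysfs bind file; the actual proposal is the patch at the URL given for option 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct model_driver {
    const char *name;
    bool match_any;        /* wildcard entry in the ID table */
    bool sysfs_bind_only;  /* the flag proposed in option 2 */
    const unsigned *ids;   /* IDs packed as (vendor << 16) | device */
    size_t nids;
};

struct model_device {
    unsigned vendor, device;
};

/*
 * Return true if drv may bind to dev.  explicit_bind is true when the
 * request comes from a write to the sysfs 'bind' file, and false for
 * automatic probing (device hotplug or driver registration).
 */
bool model_match(const struct model_driver *drv,
                 const struct model_device *dev, bool explicit_bind)
{
    assert(drv != NULL && dev != NULL);

    /* With the flag set, automatic probing never matches... */
    if (drv->sysfs_bind_only && !explicit_bind)
        return false;

    /* ...so the wildcard only takes effect for an explicit bind. */
    if (drv->match_any)
        return true;

    /* Ordinary drivers keep matching on their ID table. */
    unsigned key = (dev->vendor << 16) | dev->device;
    for (size_t i = 0; i < drv->nids; i++)
        if (drv->ids[i] == key)
            return true;
    return false;
}
```

With this rule a hotplugged NIC is never grabbed by the wildcard driver, while an explicit bind still succeeds.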
> > > > > 
> > > > > 4) Use 'unbind' (on the device's original driver) and then 'bind' to
> > > > > the vfio driver.
> > > > 
> > > > This is approach 2, no?
> > > > 
> > > > > 
> > > > > Which I think is what is currently being done. Why is that not
> > > > > sufficient?
> > > > 
> > > > What would 'bind to vfio driver' look like?
> > > > 
> > > > > The only thing I see in the URL is "That works, but it is ugly."
> > > > > There is some mention of a race, but I don't see how: if you do the
> > > > > 'unbind' on the original driver and then bind the BDF to the VFIO
> > > > > driver, how would you get a race?
> > > > 
> > > > Typically on PCI, you do a
> > > > 
> > > >   - add wildcard (pci id) match to vfio driver
> > > >   - unbind driver
> > > >   -> reprobe
> > > >   -> device attaches to vfio driver because it is the least recent match
> > > >   - remove wildcard match from vfio driver
> > > > 
> > > > If you hotplug a card of the same type in between, it gets attached
> > > > to vfio, even though the logical "default driver" would be the
> > > > device-specific driver.
> > > 
> > > I've mentioned drivers_autoprobe in the past, but I'm not sure we're
> > > really factoring it into the discussion.  drivers_autoprobe allows us to
> > > toggle two points:
> > > 
> > > a) When a new device is added whether we automatically give drivers a
> > > try at binding to it
> > > 
> > > b) When a new driver is added whether it gets to try to bind to anything
> > > in the system
> > > 
> > > So we do have a mechanism to avoid the race, but the problem is that it
> > > becomes the responsibility of userspace to:
> > > 
> > > 1) turn off drivers_autoprobe
> > > 2) unbind/new_id/bind/remove_id
> > > 3) turn on drivers_autoprobe
> > > 4) call drivers_probe for anything added between 1) & 3)
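Steps 1) to 3) above could be scripted from C roughly as follows. This is a sketch under stated assumptions: the device's PCI ID is passed as the "vendor device" hex string that new_id and remove_id take, unbinding goes through the device's driver/unbind link as in the examples later in this thread, and step 4) is left to the caller because it depends on which devices appeared in the window. build_rebind_plan() and run_plan() are hypothetical helpers, not an existing library API; run_plan() needs root on a real system, so only the plan is easy to check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PCI_SYSFS "/sys/bus/pci"
#define PLAN_STEPS 6

struct sysfs_step {
    char path[256];
    char value[64];
};

/*
 * Fill out (at least PLAN_STEPS entries) with steps 1-3 for handing device
 * bdf (e.g. "0000:03:00.0") with PCI ID vendev (e.g. "8086 10d3") to driver
 * drv.  Returns PLAN_STEPS, or -1 if a path or value does not fit.
 */
int build_rebind_plan(const char *bdf, const char *vendev, const char *drv,
                      struct sysfs_step *out)
{
    const char *values[PLAN_STEPS] = { "0", bdf, vendev, bdf, vendev, "1" };
    int len[PLAN_STEPS];

    assert(bdf && vendev && drv && out);

    len[0] = snprintf(out[0].path, sizeof(out[0].path),
                      PCI_SYSFS "/drivers_autoprobe");      /* 1) off */
    len[1] = snprintf(out[1].path, sizeof(out[1].path),
                      PCI_SYSFS "/devices/%s/driver/unbind", bdf);
    len[2] = snprintf(out[2].path, sizeof(out[2].path),
                      PCI_SYSFS "/drivers/%s/new_id", drv);
    len[3] = snprintf(out[3].path, sizeof(out[3].path),
                      PCI_SYSFS "/drivers/%s/bind", drv);
    len[4] = snprintf(out[4].path, sizeof(out[4].path),
                      PCI_SYSFS "/drivers/%s/remove_id", drv);
    len[5] = snprintf(out[5].path, sizeof(out[5].path),
                      PCI_SYSFS "/drivers_autoprobe");      /* 3) on */

    for (int i = 0; i < PLAN_STEPS; i++) {
        size_t vlen = strlen(values[i]);
        if (len[i] < 0 || (size_t)len[i] >= sizeof(out[i].path) ||
            vlen >= sizeof(out[i].value))
            return -1;
        memcpy(out[i].value, values[i], vlen + 1);
    }
    return PLAN_STEPS;
}

/* Perform the writes in order, one write(2) per attribute; stop on error. */
int run_plan(const struct sysfs_step *steps, int n)
{
    for (int i = 0; i < n; i++) {
        size_t vlen = strlen(steps[i].value);
        int fd = open(steps[i].path, O_WRONLY);
        if (fd < 0)
            return -1;
        ssize_t w = write(fd, steps[i].value, vlen);
        close(fd);
        if (w < 0 || (size_t)w != vlen)
            return -1;
    }
    return 0;
}
```

Keeping the plan separate from the writes makes it easy to log or review exactly what userspace is being asked to do, which is the burden being questioned below.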
> > > 
> > > Is the question about the ugliness of the current solution whether it's
> > > unreasonable to ask userspace to do this?
> > > 
> > > What we seem to be asking for above is more like an autoprobe flag per
> > > driver where there's some way for this special driver to opt out of auto
> > > probing.  Option 2. in Stuart's list does this by short-cutting ID
> > > matching so that a "match" is only found when using the sysfs bind path,
> > > option 3. enables a way for a driver to expose its own sysfs entry
> > > point for binding.  The latter feels particularly chaotic since drivers
> > > get to make up their own bind mechanism.
> > > 
> > > Another twist I'll throw in is that devices can be hot added to IOMMU
> > > groups that are in-use by userspace.  When that happens we'd like to be
> > > able to disable driver autoprobe of the device to avoid a host driver
> > > automatically binding to the device.  I wonder if, instead of looking at
> > > the problem from the driver perspective, we were to look at it from the
> > > device perspective, we might find a solution that would address both.
> > > For instance, if devices had a driver_probe_id property that was by
> > > default set to their bus-specific ID match ("$VENDOR $DEVICE" on PCI),
> > > could we use that to write new match IDs so that a device could only
> > > bind to a given driver?  Effectively we could then bind either using the
> > > current method of adding to the list of IDs a driver will match, or by
> > > changing the ID that a device would match.  Does that get us anywhere?
> > > Thanks,
> > 
> > Here's one way this might work for PCI; note that we can do this
> > entirely in the bus driver for PCI.  Bind/unbind would go like this:
> > 
> > # bind device to vfio-pci
> > echo vfio-pci > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
> > 
> > # bind device back to host driver
> > echo > /sys/bus/pci/devices/0000\:03\:00.0/preferred_driver
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000\:03\:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
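The empty echo in the second sequence only clears the preference because the store handler strips the newline that echo appends and treats an empty string as "no preference". The following userspace model mirrors the parsing done by preferred_driver_store() in the patch; parse_preferred_driver() is a hypothetical name, the PATH_MAX length check is omitted, and unlike the kernel version it returns -1 rather than -ENOMEM on allocation failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy at most count bytes, strip trailing newlines, and report an empty
 * result as "no preference" by setting *out to NULL.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
int parse_preferred_driver(const char *buf, size_t count, char **out)
{
    char *s;
    size_t len;

    assert(buf != NULL && out != NULL);
    *out = NULL;

    s = strndup(buf, count);
    if (!s)
        return -1;

    len = strlen(s);
    while (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';

    if (len == 0) {
        free(s);        /* empty string: clear the override */
        return 0;
    }
    *out = s;           /* caller owns the string */
    return 0;
}
```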
> > 
> > When preferred_driver is set for a device it will match and bind only to
> > a driver with a matching name.  This also means we can write random
> > strings here to avoid a device being bound to any driver if we want.
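The matching rule just described can be modelled in a few lines of userspace C. The types and model_pci_match() are hypothetical simplifications of struct pci_dev, struct pci_driver and pci_match_device() from the patch; dynamic IDs and subvendor/class matching are left out. When an override is set, the name comparison alone decides, so an arbitrary string such as "none" keeps the device away from every driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplifications of pci_device_id, pci_driver and pci_dev. */
struct model_pci_id {
    unsigned short vendor, device;
};

struct model_pci_driver {
    const char *name;
    const struct model_pci_id *ids;   /* static ID table */
    size_t nids;
};

struct model_pci_dev {
    unsigned short vendor, device;
    const char *preferred_driver;     /* NULL when no override is set */
};

/* Same order as the patch: an override, if present, decides on its own. */
bool model_pci_match(const struct model_pci_driver *drv,
                     const struct model_pci_dev *dev)
{
    assert(drv != NULL && dev != NULL);

    if (dev->preferred_driver)
        return strcmp(drv->name, dev->preferred_driver) == 0;

    /* No override: fall back to ordinary vendor/device ID matching. */
    for (size_t i = 0; i < drv->nids; i++)
        if (drv->ids[i].vendor == dev->vendor &&
            drv->ids[i].device == dev->device)
            return true;
    return false;
}
```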
> > 
> > In the example patch below I've put the preferred_driver in the struct
> > pci_dev, but if this mechanism were adopted by multiple devices perhaps
> > we could add it to struct device.  Would something like this work for
> > platform devices?
> > 
> > Note 1: the below is just the core PCI driver change to support this;
> > there's some trivial collateral damage from changing an exported
> > function not shown here for brevity.
> > 
> > Note 2: PCI passes a struct pci_device_id to the driver probe function
> > which would be NULL in the preferred driver case of the example below.
> > We'd need to dynamically create one of these when calling the probe
> > function to make this practical for drivers that use that data.  Thanks,
> 
> That is, I think, a much easier way. Though I would just call
> it 'override' instead of preferred_driver, since, well, that is its
> intent.
> 
> Thank you for prototyping it!

I've realized since this first draft that returning NULL for the
pci_device_id would be unexpected for a number of drivers and would
probably cause NULL pointer dereferences.  This is an implementation
detail, though; we probably want a static "any ID" pci_device_id to
return for the case where neither the static table nor the dynids match
but we still want the override to match.  This should result in a
smaller patch.  I'll wait for feedback on feasibility from the platform
folks before I do another revision, though.  Thanks,

Alex
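A minimal sketch of the static "any ID" idea, as a hypothetical userspace model rather than the eventual patch: when the override names the driver, the match returns a real table entry if one exists (so driver_data is preserved) and otherwise falls back to a static wildcard entry, so probe() never sees a NULL ID. MODEL_ANY_ID stands in for PCI_ANY_ID, and all type and function names here are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ANY_ID (~0u)   /* stand-in for PCI_ANY_ID */

struct model_id {
    unsigned vendor, device;
    unsigned long driver_data;
};

/* Static entry handed to probe() when only the override matched. */
static const struct model_id model_any_id = { MODEL_ANY_ID, MODEL_ANY_ID, 0 };

struct model_drv { const char *name; const struct model_id *ids; size_t nids; };
struct model_dev { unsigned vendor, device; const char *override; };

/* Returns 1 on a match with *id set to a non-NULL entry, 0 otherwise. */
int model_match_device(const struct model_drv *drv,
                       const struct model_dev *dev,
                       const struct model_id **id)
{
    const struct model_id *found = NULL;

    assert(drv && dev && id);
    *id = NULL;

    /* Look for a real table entry first so driver_data can be kept. */
    for (size_t i = 0; i < drv->nids; i++) {
        if (drv->ids[i].vendor == dev->vendor &&
            drv->ids[i].device == dev->device) {
            found = &drv->ids[i];
            break;
        }
    }

    if (dev->override) {
        if (strcmp(drv->name, dev->override) != 0)
            return 0;
        *id = found ? found : &model_any_id;
        return 1;
    }

    if (!found)
        return 0;
    *id = found;
    return 1;
}
```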
 
> > Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index d911e0c..9425920 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -203,17 +203,23 @@ ATTRIBUTE_GROUPS(pci_drv);
> >   * Deprecated, don't use this as it will not catch any dynamic ids
> >   * that a driver might want to check for.
> >   */
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -                                    struct pci_dev *dev)
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +            const struct pci_device_id **id)
> >  {
> > +   if (id)
> > +           *id = NULL;
> > +
> >     if (ids) {
> >             while (ids->vendor || ids->subvendor || ids->class_mask) {
> > -                   if (pci_match_one_device(ids, dev))
> > -                           return ids;
> > +                   if (pci_match_one_device(ids, dev)) {
> > +                           if (id)
> > +                                   *id = ids;
> > +                           return 1;
> > +                   }
> >                     ids++;
> >             }
> >     }
> > -   return NULL;
> > +   return 0;
> >  }
> >  
> >  /**
> > @@ -225,22 +231,30 @@ const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> >   * system is in its list of supported devices.  Returns the matching
> >   * pci_device_id structure or %NULL if there is no match.
> >   */
> > -static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
> > -                                               struct pci_dev *dev)
> > +static int pci_match_device(struct pci_driver *drv, struct pci_dev *dev,
> > +                       const struct pci_device_id **id)
> >  {
> >     struct pci_dynid *dynid;
> >  
> > +   if (id)
> > +           *id = NULL;
> > +
> > +   if (dev->preferred_driver)
> > +           return !strcmp(drv->name, dev->preferred_driver);
> > +
> >     /* Look at the dynamic ids first, before the static ones */
> >     spin_lock(&drv->dynids.lock);
> >     list_for_each_entry(dynid, &drv->dynids.list, node) {
> >             if (pci_match_one_device(&dynid->id, dev)) {
> >                     spin_unlock(&drv->dynids.lock);
> > -                   return &dynid->id;
> > +                   if (id)
> > +                           *id = &dynid->id;
> > +                   return 1;
> >             }
> >     }
> >     spin_unlock(&drv->dynids.lock);
> >  
> > -   return pci_match_id(drv->id_table, dev);
> > +   return pci_match_id(drv->id_table, dev, id);
> >  }
> >  
> >  struct drv_dev_and_id {
> > @@ -342,8 +356,7 @@ __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
> >     if (!pci_dev->driver && drv->probe) {
> >             error = -ENODEV;
> >  
> > -           id = pci_match_device(drv, pci_dev);
> > -           if (id)
> > +           if (pci_match_device(drv, pci_dev, &id))
> >                     error = pci_call_probe(drv, pci_dev, id);
> >             if (error >= 0)
> >                     error = 0;
> > @@ -1272,17 +1285,12 @@ static int pci_bus_match(struct device *dev, struct device_driver *drv)
> >  {
> >     struct pci_dev *pci_dev = to_pci_dev(dev);
> >     struct pci_driver *pci_drv;
> > -   const struct pci_device_id *found_id;
> >  
> >     if (!pci_dev->match_driver)
> >             return 0;
> >  
> >     pci_drv = to_pci_driver(drv);
> > -   found_id = pci_match_device(pci_drv, pci_dev);
> > -   if (found_id)
> > -           return 1;
> > -
> > -   return 0;
> > +   return pci_match_device(pci_drv, pci_dev, NULL);
> >  }
> >  
> >  /**
> > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > index 4e0acef..d6075f8 100644
> > --- a/drivers/pci/pci-sysfs.c
> > +++ b/drivers/pci/pci-sysfs.c
> > @@ -222,6 +222,46 @@ static ssize_t enabled_show(struct device *dev,
> >  }
> >  static DEVICE_ATTR_RW(enabled);
> >  
> > +static ssize_t preferred_driver_store(struct device *dev,
> > +                                 struct device_attribute *attr,
> > +                                 const char *buf, size_t count)
> > +{
> > +   struct pci_dev *pdev = to_pci_dev(dev);
> > +   char *preferred_driver, *old = pdev->preferred_driver;
> > +
> > +   if (count > PATH_MAX)
> > +           return -EINVAL;
> > +
> > +   preferred_driver = kstrndup(buf, count, GFP_KERNEL);
> > +   if (!preferred_driver)
> > +           return -ENOMEM;
> > +
> > +   while (strlen(preferred_driver) &&
> > +          preferred_driver[strlen(preferred_driver) - 1] == '\n')
> > +           preferred_driver[strlen(preferred_driver) - 1] = '\0';
> > +
> > +   if (strlen(preferred_driver)) {
> > +           pdev->preferred_driver = preferred_driver;
> > +   } else {
> > +           kfree(preferred_driver);
> > +           pdev->preferred_driver = NULL;
> > +   }
> > +
> > +   if (old)
> > +           kfree(old);
> > +
> > +   return count;
> > +}
> > +
> > +static ssize_t preferred_driver_show(struct device *dev,
> > +                                struct device_attribute *attr, char *buf)
> > +{
> > +   struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +   return sprintf(buf, "%s\n", pdev->preferred_driver);
> > +}
> > +static DEVICE_ATTR_RW(preferred_driver);
> > +
> >  #ifdef CONFIG_NUMA
> >  static ssize_t
> >  numa_node_show(struct device *dev, struct device_attribute *attr, char *buf)
> > @@ -521,6 +561,7 @@ static struct attribute *pci_dev_attrs[] = {
> >  #if defined(CONFIG_PM_RUNTIME) && defined(CONFIG_ACPI)
> >     &dev_attr_d3cold_allowed.attr,
> >  #endif
> > +   &dev_attr_preferred_driver.attr,
> >     NULL,
> >  };
> >  
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index aab57b4..6fecb0a 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -365,6 +365,7 @@ struct pci_dev {
> >  #endif
> >     phys_addr_t rom; /* Physical address of ROM if it's not from the BAR */
> >     size_t romlen; /* Length of ROM if it's not from the BAR */
> > +   char *preferred_driver; /* Preferred driver, supersedes ID matching */
> >  };
> >  
> >  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
> > @@ -1111,8 +1112,8 @@ int pci_add_dynid(struct pci_driver *drv,
> >               unsigned int subvendor, unsigned int subdevice,
> >               unsigned int class, unsigned int class_mask,
> >               unsigned long driver_data);
> > -const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
> > -                                    struct pci_dev *dev);
> > +int pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev,
> > +            const struct pci_device_id **id);
> >  int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
> >                 int pass);
> >  
> > 
> > 
> > _______________________________________________
> > iommu mailing list
> > iommu@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/iommu


