On 26/02/13 03:05PM, Ira Weiny wrote:
> John Groves wrote:
> > From: John Groves <[email protected]>
> > 
> > The new fsdev driver provides pages/folios initialized compatibly with
> > fsdax - normal rather than devdax-style refcounting, and starting out
> > with order-0 folios.
> > 
> > When fsdev binds to a daxdev, it is usually (always?) switching from the
> > devdax mode (device.c), which pre-initializes compound folios according
> > to its alignment. Fsdev uses fsdev_clear_folio_state() to switch the
> > folios into a fsdax-compatible state.
> > 
> > A side effect of this is that raw mmap doesn't (can't?) work on an fsdev
> > dax instance. Accordingly, the fsdev driver does not provide raw mmap -
> > devices must be put in 'devdax' mode (drivers/dax/device.c) to get raw
> > mmap capability.
> > 
> > This commit contains just the framework, which remaps pages/folios
> > compatibly with fsdax.
> > 
> > Enabling dax changes:
> > 
> > - bus.h: add DAXDRV_FSDEV_TYPE driver type
> > - bus.c: allow DAXDRV_FSDEV_TYPE drivers to bind to daxdevs
> > - dax.h: prototype inode_dax(), which fsdev needs
> > 
> > Suggested-by: Dan Williams <[email protected]>
> > Suggested-by: Gregory Price <[email protected]>
> > Signed-off-by: John Groves <[email protected]>
> > ---
> >  MAINTAINERS          |   8 ++
> >  drivers/dax/Makefile |   6 ++
> >  drivers/dax/bus.c    |   4 +
> >  drivers/dax/bus.h    |   1 +
> >  drivers/dax/fsdev.c  | 242 +++++++++++++++++++++++++++++++++++++++++++
> >  fs/dax.c             |   1 +
> >  include/linux/dax.h  |   5 +
> >  7 files changed, 267 insertions(+)
> >  create mode 100644 drivers/dax/fsdev.c
> > 
> 
> [snip]
> 
> > +
> > +static int fsdev_dax_probe(struct dev_dax *dev_dax)
> > +{
> > +   struct dax_device *dax_dev = dev_dax->dax_dev;
> > +   struct device *dev = &dev_dax->dev;
> > +   struct dev_pagemap *pgmap;
> > +   u64 data_offset = 0;
> > +   struct inode *inode;
> > +   struct cdev *cdev;
> > +   void *addr;
> > +   int rc, i;
> > +
> > +   if (static_dev_dax(dev_dax))  {
> > +           if (dev_dax->nr_range > 1) {
> > +                   dev_warn(dev, "static pgmap / multi-range device conflict\n");
> > +                   return -EINVAL;
> > +           }
> > +
> > +           pgmap = dev_dax->pgmap;
> > +   } else {
> > +           size_t pgmap_size;
> > +
> > +           if (dev_dax->pgmap) {
> > +                   dev_warn(dev, "dynamic-dax with pre-populated page map\n");
> > +                   return -EINVAL;
> > +           }
> > +
> > +           pgmap_size = struct_size(pgmap, ranges, dev_dax->nr_range - 1);
> > +           pgmap = devm_kzalloc(dev, pgmap_size,  GFP_KERNEL);
> > +           if (!pgmap)
> > +                   return -ENOMEM;
> > +
> > +           pgmap->nr_range = dev_dax->nr_range;
> > +           dev_dax->pgmap = pgmap;
> > +
> > +           for (i = 0; i < dev_dax->nr_range; i++) {
> > +                   struct range *range = &dev_dax->ranges[i].range;
> > +
> > +                   pgmap->ranges[i] = *range;
> > +           }
> > +   }
> > +
> > +   for (i = 0; i < dev_dax->nr_range; i++) {
> > +           struct range *range = &dev_dax->ranges[i].range;
> > +
> > +           if (!devm_request_mem_region(dev, range->start,
> > +                                   range_len(range), dev_name(dev))) {
> > +                   dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve range\n",
> > +                            i, range->start, range->end);
> > +                   return -EBUSY;
> > +           }
> > +   }
> 
> All of the above code is AFAICT exactly the same as the dev_dax driver.
> Isn't there a way to make this common?
> 
> The rest of the common code is simple enough.

dev_dax_probe() and fsdev_dax_probe() do indeed share some code: the
range validity checking and pgmap setup, from the top of probe through
the for loop above. After that they diverge. I also scanned both files,
and that part of probe appears to be the only code device.c and fsdev.c
still have in common.

These are separate kmods; that code could certainly be factored out and 
shared, but it would need to go somewhere common (maybe bus.c)?

So both device.c and fsdev.c would call bus.c:dax_prepare_pgmap() or
some such.
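
To make that concrete, here is a rough userspace model of what such a
helper might cover. It is only a sketch: struct model_dev_dax and
struct model_pgmap are simplified stand-ins made up for illustration,
not the kernel's struct dev_dax and struct dev_pagemap; the is_static
flag stands in for static_dev_dax(); and the devm_request_mem_region()
loop is left out entirely.

```c
/*
 * Userspace model only. The structs below are hypothetical stand-ins,
 * not the kernel definitions, and calloc() replaces devm_kzalloc().
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct range {
	unsigned long long start, end;
};

struct model_pgmap {
	int nr_range;
	struct range ranges[];	/* flexible array, one entry per range */
};

struct model_dev_dax {
	bool is_static;			/* stands in for static_dev_dax() */
	int nr_range;
	const struct range *ranges;
	struct model_pgmap *pgmap;	/* pre-populated only for static devices */
};

/*
 * Hypothetical shared helper: validate the device and hand back a pgmap
 * covering its ranges. Returns 0 on success or a negative errno.
 */
static int dax_prepare_pgmap(struct model_dev_dax *dev_dax,
			     struct model_pgmap **out)
{
	struct model_pgmap *pgmap;
	int i;

	assert(dev_dax && out);

	if (dev_dax->is_static) {
		/* A static pgmap cannot describe more than one range. */
		if (dev_dax->nr_range > 1)
			return -EINVAL;
		*out = dev_dax->pgmap;
		return 0;
	}

	/* A dynamic device must not arrive with a pgmap already set. */
	if (dev_dax->pgmap)
		return -EINVAL;

	pgmap = calloc(1, sizeof(*pgmap) +
			  (size_t)dev_dax->nr_range * sizeof(struct range));
	if (!pgmap)
		return -ENOMEM;

	/* Copy every device range into the freshly allocated pgmap. */
	pgmap->nr_range = dev_dax->nr_range;
	for (i = 0; i < dev_dax->nr_range; i++)
		pgmap->ranges[i] = dev_dax->ranges[i];

	dev_dax->pgmap = pgmap;
	*out = pgmap;
	return 0;
}
```

Each probe function would then call the helper first and carry on with
its own pgmap->type, pgmap->ops and vmemmap_shift handling, which is
where the two drivers actually differ.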

I feel like this might not be worth factoring out, but I'm happy to do it
if you and/or the dax team prefer it factored out and shared.

> 
> > +
> > +   /*
> > +    * FS-DAX compatible mode: Use MEMORY_DEVICE_FS_DAX type and
> > +    * do NOT set vmemmap_shift. This leaves folios at order-0,
> > +    * allowing fs-dax to dynamically create compound folios as needed
> > +    * (similar to pmem behavior).
> > +    */
> > +   pgmap->type = MEMORY_DEVICE_FS_DAX;
> > +   pgmap->ops = &fsdev_pagemap_ops;
> > +   pgmap->owner = dev_dax;
> > +
> > +   /*
> > +    * CRITICAL DIFFERENCE from device.c:
> > +    * We do NOT set vmemmap_shift here, even if align > PAGE_SIZE.
> > +    * This ensures folios remain order-0 and are compatible with
> > +    * fs-dax's folio management.
> > +    */
> > +
> > +   addr = devm_memremap_pages(dev, pgmap);
> > +   if (IS_ERR(addr))
> > +           return PTR_ERR(addr);
> > +
> > +   /*
> > +    * Clear any stale compound folio state left over from a previous
> > +    * driver (e.g., device_dax with vmemmap_shift).
> > +    */
> > +   fsdev_clear_folio_state(dev_dax);
> > +
> > +   /* Detect whether the data is at a non-zero offset into the memory */
> > +   if (pgmap->range.start != dev_dax->ranges[0].range.start) {
> > +           u64 phys = dev_dax->ranges[0].range.start;
> > +           u64 pgmap_phys = dev_dax->pgmap[0].range.start;
> > +
> > +           if (!WARN_ON(pgmap_phys > phys))
> > +                   data_offset = phys - pgmap_phys;
> > +
> > +           pr_debug("%s: offset detected phys=%llx pgmap_phys=%llx offset=%llx\n",
> > +                  __func__, phys, pgmap_phys, data_offset);
> > +   }
> > +
> > +   inode = dax_inode(dax_dev);
> > +   cdev = inode->i_cdev;
> > +   cdev_init(cdev, &fsdev_fops);
> > +   cdev->owner = dev->driver->owner;
> > +   cdev_set_parent(cdev, &dev->kobj);
> > +   rc = cdev_add(cdev, dev->devt, 1);
> > +   if (rc)
> > +           return rc;
> > +
> > +   rc = devm_add_action_or_reset(dev, fsdev_cdev_del, cdev);
> > +   if (rc)
> > +           return rc;
> > +
> > +   run_dax(dax_dev);
> > +   return devm_add_action_or_reset(dev, fsdev_kill, dev_dax);
> > +}
> > +
> 
> [snip]
> 
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 9d624f4d9df6..fe1315135fdd 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -51,6 +51,10 @@ struct dax_holder_operations {
> >  
> >  #if IS_ENABLED(CONFIG_DAX)
> >  struct dax_device *alloc_dax(void *private, const struct dax_operations *ops);
> > +
> > +#if IS_ENABLED(CONFIG_DEV_DAX_FS)
> > +struct dax_device *inode_dax(struct inode *inode);
> > +#endif
> 
> I don't understand why this hunk is added here but then removed in a later
> patch?  Why can't this be placed below? ...
> 
> >  void *dax_holder(struct dax_device *dax_dev);
> >  void put_dax(struct dax_device *dax_dev);
> >  void kill_dax(struct dax_device *dax_dev);
> > @@ -153,6 +157,7 @@ static inline void fs_put_dax(struct dax_device *dax_dev, void *holder)
> >  #if IS_ENABLED(CONFIG_FS_DAX)
> >  int dax_writeback_mapping_range(struct address_space *mapping,
> >             struct dax_device *dax_dev, struct writeback_control *wbc);
> > +int dax_folio_reset_order(struct folio *folio);
> 
> ... Here?

Done, thanks - good catch. That was just sloppy factoring into a series on
my part.

> 
> Ira
> 
> [snip]

Thanks for the review, Ira!

Regards,
John

