On Wed, Apr 13, 2016 at 04:41:18PM -0400, Brian Foster wrote:
> On Wed, Apr 13, 2016 at 02:33:52PM -0400, Brian Foster wrote:
> > On Wed, Apr 13, 2016 at 10:44:42AM -0700, Darrick J. Wong wrote:
> > > On Tue, Apr 12, 2016 at 12:42:48PM -0400, Brian Foster wrote:
> > > > From: Joe Thornber <[email protected]>
> > > > 
> > > > Experimental reserve interface for XFS guys to play with.
> > > > 
> > > > I have big reservations (no pun intended) about this patch.
> > > > 
> > > > [BF:
> > > >  - Support for reservation reduction.
> > > >  - Support for space provisioning.
> > > >  - Condensed to a single function.]
> > > > 
> > > > Not-Signed-off-by: Joe Thornber <[email protected]>
> > > > Not-Signed-off-by: Mike Snitzer <[email protected]>
> > > > ---
> > > >  drivers/md/dm-thin.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 171 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > > index 92237b6..32bc5bd 100644
> > > > --- a/drivers/md/dm-thin.c
> > > > +++ b/drivers/md/dm-thin.c
> > ...
> > > > @@ -4271,6 +4343,94 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> > > >         limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > > >  }
> > > >  
> > > > +static int thin_provision_space(struct dm_target *ti, sector_t offset,
> > > > +                               sector_t len, sector_t *res)
> > > > +{
> > > > +       struct thin_c *tc = ti->private;
> > > > +       struct pool *pool = tc->pool;
> > > > +       sector_t end;
> > > > +       dm_block_t pblock;
> > > > +       dm_block_t vblock;
> > > > +       int error;
> > > > +       struct dm_thin_lookup_result lookup;
> > > > +
> > > > +       if (!is_factor(offset, pool->sectors_per_block))
> > > > +               return -EINVAL;
> > > > +
> > > > +       if (!len || !is_factor(len, pool->sectors_per_block))
> > > > +               return -EINVAL;
> > > > +
> > > > +       if (res && !is_factor(*res, pool->sectors_per_block))
> > > > +               return -EINVAL;
> > > > +
> > > > +       end = offset + len;
> > > > +
> > > > +       while (offset < end) {
> > > > +               vblock = offset;
> > > > +               do_div(vblock, pool->sectors_per_block);
> > > > +
> > > > +               error = dm_thin_find_block(tc->td, vblock, true, &lookup);
> > > > +               if (error == 0)
> > > > +                       goto next;
> > > > +               if (error != -ENODATA)
> > > > +                       return error;
> > > > +
> > > > +               error = alloc_data_block(tc, &pblock);
> > > 
> > > So this means that if fallocate wants to BDEV_RES_PROVISION N blocks,
> > > it must first increase the reservation (BDEV_RES_MOD) by N blocks to
> > > avoid using up space that was previously reserved by some other
> > > caller.  I think?
> > > 
> > 
> > Yes, assuming this is being called from a filesystem using the
> > reservation mechanism.
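
For discussion's sake, here's roughly the calling sequence I'd expect.
Note blkdev_reserve_space() is a made-up stand-in for whatever bdev-level
wrapper ends up forwarding to ->reserve_space, and I'm assuming
BDEV_RES_MOD sets an absolute count (which is what set_reserve_count()
suggests, but I haven't dug in):

	/*
	 * Hypothetical caller sketch, not part of this patch: provision
	 * a (pool-block aligned) range of sectors without consuming
	 * reservation that some other caller set aside.
	 */
	static int fs_provision_range(struct block_device *bdev,
				      sector_t offset, sector_t len)
	{
		sector_t res;
		int error;

		/* Bump the reservation to cover the range first. */
		error = blkdev_reserve_space(bdev, BDEV_RES_GET, 0, 0, &res);
		if (error)
			return error;
		res += len;
		error = blkdev_reserve_space(bdev, BDEV_RES_MOD, 0, 0, &res);
		if (error)
			return error;

		/*
		 * Provision.  As I read thin_provision_space(), it
		 * decrements *res for each block it actually allocates
		 * (already-mapped blocks are skipped), so only the
		 * reservation we just added is consumed.
		 */
		res = len;
		return blkdev_reserve_space(bdev, BDEV_RES_PROVISION,
					    offset, len, &res);
	}

(The GET/MOD pair is racy against other reservers, of course; presumably a
real interface would want an atomic delta.)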
> > 
> > > > +               if (error)
> > > > +                       return error;
> > > > +
> > > > +               error = dm_thin_insert_block(tc->td, vblock, pblock);
> > > 
> > > Having reserved and mapped blocks, what happens when we try to read
> > > them?  Do we actually get zeroes, or does the read go straight through
> > > to whatever happens to be in the disk blocks?  I don't think it's
> > > correct that we could BDEV_RES_PROVISION and end up with stale credit
> > > card numbers from some other thin device.
> > > 
> > 
> > Agree, but I'm not really sure how this works in thinp tbh. fallocate
> > wasn't really on my mind when doing this. I was simply trying to cobble
> > together what I could to facilitate making progress on the fs parts
> > (e.g., I just needed a call that allocated blocks and consumed
> > reservation in the process).
> > 
> > Skimming through the dm-thin code, it looks like a (configurable) block
> > zeroing mechanism can be triggered from somewhere around
> > provision_block()->schedule_zero(), depending on whether the incoming
> > write overwrites the newly allocated block. If that's the case, then I
> > suspect that means reads would just fall through to the block and return
> > whatever was on disk. This code would probably need to tie into that
> > zeroing mechanism one way or another to deal with that issue. (Though
> > somebody who actually knows something about dm-thin should verify that.
> > :)
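
If tying into schedule_zero() turns out to be awkward from this path
(there's no bio in hand to drive it), the blunt alternative would be to
zero each block synchronously as it's provisioned, somewhere between
alloc_data_block() and dm_thin_insert_block().  A sketch, not a proposal:

	/*
	 * Sketch only: zero a freshly allocated data block before its
	 * mapping goes live, so reads can't return another device's
	 * old contents.  tc->pool_dev->bdev is where remap() points
	 * data bios.
	 */
	static int zero_provisioned_block(struct thin_c *tc, dm_block_t pblock)
	{
		struct pool *pool = tc->pool;
		sector_t begin = pblock * pool->sectors_per_block;

		return blkdev_issue_zeroout(tc->pool_dev->bdev, begin,
					    pool->sectors_per_block,
					    GFP_NOIO, false);
	}

That's real I/O at fallocate time, though, which is exactly the problem
you raise below.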
> > 
> 
> BTW, if that mechanism is in fact doing I/O, that might not be the
> appropriate solution for fallocate. Perhaps we'd have to consider an
> unwritten flag or some such in dm-thin, if possible.

The hard part is that we don't know whether the caller actually has a way
to prevent userspace from seeing the stale contents (a filesystem does) or
whether we'd be leaking data straight to userspace (a user program calling
fallocate).

(And yeah, here we go with NO_HIDE_STALE again...)

--D

> 
> Brian
> 
> > Brian
> > 
> > > (PS: I don't know enough about thinp to know if this has already been
> > > taken care of.  I didn't see anything, but who knows what I missed. :))
> > > 
> > > --D
> > > 
> > > > +               if (error)
> > > > +                       return error;
> > > > +
> > > > +               if (res && *res)
> > > > +                       *res -= pool->sectors_per_block;
> > > > +next:
> > > > +               offset += pool->sectors_per_block;
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int thin_reserve_space(struct dm_target *ti, int mode, sector_t offset,
> > > > +                             sector_t len, sector_t *res)
> > > > +{
> > > > +       struct thin_c *tc = ti->private;
> > > > +       struct pool *pool = tc->pool;
> > > > +       sector_t blocks;
> > > > +       unsigned long flags;
> > > > +       int error;
> > > > +
> > > > +       if (mode == BDEV_RES_PROVISION)
> > > > +               return thin_provision_space(ti, offset, len, res);
> > > > +
> > > > +       /* res required for get/set */
> > > > +       error = -EINVAL;
> > > > +       if (!res)
> > > > +               return error;
> > > > +
> > > > +       if (mode == BDEV_RES_GET) {
> > > > +               spin_lock_irqsave(&tc->pool->lock, flags);
> > > > +               *res = tc->reserve_count * pool->sectors_per_block;
> > > > +               spin_unlock_irqrestore(&tc->pool->lock, flags);
> > > > +               error = 0;
> > > > +       } else if (mode == BDEV_RES_MOD) {
> > > > +               /*
> > > > +               * @res must always be a factor of the pool's blocksize; upper
> > > > +               * layers can rely on the bdev's minimum_io_size for this.
> > > > +               */
> > > > +               if (!is_factor(*res, pool->sectors_per_block))
> > > > +                       return error;
> > > > +
> > > > +               blocks = *res;
> > > > +               (void) sector_div(blocks, pool->sectors_per_block);
> > > > +
> > > > +               error = set_reserve_count(tc, blocks);
> > > > +       }
> > > > +
> > > > +       return error;
> > > > +}
> > > > +
> > > >  static struct target_type thin_target = {
> > > >         .name = "thin",
> > > >         .version = {1, 18, 0},
> > > > @@ -4285,6 +4445,7 @@ static struct target_type thin_target = {
> > > >         .status = thin_status,
> > > >         .iterate_devices = thin_iterate_devices,
> > > >         .io_hints = thin_io_hints,
> > > > +       .reserve_space = thin_reserve_space,
> > > >  };
> > > >  
> > > >  /*----------------------------------------------------------------*/
> > > > -- 
> > > > 2.4.11
> > > > 