On 2014-02-18 19:37, Ryusuke Konishi wrote:
> On Mon, 17 Feb 2014 23:39:51 +0100, Andreas Rohner wrote:
>> This patch adds the nilfs_sufile_trim_fs function, which takes a
>> fstrim_range structure and calls blkdev_issue_discard for every
>> clean segment in the specified range. The range is truncated to sector
>> boundaries.
>>
>> Signed-off-by: Andreas Rohner <[email protected]>
>> ---
>> fs/nilfs2/sufile.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nilfs2/sufile.h | 1 +
>> 2 files changed, 145 insertions(+)
>>
>> diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
>> index 3127e9f..3605cc9 100644
>> --- a/fs/nilfs2/sufile.c
>> +++ b/fs/nilfs2/sufile.c
>> @@ -870,6 +870,150 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf,
>> }
>>
>> /**
>> + * nilfs_sufile_trim_fs() - trim ioctl handle function
>> + * @sufile: inode of segment usage file
>> + * @range: fstrim_range structure
>> + *
>> + * start: First Byte to trim
>> + * len: number of Bytes to trim from start
>> + * minlen: minimum extent length in Bytes
>> + *
>> + * Description: nilfs_sufile_trim_fs goes through all segments containing bytes
>> + * from start to start+len. start is rounded up to the next sector boundary
>> + * and start+len is rounded down. For each clean segment blkdev_issue_discard
>> + * function is invoked to trim it.
>> + *
>> + * Return Value: On success, 0 is returned or negative error code, otherwise.
>> + */
>> +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range)
>> +{
>> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info;
>> + struct buffer_head *su_bh;
>> + struct nilfs_segment_usage *su;
>> + void *kaddr;
>> + size_t n, i, susz = NILFS_MDT(sufile)->mi_entry_size;
>> + sector_t seg_start, seg_end, real_start, real_end,
>> + start = 0, nblocks = 0;
>> + u64 segnum, end, minlen, trimmed = 0;
>> + int ret = 0;
>> + unsigned int sect_size, sects_per_block;
>> +
>> + sect_size = bdev_logical_block_size(nilfs->ns_bdev);
>> + sects_per_block = (1 << nilfs->ns_blocksize_bits) / sect_size;
>
>> + real_start = (range->start + sect_size - 1) / sect_size;
>> + real_end = (range->start + range->len) / sect_size;
>
> Why not use start_sect, end_sect instead of real_start, real_end?
> real_{start,end} are not intuitive to me.
Yes, that looks better.
> We need to use do_div() for these divisions, and DIV_ROUND_UP_ULL()
> macro should be applied to round up the start sector.
>
> Moreover, we have to avoid overflow in "range->start + range->len".
> Actually, range->len is usually set to ULLONG_MAX.
Ah yes, I forgot to test that case.
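To make the failure concrete: with len set to ULLONG_MAX, start + len wraps around for any non-zero start. Below is a minimal user-space sketch of a clamp that never forms that sum before the range has been limited. The name clamp_trim_len and its parameters are illustrative only and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp the byte range [start, start + len) to [0, max_byte) without
 * computing start + len first, since that sum wraps when len is
 * UINT64_MAX. Returns the clamped length, or 0 if start lies at or
 * beyond max_byte.
 */
static uint64_t clamp_trim_len(uint64_t start, uint64_t len, uint64_t max_byte)
{
	if (start >= max_byte)
		return 0;

	/* max_byte - start cannot underflow because start < max_byte. */
	if (len > max_byte - start)
		len = max_byte - start;

	/* After clamping, the end of the range is safe to compute. */
	assert(start + len <= max_byte);
	return len;
}
```

For example, clamp_trim_len(4096, UINT64_MAX, 1 << 20) yields (1 << 20) - 4096 instead of a wrapped value.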
> So, these will be as follows:
>
> u64 len = range->len;
>
> ...
>
> do_div(len, sect_size);
> if (!len)
> goto out;
>
> ...
> start_sect = DIV_ROUND_UP_ULL(range->start, sect_size);
> end_sect = start_sect + len - 1; /* this end_sect is inclusive */
I don't see why this has to be inclusive; it seems to be a matter of
taste. I find it easier to reason about, and more "natural", if
start_sect is inclusive and end_sect is exclusive. Then segnum is
inclusive and end is exclusive as well, just like with pointer
arithmetic.
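To illustrate what I mean by the half-open convention, here is a small user-space sketch. The names sect_range, bytes_to_sectors and sect_count are made up for this example, and it assumes the caller has already clamped len so that start + len cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open sector range: start is inclusive, end is exclusive. */
struct sect_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Convert a byte range to whole sectors: round start up to the next
 * sector boundary and round start + len down.
 */
static struct sect_range bytes_to_sectors(uint64_t start, uint64_t len,
					  uint32_t sect_size)
{
	struct sect_range r;

	assert(sect_size > 0);
	assert(len <= UINT64_MAX - start);	/* caller must clamp first */

	r.start = start / sect_size + (start % sect_size != 0);
	r.end = (start + len) / sect_size;
	if (r.end < r.start)
		r.end = r.start;	/* range covers no whole sector */
	return r;
}

/* With a half-open range the length needs no +1/-1 correction. */
static uint64_t sect_count(struct sect_range r)
{
	return r.end - r.start;
}
```

The empty range falls out naturally as start == end, and a loop over the sectors is simply "for (s = r.start; s < r.end; s++)".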
> Note that, we also should care about large range->start to avoid
> overflow in substitution to start_sect (sector_t) since sector_t may
> be 32-bit. We should check it before the division.
>
> Here, I recant my earlier comment. We should do the following check
> in this function to clarify that the overflow issue is handled
> properly.
Ok.
> u64 max_byte =
> ((u64)nilfs->ns_nsegments * nilfs->ns_blocks_per_segment)
> << nilfs->ns_blocksize_bits;
>
> ...
> if (range->len < nilfs->ns_blocksize || range->start >= max_byte)
> return -EINVAL;
> ...
> (divisions)
>
>> + segnum = nilfs_get_segnum_of_block(nilfs, real_start / sects_per_block);
>> + end = nilfs_get_segnum_of_block(nilfs, ((real_end + sects_per_block - 1)
>> + / sects_per_block) + nilfs->ns_blocks_per_segment - 1);
>
> It would be better to use the following intermediate variables to
> improve readability of these calculations.
Ok.
> And, these calculations need sector_div() and DIV_ROUND_UP_SECTOR_T()
> macro:
>
> start_block = start_sect;
> sector_div(start_block, sects_per_block);
>
> end_block = DIV_ROUND_UP_SECTOR_T(end_sect, sects_per_block);
>
> segnum = nilfs_get_segnum_of_block(nilfs, start_block);
> end = nilfs_get_segnum_of_block(
> nilfs, end_block + nilfs->ns_blocks_per_segment - 1);
>
>> + minlen = range->minlen / sect_size;
>
> And, this one needs do_div():
>
> minlen = range->minlen;
> do_div(minlen, sect_size);
>
>> +
>> +
>> + if (end > nilfs->ns_nsegments)
>> + end = nilfs->ns_nsegments;
>> + if (segnum >= nilfs->ns_nsegments || end <= segnum)
>> + goto out;
>> +
>> + down_read(&NILFS_MDT(sufile)->mi_sem);
>> +
>> + while (segnum < end) {
>> + n = nilfs_sufile_segment_usages_in_block(sufile, segnum,
>> + end - 1);
>> +
>> + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0,
>> + &su_bh);
>> + if (ret < 0) {
>> + if (ret != -ENOENT)
>> + goto out_sem;
>> + /* hole */
>> + segnum += n;
>> + continue;
>> + }
>> +
>> + kaddr = kmap_atomic(su_bh->b_page);
>> + su = nilfs_sufile_block_get_segment_usage(sufile, segnum,
>> + su_bh, kaddr);
>> + for (i = 0; i < n; ++i, ++segnum, su = (void *)su + susz) {
>> + if (!nilfs_segment_usage_clean(su))
>> + continue;
>> +
>> + nilfs_get_segment_range(nilfs, segnum, &seg_start,
>> + &seg_end);
>> +
>> + if (!nblocks) {
>> + /* start new extent */
>> + start = seg_start;
>> + nblocks = seg_end - seg_start + 1;
>> + continue;
>> + }
>> +
>> + if (start + nblocks == seg_start) {
>> + /* add to previous extent */
>> + nblocks += seg_end - seg_start + 1;
>> + continue;
>> + }
>> +
>> + /* discard previous extent */
>> + start *= sects_per_block;
>> + nblocks *= sects_per_block;
>> + if (start < real_start) {
>> + nblocks -= real_start - start;
>> + start = real_start;
>> + }
>
>> + if (start + nblocks > real_end)
>> + nblocks = real_end - start;
>
> Why do you need this adjustment during discarding "previous" extent ?
You are right, I don't need it.
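The reasoning, as a user-space sketch: if the clean segments are visited in ascending order and all of them intersect the requested range, only the first extent can start before the range and only the last one can end after it. The types and names below (extent, merge_extents, clamp_extents) are hypothetical and only model the coalescing logic, not the sufile access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t start;	/* first unit of the extent */
	uint64_t len;	/* number of units */
};

/*
 * Merge sorted, non-overlapping segments into maximal contiguous
 * extents. Stores at most max_out extents in out and returns how many
 * were written.
 */
static size_t merge_extents(const struct extent *seg, size_t nseg,
			    struct extent *out, size_t max_out)
{
	size_t i, n = 0;

	assert(seg && out);
	for (i = 0; i < nseg; i++) {
		if (n > 0 && out[n - 1].start + out[n - 1].len == seg[i].start) {
			out[n - 1].len += seg[i].len;	/* add to previous extent */
			continue;
		}
		if (n == max_out)
			break;
		out[n++] = seg[i];			/* start new extent */
	}
	return n;
}

/* Clamp to [lo, hi): the first extent against lo, the last against hi. */
static void clamp_extents(struct extent *ext, size_t n, uint64_t lo, uint64_t hi)
{
	struct extent *last;

	if (n == 0)
		return;
	if (ext[0].start < lo) {
		uint64_t cut = lo - ext[0].start;

		ext[0].len = ext[0].len > cut ? ext[0].len - cut : 0;
		ext[0].start = lo;
	}
	last = &ext[n - 1];
	if (last->start + last->len > hi)
		last->len = hi > last->start ? hi - last->start : 0;
}
```

So the upper-bound adjustment is only needed once, for the last extent after the loop.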
>> + if (nblocks >= minlen) {
>> + kunmap_atomic(kaddr);
>> +
>> + ret = blkdev_issue_discard(nilfs->ns_bdev,
>> + start, nblocks, GFP_NOFS, 0);
>> + if (ret < 0) {
>> + put_bh(su_bh);
>> + goto out_sem;
>> + }
>> +
>> + trimmed += nblocks;
>> + kaddr = kmap_atomic(su_bh->b_page);
>> + su = nilfs_sufile_block_get_segment_usage(
>> + sufile, segnum, su_bh, kaddr);
>> + }
>> +
>> + /* start new extent */
>> + start = seg_start;
>> + nblocks = seg_end - seg_start + 1;
>> + }
>> + kunmap_atomic(kaddr);
>> + put_bh(su_bh);
>> + }
>> +
>> +
>> + if (nblocks) {
>> + /* discard last extent */
>> + start *= sects_per_block;
>> + nblocks *= sects_per_block;
>> + if (start < real_start) {
>> + nblocks -= real_start - start;
>> + start = real_start;
>> + }
>> + if (start + nblocks > real_end)
>> + nblocks = real_end - start;
>> +
>> + if (nblocks >= minlen) {
>> + ret = blkdev_issue_discard(nilfs->ns_bdev, start,
>> + nblocks, GFP_NOFS, 0);
>> + if (!ret)
>> + trimmed += nblocks;
>> + }
>> + }
>> +
>> +out_sem:
>> + up_read(&NILFS_MDT(sufile)->mi_sem);
>> +out:
>> + range->len = trimmed * sect_size;
>> + return ret;
>> +}
>> +
>> +/**
>> * nilfs_sufile_read - read or get sufile inode
>> * @sb: super block instance
>> * @susize: size of a segment usage entry
>> diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h
>> index e84bc5b..2434abd 100644
>> --- a/fs/nilfs2/sufile.h
>> +++ b/fs/nilfs2/sufile.h
>> @@ -65,6 +65,7 @@ void nilfs_sufile_do_set_error(struct inode *, __u64,
>> struct buffer_head *,
>> int nilfs_sufile_resize(struct inode *sufile, __u64 newnsegs);
>> int nilfs_sufile_read(struct super_block *sb, size_t susize,
>> struct nilfs_inode *raw_inode, struct inode **inodep);
>> +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range);
>>
>> /**
>> * nilfs_sufile_scrap - make a segment garbage
>> --
>> 1.9.0
>
> Please try to compile this patch both for 32-bit kernel and 64-bit
> kernel to test if the patch is architecture independent.
Ok.
With all the proper division macros this gets very complicated. I think
it would simplify things if we truncated to the block size instead of
the sector size, because then we could use simple bit shifts. It would
look something like this:
	if (range->len < nilfs->ns_blocksize ||
	    range->start >= max_byte)
		return -EINVAL;

	/* sector_t could be 32 bit */
	if (range->len > max_byte)
		range->len = max_byte;

	sects_per_block = (1 << nilfs->ns_blocksize_bits) /
			bdev_logical_block_size(nilfs->ns_bdev);

	start_block = (range->start + nilfs->ns_blocksize - 1) >>
			nilfs->ns_blocksize_bits;
	end_block = start_block + (range->len >>
			nilfs->ns_blocksize_bits);

	segnum = nilfs_get_segnum_of_block(nilfs, start_block);
	end = nilfs_get_segnum_of_block(nilfs, end_block +
			nilfs->ns_blocks_per_segment - 1);

	minlen = range->minlen >> nilfs->ns_blocksize_bits;

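For testing outside the kernel, the same idea can be modelled in user space roughly as follows. This is only a sketch: blk_range and trim_range_to_blocks are hypothetical names, and plain uint64_t stands in for sector_t. It deviates from the snippet above in one respect, which is an assumption on my part about the intended semantics: end_block is derived from start + len rounded down rather than from start_block + (len >> blkbits), so an unaligned start cannot push the end past the requested range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical result type: half-open block range plus minlen in blocks. */
struct blk_range {
	uint64_t start_block;	/* inclusive */
	uint64_t end_block;	/* exclusive */
	uint64_t minlen_blocks;
};

static int trim_range_to_blocks(uint64_t start, uint64_t len, uint64_t minlen,
				unsigned int blkbits, uint64_t max_byte,
				struct blk_range *out)
{
	uint64_t blksize;

	assert(out && blkbits < 64);
	blksize = UINT64_C(1) << blkbits;

	if (len < blksize || start >= max_byte)
		return -EINVAL;

	/* Clamp before adding: len may be UINT64_MAX. */
	if (len > max_byte - start)
		len = max_byte - start;

	/* Round start up and start + len down to block boundaries. */
	out->start_block = (start >> blkbits) + ((start & (blksize - 1)) != 0);
	out->end_block = (start + len) >> blkbits;
	if (out->end_block < out->start_block)
		out->end_block = out->start_block;	/* nothing to trim */

	out->minlen_blocks = minlen >> blkbits;
	return 0;
}
```

With 4096-byte blocks, start = 100 and len = 3 * 4096 give the block range [1, 3), and len = UINT64_MAX on a 1 MiB device gives [0, 256).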
What do you think?
Regards,
Andreas Rohner