Thanks for your thorough tests and investigation.

james harvey wrote:
Added rjo...@redhat.com for having written a lot of the original code
involved here, thread from 2014 here:
https://sourceforge.net/p/ntfs-3g/mailman/ntfs-3g-devel/?viewmonth=201407

Replies below.

On Wed, Jan 9, 2019 at 11:29 AM Jean-Pierre André
<jean-pierre.an...@wanadoo.fr> wrote:
james harvey wrote:
Not in the logs, but I ran more tests modifying the error message to
include discard_granularity and vol->cluster_size.  On a thin volume,
65536 and 4096.  On a thick volume, 512 and 4096.
So.
To support this situation, I added some code to align
the trimming bounds and remove the granularity check.
This may of course leave some unused block un-trimmed,
but the granularity has to be obeyed, even if the device
block is smaller than the granularity imposed at some
level.

See the attached patch-v2
For when discard_alignment is 0, this patch seems good to me.

Test 1 - Using an LVM thin pool with discard_granularity/chunk_size
128MB, I used ntfs-3g to create a 5GB volume.  Not using quick, after
writing zeros to drive, the volume showed 100% usage in LVM.  Mounting
it and running fstrim showed 4.4 GiB trimmed.  LVM showed a
corresponding drop in usage.  With a 128MB discard_granularity, that
seems reasonable to me, not knowing the exact specifics of now ntfs is
laid out.

On a newly formatted ntfs partition, two main zones
are used : about 10% in the beginning (for MFT) and
64MB in the middle (for LogFile), and also a few scattered
small zone, mostly near the beginning.

Test 2 - I made a copy of my win7 filesystem, on an LVM thin pool with
discard_granularity/chunk_size 1MB, so discard_alignment would be 0.
It's a 90G volume, showing 71G used.  Mounting it and running fstrim
showed 13.5 GiB trimmed.  LVM showed a corresponding drop in usage.
With a 1MB discard_granularity, it seems reasonable to me that only
13.5 of the 20G free is discarded.  I then dropped caches (echo 3 >
/proc/sys/vm/drop_caches), mounted the original volume, and ran "diff
--recursive" in an attempt to make sure that nothing was discarded
improperly.  It found no differences.  This was on a Windows 7 system
volume with 225,461 files/directories, so that seems like a pretty
thorough test to me, given the topology.

Test 3 - Repeated test 2, but with a thin pool
discard_granularity/chunk_size 64KB.  fstrim shows 18.3 GiB trimmed.
LVM shows a corresponding drop.  I was expecting a bit more, but seems
reasonable enough that 8.5% of the free FS 4K chunks reside within a
discard 16K chunk with something else.  Ran another diff, still no
differences.  Here, I also changed the VM to use the LVM volume, and
was able to boot and use it without apparent problems.

The allocation policy on ntfs tries to keep some space
available after the end of files, so that a file can be grown
without extra fragmentation. This likely leads to unused
space being scattered and poor trimming rate with a big
granularity.

...
if groups of consecutive clusters all being empty were looked for, to
make up an entire discard_granularity, and if so, discarding those,
the alignment would presumably need to be considered from the
beginning of the underlying device (the LVM thin volume), rather than
just within the partition.
ntfs-3g does not reliably know where the ntfs partition
sits on the device (there is a field in the boot sector for
that, but it is not always fed in by formaters). The
proposed patch assumes that the partition starts at
a multiple of the granularity. Maybe this is where the
discard_alignment comes into play, and maybe this
should be taken into account when rounding the trim
zones. I am not sure this is what discard_alignment is
for ("The discard_alignment parameter indicates how
many bytes the beginning of the partition is offset
from the internal allocation unit's natural alignment").
It looks to me like that's what discard_alignment is for.

The word "internal" is not clear to me. Could mean
internal to partition, or underlying device, the latter
being the only one meaningful for trimming.

I emailed the dm-devel (LVM related) list to make sure all of this is
on the right track.  See
https://www.redhat.com/archives/dm-devel/2019-January/msg00057.html --
One of their main dev's confirmed that LVM doesn't track smaller
amounts than discard_granularity, so only aligned chunks of
discard_granularity are discarded.  He also said that if
discard_granularity is 64K, and an unaligned trim of 64K is sent at
position 32K, nothing happens.  So, it's sounding to me like getting
this right is to efficiently discard, rather than prevent data loss.

I was thinking that LVM is calculating discard_alignment incorrectly.
I emailed the linux kernel list and tagged Martin Petersen in it,
being the contact for it.  See https://lkml.org/lkml/2019/1/9/1190 --
His reply seems to confirm for me that LVM is calculating it
incorrectly.  I reported that today on the dm-devel thread linked in
previous paragraph.

To me, and from my reading of what Martin said, discard_granularity
should be the offset from the beginning of the internal allocation
unit that holds the beginning of the block device (here, a partition),
to the beginning of the block device.  But, LVM/dm/kpartx seems to be
calculating it in reverse, instead giving the offset from where the
block device (partition) starts to the beginning of the NEXT
allocation unit.

If an LVM volume has discard_granularity/chunk_size is 128MB, and
within it there's a partition table setting partition 1 at 1MB, I'd
expect discard_alignment to be 1MB, but LVM gives 127MB.

So, if I'm right on this, I won't be able to test a patch including
support for discard_alignment!=0 until LVM/dm/kpartx (not sure which)
is fixed.

So for now, the safe thing is to keep the check on
discard_alignment and return "not supported" when
partition is not properly aligned.

Attached is ioctl.c.patch-v3, which I will push to repo
next week, unless some new issue emerges.

Jean-Pierre

--- libntfs-3g/ioctl.c.ref      2017-03-23 10:42:44.000000000 +0100
+++ libntfs-3g/ioctl.c  2019-01-10 19:03:13.459672400 +0100
@@ -3,7 +3,7 @@
  *
  *      This module is part of ntfs-3g library
  *
- * Copyright (c) 2014-2015 Jean-Pierre Andre
+ * Copyright (c) 2014-2019 Jean-Pierre Andre
  * Copyright (c) 2014      Red Hat, Inc.
  *
  * This program/include file is free software; you can redistribute it and/or
@@ -225,6 +225,24 @@
        return 0;
 }
 
+static inline LCN align_up(ntfs_volume *vol, LCN lcn, u64 granularity)
+{
+       u64 aligned;
+
+       aligned = (lcn << vol->cluster_size_bits) + granularity - 1;
+       aligned -= aligned % granularity;
+       return (aligned >> vol->cluster_size_bits);
+}
+
+static inline u64 align_down(ntfs_volume *vol, u64 count, u64 granularity)
+{
+       u64 aligned;
+
+       aligned = count << vol->cluster_size_bits;
+       aligned -= aligned % granularity;
+       return (aligned >> vol->cluster_size_bits);
+}
+
 #define FSTRIM_BUFSIZ 4096
 
 /* Trim the filesystem.
@@ -255,11 +273,11 @@
         * XXX We could fix these limitations in future.
         */
        if (start != 0 || len != (uint64_t)-1) {
-               ntfs_log_debug("fstrim: setting start or length is not 
supported\n");
+               ntfs_log_error("fstrim: setting start or length is not 
supported\n");
                return -EINVAL;
        }
        if (minlen > vol->cluster_size) {
-               ntfs_log_debug("fstrim: minlen > cluster size is not 
supported\n");
+               ntfs_log_error("fstrim: minlen > cluster size is not 
supported\n");
                return -EINVAL;
        }
 
@@ -269,7 +287,7 @@
         * different.
         */
        if (!NDevBlock(vol->dev)) {
-               ntfs_log_debug("fstrim: not supported for non-block-device\n");
+               ntfs_log_error("fstrim: not supported for non-block-device\n");
                return -EOPNOTSUPP;
        }
 
@@ -278,15 +296,12 @@
        if (ret)
                return ret;
        if (discard_alignment != 0) {
-               ntfs_log_debug("fstrim: backing device is not aligned for 
discards\n");
-               return -EOPNOTSUPP;
-       }
-       if (discard_granularity > vol->cluster_size) {
-               ntfs_log_debug("fstrim: discard granularity of backing device 
is larger than cluster size\n");
+               ntfs_log_error("fstrim: backing device is not aligned for 
discards\n");
                return -EOPNOTSUPP;
        }
+
        if (discard_max_bytes == 0) {
-               ntfs_log_debug("fstrim: backing device does not support discard 
(discard_max_bytes == 0)\n");
+               ntfs_log_error("fstrim: backing device does not support discard 
(discard_max_bytes == 0)\n");
                return -EOPNOTSUPP;
        }
 
@@ -323,11 +338,14 @@
                }
 
                /* Trim the clusters in large as possible blocks, but
-                * not larger than discard_max_bytes.
+                * not larger than discard_max_bytes, and compatible
+                * with the supported trim granularity.
                 */
                for (start_lcn = start_buf; start_lcn < end_buf; ++start_lcn) {
                        if (!ntfs_bit_get(buf, start_lcn-start_buf)) {
                                LCN end_lcn;
+                               LCN aligned_lcn;
+                               u64 aligned_count;
 
                                /* Cluster 'start_lcn' is not in use,
                                 * find end of this run.
@@ -338,14 +356,25 @@
                                          < discard_max_bytes &&
                                        !ntfs_bit_get(buf, end_lcn-start_buf))
                                        end_lcn++;
+                               aligned_lcn = align_up(vol, start_lcn,
+                                               discard_granularity);
+                               if (aligned_lcn >= end_lcn)
+                                       aligned_count = 0;
+                               else {
+                                       aligned_count = 
+                                               align_down(vol,
+                                                       end_lcn - aligned_lcn,
+                                                       discard_granularity);
+                               }
+                               if (aligned_count) {
+                                       ret = fstrim_clusters(vol,
+                                               aligned_lcn, aligned_count);
+                                       if (ret)
+                                               goto free_out;
 
-                               ret = fstrim_clusters(vol,
-                                               start_lcn, end_lcn-start_lcn);
-                               if (ret)
-                                       goto free_out;
-
-                               *trimmed += (end_lcn - start_lcn)
+                                       *trimmed += aligned_count
                                                << vol->cluster_size_bits;
+                               }
                                start_lcn = end_lcn-1;
                        }
                }
_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel

Reply via email to