Re: [ntfs-3g-devel] fstrim: With ntfs-3g, "the discard operation is not supported"

Jean-Pierre André Wed, 09 Jan 2019 08:30:01 -0800

james harvey wrote:

On Wed, Jan 9, 2019 at 4:24 AM Jean-Pierre André
<jean-pierre.an...@wanadoo.fr> wrote:

There are no recent changes in the trimming code, so you can
keep the version from Tuxera used by arch, or the latest code
from the repository on
https://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/
For a test build, you may want to use the basic sequence
without installing (and without disturbing your environment)
./configure
make
Just use the script src/ntfs-3g for mounting.
Note : from the repository you first have to generate the
./configure by running ./autogen.sh If you are more familiar
with the arch building procedure, you probably do not have
to make any change to it.

Sounds good.  I went ahead and used the repo with your patch, without
installing.  Thanks for the link.

This is the correct procedure, just replace ntfs-3g by
src/ntfs-3g (src being the directory used for compiling),
and no need to use both ntfs-3g and lowntfs-3g : the log
has shown the error was issued by ntfs-3g.

Also the strace is probably not useful any more.

Attached is a patch to help understanding the issue, with
a more precise logging, and without stopping if a lower
level returns an error. The aim here is to determine if the
error has something to do with thin provisioning. For this
to be meaningful, you need to create a file and remove it
before trimming, for instance :
dd if=/dev/random of=/tmp/ntfs/random-data bs=16384 count=64
rm /tmp/ntfs/random-data
(using bs=16384 reduces the size of log).

Maybe a test without thin provisioning would be useful.
Jean-Pierre

Good call for a non-thin test.  That succeeds.  Having the volume be
thin appears to cause a "fstrim: discard granularity of backing device
is larger than cluster size".

Thin log: https://termbin.com/6org

Non-thin (still through LVM) log: https://termbin.com/dn3u

Not in the logs, but I ran more tests modifying the error message to
include discard_granularity and vol->cluster_size.  On a thin volume,
65536 and 4096.  On a thick volume, 512 and 4096.


So.
To support this situation, I added some code to align
the trimming bounds and remove the granularity check.
This may of course leave some unused block un-trimmed,
but the granularity has to be obeyed, even if the device
block is smaller than the granularity imposed at some
level.

See the attached patch-v2

Looks like those are from
/sys/dev/block/<major>:<minor>/queue/discard_granularity, which are as
expected 512 and 65536 for thick and thin LVM volumes.

Doing some looking, I ran across a patch for the kernel's
drivers/scsi/sd.c which discusses block layer rounding and aligning,
devices that guarantee zeros after discard, and if the block layer
leaves parts of the requested block range out, that it could lead to
data corruption.  It also discusses that dm thinp is unable to deal
with partial discard blocks, and they work around the problem by
setting the granularity to match the logical block size when LBPRZ
(guaranteed zeros after discard) is enabled.  Link:
https://patchwork.kernel.org/patch/7614861/  --- This was later fixed
by a patch, because it had a typo causing discard granularity to be
sent to 1: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/scsi/sd.c?id=f4327a95dd080ed6aecb185478a88ce1ee4fa3c4

/queue/discard_zeroes_data for thin or thick LVM volumes, at least on
my setup, is showing 0.  I'm not sure if LVM ever guarantees zeros
after discard.  There is a volume attribute for zeroing, but at least
in "man lvs" that's documented in notes:8 as "Newly-allocated data
blocks are overwritten with blocks of (z)eroes before use", so nothing
to do with after discard.

/queue/minimum_io_size is 65536 for thin, 512 for thick.

I found a thread regarding a problem of that scsi patch here:
https://www.spinics.net/lists/linux-scsi/msg106824.html


Maybe in the case of thin lvm, the shown granularity
may be improved in the future, as the result of those
actions. Then ntfs-3g will also benefit from the improvements.

Going to be away for about 12 hours.  Not sure if the proper fix is if
discard_granularity > vol->cluster_size, to look for aligned blocks of
size discard_granularity where all clusters within it are discardable.


What I am proposing does this.

My initial thought was if discard_zeroes_data was 0, perhaps
vol->cluster_size could be used instead, but the scsi patch notes did
say "dm thinp is unable to deal with partial discard blocks", if
that's still true.


The cluster size is defined when formatting the partition,
and it could be not much relevant for trimming (the cluster
size must be a multiple of sector size, but on a ssd,
I suppose the sector size is an artifact, and the trimming
block is greater than the sector size).

Maybe looking at how another fs handles this, like btrfs, would be
helpful.  It's at
https://github.com/torvalds/linux/blob/master/fs/btrfs/ioctl.c line
490, "btrfs_ioctl_fitrim()".


[From your other email]

if groups of consecutive clusters all being empty were looked for, to
make up an entire discard_granularity, and if so, discarding those,
the alignment would presumably need to be considered from the
beginning of the underlying device (the LVM thin volume), rather than
just within the partition.


ntfs-3g does not reliably know where the ntfs partition
sits on the device (there is a field in the boot sector for
that, but it is not always fed in by formaters). The
proposed patch assumes that the partition starts at
a multiple of the granularity. Maybe this is where the
discard_alignment comes into play, and maybe this
should be taken into account when rounding the trim
zones. I am not sure this is what discard_alignment is
for ("The discard_alignment parameter indicates how
many bytes the beginning of the partition is offset
from the internal allocation unit's natural alignment").

Jean-Pierre

--- libntfs-3g/ioctl.c.ref      2017-03-23 10:42:44.000000000 +0100
+++ libntfs-3g/ioctl.c  2019-01-09 16:19:14.017170400 +0100
@@ -225,6 +225,24 @@
        return 0;
 }
 
+static inline LCN align_up(ntfs_volume *vol, LCN lcn, u64 granularity)
+{
+       u64 aligned;
+
+       aligned = (lcn << vol->cluster_size_bits) + granularity - 1;
+       aligned -= aligned % granularity;
+       return (aligned >> vol->cluster_size_bits);
+}
+
+static inline u64 align_down(ntfs_volume *vol, u64 count, u64 granularity)
+{
+       u64 aligned;
+
+       aligned = count << vol->cluster_size_bits;
+       aligned -= aligned % granularity;
+       return (aligned >> vol->cluster_size_bits);
+}
+
 #define FSTRIM_BUFSIZ 4096
 
 /* Trim the filesystem.
@@ -243,7 +261,9 @@
        u8 *buf = NULL;
        LCN start_buf;
        int ret;
+int successes, unsupported, other;
 
+successes = unsupported = other = 0;
        ntfs_log_debug("fstrim: start=%llu len=%llu minlen=%llu\n",
                (unsigned long long) start,
                (unsigned long long) len,
@@ -255,11 +275,11 @@
         * XXX We could fix these limitations in future.
         */
        if (start != 0 || len != (uint64_t)-1) {
-               ntfs_log_debug("fstrim: setting start or length is not 
supported\n");
+               ntfs_log_error("fstrim: setting start or length is not 
supported\n");
                return -EINVAL;
        }
        if (minlen > vol->cluster_size) {
-               ntfs_log_debug("fstrim: minlen > cluster size is not 
supported\n");
+               ntfs_log_error("fstrim: minlen > cluster size is not 
supported\n");
                return -EINVAL;
        }
 
@@ -269,7 +289,7 @@
         * different.
         */
        if (!NDevBlock(vol->dev)) {
-               ntfs_log_debug("fstrim: not supported for non-block-device\n");
+               ntfs_log_error("fstrim: not supported for non-block-device\n");
                return -EOPNOTSUPP;
        }
 
@@ -278,15 +298,12 @@
        if (ret)
                return ret;
        if (discard_alignment != 0) {
-               ntfs_log_debug("fstrim: backing device is not aligned for 
discards\n");
-               return -EOPNOTSUPP;
-       }
-       if (discard_granularity > vol->cluster_size) {
-               ntfs_log_debug("fstrim: discard granularity of backing device 
is larger than cluster size\n");
+               ntfs_log_error("fstrim: backing device is not aligned for 
discards\n");
                return -EOPNOTSUPP;
        }
+
        if (discard_max_bytes == 0) {
-               ntfs_log_debug("fstrim: backing device does not support discard 
(discard_max_bytes == 0)\n");
+               ntfs_log_error("fstrim: backing device does not support discard 
(discard_max_bytes == 0)\n");
                return -EOPNOTSUPP;
        }
 
@@ -328,6 +345,8 @@
                for (start_lcn = start_buf; start_lcn < end_buf; ++start_lcn) {
                        if (!ntfs_bit_get(buf, start_lcn-start_buf)) {
                                LCN end_lcn;
+                               LCN aligned_lcn;
+                               u64 aligned_count;
 
                                /* Cluster 'start_lcn' is not in use,
                                 * find end of this run.
@@ -338,14 +357,30 @@
                                          < discard_max_bytes &&
                                        !ntfs_bit_get(buf, end_lcn-start_buf))
                                        end_lcn++;
+                               aligned_lcn = align_up(vol, start_lcn,
+                                               discard_granularity);
+                               if (aligned_lcn >= end_lcn)
+                                       aligned_count = 0;
+                               else {
+                                       aligned_count = 
+                                               align_down(vol,
+                                                       end_lcn - aligned_lcn,
+                                                       discard_granularity);
+                               }
+                               if (aligned_count) {
+                                       ret = fstrim_clusters(vol,
+                                               aligned_lcn, aligned_count);
+if (ret == -EOPNOTSUPP) unsupported++;
+else if (ret) other++;
+else successes++;
+/*
+                                       if (ret)
+                                               goto free_out;
+*/
 
-                               ret = fstrim_clusters(vol,
-                                               start_lcn, end_lcn-start_lcn);
-                               if (ret)
-                                       goto free_out;
-
-                               *trimmed += (end_lcn - start_lcn)
+                                       *trimmed += aligned_count
                                                << vol->cluster_size_bits;
+                               }
                                start_lcn = end_lcn-1;
                        }
                }
@@ -353,6 +388,7 @@
 
        ret = 0;
 free_out:
+ntfs_log_error("trim successes %d unsupported %d other %d\n", successes, 
unsupported, other);
        free(buf);
        return ret;
 }

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel

Re: [ntfs-3g-devel] fstrim: With ntfs-3g, "the discard operation is not supported"

Reply via email to