Hello,

On Mon, 13 Apr 2020, Rich Freeman wrote:
>So, "trimming" isn't something a drive does really.  It is a logical
>command issued to the drive.
>
>The fundamental operations the drive does at the physical layer are:
>1. Read a block
>2. Write a block that is empty
>3. Erase a large group of blocks to make them empty
[..]
>Now, in this hypothetical case here is how the drive handles a TRIM
>command.  If it gets the logical instruction "TRIM block 1" what it
>does is:
>
>1. Look at the mapping table to determine that logical block 1 is at
>physical block 123001.
>2. Mark physical block 123001 as unused-but-dirty in the mapping table.
>
>That's all it does.  There are four ways that a drive can get marked
>as unused on an SSD:
>1. Out of the box all blocks are unused-but-clean.  (costs no
>operations that you care about)
>2. The trim command marks a block as unused-but-dirty. (costs no operations)
>3. Block overwrites mark the old block as unused-but-dirty. (costs a
>write operation, but you were writing data anyway)
>4. Task 2 can mark blocks as unused-but-dirty. (costs a bunch of reads
>and writes)
>
>Basically the goal of TRIM is to do more of #2 and less of #4 above,
>which is an expensive read-write defragmentation process.  Plus #4
>also increases drive wear since it involves copying data.

Beautifully summarized Rich! But I'd like to add two little aspects:

First of all: "physical write blocks" in the physical flash are 128kB
or something in that size range, not 4kB or even 512B. I haven't read
them, but these two look enticing:

https://en.wikipedia.org/wiki/Write_amplification

and

https://en.wikipedia.org/wiki/Trim_(computing)

I hope they cover it ;)

Anyway, a write to a single (used) logical 512B block
involves:

1. read existing data of the phy-block-group (e.g. 128kB)
2. write data of logical block to the right spot of in-mem block-group
3. write in-mem block-group to (a different, unused) phy-block-group
4. update all logical block pointers to new phy-block-group as needed
5. mark old phy-block-group as unused

And whatnot.
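
For illustration, here is a minimal Python sketch of those five steps.
Everything in it is made up for this mail (the ToySSD class, its field
names, the 128kB group size), and it maps whole block-groups instead of
single logical blocks to keep it short. It is a toy model under those
assumptions, not how any real controller firmware works.

```python
# Toy model of the five steps above. All names and sizes are illustrative
# assumptions, not taken from any real SSD firmware.
LBA_SIZE = 512                            # bytes per logical block
GROUP_SIZE = 128 * 1024                   # bytes per phy-block-group (assumed)
LBAS_PER_GROUP = GROUP_SIZE // LBA_SIZE   # 256 logical blocks per group


class ToySSD:
    def __init__(self, num_groups):
        # Physical flash: one bytearray per block-group.
        self.flash = [bytearray(GROUP_SIZE) for _ in range(num_groups)]
        # Simplification: map logical group index -> physical group index.
        self.mapping = {}
        self.unused = list(range(num_groups))  # groups that are free to write
        self.dirty = []                        # old groups awaiting an erase
        self.bytes_written = 0                 # bytes actually sent to flash

    def write_lba(self, lba, data):
        assert len(data) == LBA_SIZE
        lgroup, slot = divmod(lba, LBAS_PER_GROUP)
        old = self.mapping.get(lgroup)
        # 1. read existing data of the phy-block-group (zeros if never written)
        if old is not None:
            buf = bytearray(self.flash[old])
        else:
            buf = bytearray(GROUP_SIZE)
        # 2. write data of the logical block to the right spot in memory
        buf[slot * LBA_SIZE:(slot + 1) * LBA_SIZE] = data
        # 3. write the in-memory group to a different, unused phy-block-group
        #    (this toy never erases, so it fails once the free list is empty)
        new = self.unused.pop(0)
        self.flash[new][:] = buf
        self.bytes_written += GROUP_SIZE
        # 4. update the pointer to the new phy-block-group
        self.mapping[lgroup] = new
        # 5. mark the old phy-block-group as unused (but dirty)
        if old is not None:
            self.dirty.append(old)
        return new


ssd = ToySSD(num_groups=4)
ssd.write_lba(0, b"A" * LBA_SIZE)
ssd.write_lba(1, b"B" * LBA_SIZE)
# Two 512B writes cost two full 128kB group writes in this model.
amplification = ssd.bytes_written // (2 * LBA_SIZE)
print(amplification)
```

In this toy, the second 512B write forces a whole group to be copied,
which is the write amplification the first WP article is about.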

And second: fstrim just makes the OS (via the filesystem driver, via
the SATA/NVMe/SCSI driver, through some hoops), or the filesystem itself
when mounted with 'discard', tell the SSD one simple thing about the
logical blocks that a deleted file used to occupy (using the TRIM
command of ATA/SCSI/NVMe; see Wikipedia for where TRIM is specced ;):

    "Hey, SSD, here's a list of LBAs (logical blocks) I no longer need.
    You may henceforth treat them as empty/unused."

Without it, the SSD has no idea that those blocks are no longer needed.
It treats every block that was once written as still in use and does the
_tedious_ copy-on-write when a write hits one of those logical blocks,
even if they were deleted at the filesystem level years ago... see the
WP articles above. Without TRIM, the SSD only learns that the old data
is obsolete when the driver (the FS) writes to the same logical block
again ...

With TRIM, the SSD controller knows which logical blocks it can treat
as unused, and can do much better wear-leveling. So, it's sort of a
"trickle down 'unlink()' to the SSD"-feature. On the logical-block
level, mind you. But for the SSD, that can be quite a "relief"
regarding space for wear-leveling.
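
To put a number on that "relief", here is a tiny Python sketch. The
function name blocks_to_copy and the figures (256 LBAs per group, 200 of
them deleted) are invented for the example; it only counts how many
logical blocks the drive would still have to preserve when it cleans up
one block-group, with and without having been told about the deletes.

```python
# Toy illustration only: which logical blocks must the drive copy elsewhere
# before it can erase a block-group? Names and numbers are assumptions.
def blocks_to_copy(written_lbas, trimmed_lbas):
    """Return the LBAs the drive still believes are in use."""
    return set(written_lbas) - set(trimmed_lbas)


written = range(256)      # the filesystem once wrote all 256 LBAs of a group
deleted = range(0, 200)   # later, files covering 200 of them were deleted

without_trim = blocks_to_copy(written, [])      # drive never heard of deletes
with_trim = blocks_to_copy(written, deleted)    # drive was told via TRIM

print(len(without_trim), len(with_trim))   # prints: 256 56
```

Without TRIM all 256 blocks look live and get copied around; with TRIM
only the 56 that still hold data do.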

And what takes time when doing a "large" TRIM is transmitting a
_large_ list of blocks to the SSD via the TRIM command. That's why
e.g. those ~6-7GiB trims I did just before (see my other mail) took a
couple of seconds for 13GiB: that is ~25M LBAs, i.e. a whole effin
bunch of TRIM commands. No idea how many exactly... wait: at 1-4kB per
TRIM and 4B/LBA that is at most 1k LBAs/TRIM, so for 25M LBAs you'll
need a minimum of 25-100k TRIM commands... go figure ;) No wonder it
takes a second or few ;)
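
The back-of-the-envelope numbers above, spelled out in Python. The
inputs (512B LBAs, 4 bytes per listed LBA, 1-4kB of payload per TRIM
command) are my guesses from this mail, not values from a spec. As far
as I know, real ATA TRIM payloads carry LBA ranges (start plus count)
instead of single LBAs, so contiguous free space needs far fewer
entries; treat this as an upper-bound estimate under my assumptions.

```python
# Rough estimate only; all constants are assumptions from the mail above.
LBA_SIZE = 512            # bytes per logical block
BYTES_PER_ENTRY = 4       # assumed: one 4-byte entry per trimmed LBA
TRIMMED_BYTES = 13 * 2**30   # 13GiB worth of trimmed space

lbas = TRIMMED_BYTES // LBA_SIZE   # number of LBAs to report to the drive


def trim_commands(num_lbas, payload_bytes):
    """Commands needed if each one lists payload_bytes / 4 single LBAs."""
    per_command = payload_bytes // BYTES_PER_ENTRY
    return -(-num_lbas // per_command)   # ceiling division


cmds_4k = trim_commands(lbas, 4096)   # best case: 4kB payload per command
cmds_1k = trim_commands(lbas, 1024)   # worst case: 1kB payload per command

print(lbas)      # 27262976, i.e. roughly 25M
print(cmds_4k)   # 26624
print(cmds_1k)   # 106496
```

So the "25-100k commands" guess holds up under those assumptions.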

Oh, and yes, on rotating rust, all that does not matter. You'd just
let the data rot and write at 512B (or now 4kB) granularity. Well,
those 4k-but-512B-emulated drives (which are about all new ones by now, I
think) have to do something like SSDs. But only on the 4kB level. Plus
the SMR shingling stuff of course. When will those implement TRIM?

HTH,
-dnh

-- 
All Hardware Sucks and I do not consider myself to actually have any data
until there's an offsite backup of it.                 -- Anthony de Boer
