On Mon, Apr 17, 2017 at 11:13 AM, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote:
>> What is a high end SSD these days? Built-in NVMe?
>
> One with a good FTL in the firmware.  At minimum, the good Samsung EVO
> drives, the high quality Intel ones, and the Crucial MX series, but probably
> some others.  My choice of words here probably wasn't the best though.

It's a confusing market that sorta defies figuring out what we've got.

I have a Samsung EVO SATA SSD in one laptop, but then I have a Samsung
EVO+ SD Card in an Intel NUC. They use that same EVO branding on an
$11 SD Card.

And then there's the Samsung Electronics Co Ltd NVMe SSD Controller
SM951/PM951 in another laptop.

>> So long as this file is not reflinked or snapshot, filefrag shows a
>> pile of mostly 4096 byte blocks, thousands. But as they're pretty much
>> all continuous, the file fragmentation (extent count) is usually never
>> higher than 12. It meanders between 1 and 12 extents for its life.
>>
>> Except on the system using ssd_spread mount option. That one has a
>> journal file that is +C, is not being snapshot, but has over 3000
>> extents per filefrag and btrfs-progs/debugfs. Really weird.
>
> Given how the 'ssd' mount option behaves and the frequency that most systemd
> instances write to their journals, that's actually reasonably expected.  We
> look for big chunks of free space to write into and then align to 2M
> regardless of the actual size of the write, which in turn means that files
> like the systemd journal which see lots of small (relatively speaking)
> writes will have way more extents than they should until you defragment
> them.

Nope. The first paragraph applies to the NVMe machine with the ssd mount
option. Few fragments.

The second paragraph applies to the SD Card machine with the ssd_spread
mount option. Many fragments.

These are different versions of systemd-journald, so I can't completely
rule out a difference in write behavior.
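
To make the extent arithmetic concrete, here is a toy model, not the real
btrfs allocator. It only counts how many extents a file ends up with when
every 4096 byte write lands right after the previous one, versus when every
write starts at a fresh 2M boundary. The function names and the
"one write per boundary" worst case are assumptions made for illustration;
which mount option (if either) really behaves that way is exactly what's
unclear above.

```python
BLOCK = 4096                 # size of each small journal write, in bytes
ALIGN = 2 * 1024 * 1024      # the 2M alignment mentioned in the thread


def count_extents(allocs):
    """Count extents for a file.

    allocs is a list of (physical_offset, length) pairs in logical file
    order. Two neighbouring allocations belong to the same extent only if
    the second one starts exactly where the first one ends.
    """
    extents = 0
    prev_end = None
    for start, length in allocs:
        if start != prev_end:
            extents += 1
        prev_end = start + length
    return extents


def packed(n_writes, size=BLOCK):
    """Hypothetical layout: each write placed right after the previous one."""
    return [(i * size, size) for i in range(n_writes)]


def aligned(n_writes, size=BLOCK, align=ALIGN):
    """Hypothetical worst case: each write starts at its own aligned boundary."""
    return [(i * align, size) for i in range(n_writes)]


if __name__ == "__main__":
    print("packed :", count_extents(packed(3000)))
    print("aligned:", count_extents(aligned(3000)))
```

In this model, thousands of 4096 byte blocks collapse into a single extent
as long as they stay back to back, while the same writes each become their
own extent once there's a gap between them. That is at least consistent with
"thousands of blocks, at most 12 extents" on one machine and "over 3000
extents" on the other.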

>> Now, systemd aside, there are databases that behave this same way
>> where there's a small section constantly being overwritten, and one or
>> more sections that grow the database file from within and at the end.
>> If this is made cow, the file will absolutely fragment a ton. And
>> especially if the changes are mostly 4KiB block sizes that then are
>> fsync'd.
>>
>> It's almost like we need these things to not fsync at all, and just
>> rely on the filesystem commit time...
>
> Essentially yes, but that causes all kinds of other problems.

Drat.

-- 
Chris Murphy
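
As a footnote to the database case quoted above, here is a rough sketch of
why small fsync'd overwrites fragment a cow file. It is a toy model under
loud assumptions: the file starts out fully contiguous, every fsync'd 4KiB
overwrite under cow is written to the next free block past the end, and
without cow (+C) the block is rewritten in place. The names simulate and
count_extents are made up for this sketch and say nothing about how btrfs
actually picks free space.

```python
def count_extents(phys):
    """Count runs of physically consecutive blocks, in logical order."""
    extents = 0
    prev = None
    for p in phys:
        if prev is None or p != prev + 1:
            extents += 1
        prev = p
    return extents


def simulate(n_blocks, overwrites, cow):
    """Return the extent count after a series of single-block overwrites.

    n_blocks   -- file size in 4KiB blocks, initially one contiguous extent
    overwrites -- logical block numbers rewritten (and fsync'd) one by one
    cow        -- True: each overwrite goes to a new physical block
                  False: overwrite in place, the mapping never changes
    """
    phys = list(range(n_blocks))   # logical block i lives at physical block i
    next_free = n_blocks           # assumed: free space starts past the file
    for block in overwrites:
        if cow:
            phys[block] = next_free
            next_free += 1
    return count_extents(phys)


if __name__ == "__main__":
    # A hot header block (0) rewritten constantly, plus a few interior updates.
    pattern = [0, 10, 0, 20, 0, 30]
    print("cow      :", simulate(64, pattern, cow=True))
    print("nodatacow:", simulate(64, pattern, cow=False))
```

Each relocated block splits the extent it used to sit in, so the count grows
with the number of distinct blocks touched, while the in-place case stays at
one extent no matter how often the header is rewritten.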