On Sat, Jan 23, 2016 at 7:44 PM, Andrew Savchenko <birc...@gentoo.org> wrote:
>
> a) EXT4 is a good, extremely robust solution. Its reliability is
> beyond question: on my old box with bad memory banks it kept my
> data safe for years, and almost all losses were recoverable. It
> also has some SSD-oriented features like discard support, plus
> stripe and stride, which can be aligned to the erase block size
> to optimize erase operations and reduce wear-out.

I think EXT4 is the conservative solution.  It is also more flexible
than xfs, and a decent performer all around.
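
For the stride/stripe alignment Andrew mentions, the usual recipe is
something like this (untested sketch; the device name is made up and
the 2MiB erase block size is a guess, since Samsung doesn't publish
it):

  # 4KiB blocks, assumed 2MiB erase block: 2097152 / 4096 = 512 blocks
  mkfs.ext4 -b 4096 -E stride=512,stripe-width=512 /dev/nvme0n1p4
  # then either mount with discard or run fstrim from cron
  mount -o noatime,discard /dev/nvme0n1p4 /mnt/data

If the real erase block turns out to be some other power of two, a
1MiB-aligned partition plus a stripe-width that divides it evenly
should still leave you in decent shape.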

> b) In some tests XFS is better than EXT4 ([1] slides 16-18; [2]).
> Though I had data loss on XFS after unclean shutdowns in the
> past. That was about 5 years ago, so XFS robustness should have
> improved, of course, but I still remember the pain :/

Maybe its robustness improved, but xfs probably hasn't changed much
at all.  It isn't really a focus of development as far as I'm aware.
I probably wouldn't use it.  I used to use it, but was frustrated by
its inability to shrink and by its habit of leaving zeroed-out files
after a crash.

> c) F2FS looks very interesting; it has a really good
> flash-oriented design [3]. It also seems to beat EXT4 on a PCIe
> SSD ([3] chapter 3.2.2, pages 9-10) and everything else on the
> compile test ([2] page 5), which should be close to the type of
> workload I'm interested in (though all tests in [2] have an extra
> raid layer). The only thing that bothers me is some data loss
> reports for F2FS found on the net, though everything I found dates
> back to 2012-2014, and F2FS has an fsck tool now, so it should be
> more reliable these days.

So, F2FS is of course very promising on flash.  It should be the most
efficient solution in terms of wearing your drive evenly.  I'd think
that lots of short-lived files, as you get when compiling, would
actually be a good use case for it, since deleted files don't need to
be rewritten when the log wraps around.  But I won't argue with the
benchmarks.

It will probably improve further as it matures, too.  However, it
isn't nearly as mature as ext4 or xfs, so from a data-integrity
standpoint you're definitely at higher risk.  If you back up
regularly and can live with a small risk of problems, it is probably
a good choice.
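
If you do go that route, setup is simple; something along these lines
(the device name and overprovisioning figure are just examples, not
tuned values):

  # -o reserves space for garbage collection; 5% is only a guess
  mkfs.f2fs -l data -o 5 /dev/nvme0n1p4
  mount -t f2fs -o noatime,background_gc=on /dev/nvme0n1p4 /mnt/data

And fsck.f2fs is there for the post-crash checks you mention.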

> d) I'm not sure about BTRFS, since it is very sophisticated and I'm
> not interested in its advanced features such as snapshots,
> checksums, subvolumes and so on. In some tests [2] it tends to
> achieve better performance than ext4, but due to its sophisticated
> nature it is bound to add more code paths and latency than other
> solutions.

The only reason to use btrfs at this point is all those advanced
features, such as being able to copy a directory containing 10
million small files in 5 seconds.  I'd think that might be useful for
development, but of course git does the same thing.  Git and btrfs
are actually somewhat similar in principle: take git, add the ability
to erase blobs and mirror things, and that is kind of like btrfs.
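
To be concrete, that near-instant copy is just a reflink or a
snapshot, e.g. (paths made up):

  # shares extents instead of duplicating data, so it completes
  # almost instantly regardless of file count
  cp -a --reflink=always /data/src /data/dst
  # or snapshot an entire subvolume
  btrfs subvolume snapshot /data /data-snap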

Btrfs is also immature, and while data loss on an n-1 longterm kernel
like 3.18 is fairly rare, it does happen.  It has ssd-aware behavior,
but it generally tends to underperform the write-in-place
filesystems, mainly because the code isn't well optimized yet, and of
course it can't write in place.

If you REALLY don't care about the data-integrity features and such,
then I don't think it is the best solution for you.

> P.S. Is aligning to erase block size really important for NVMe? I
> can't find erase block size for this drive (Samsung MZ-VKV512)
> neither in official docs nor on the net...

Unless the erase blocks are a single sector in size, I'd think that
alignment matters.  Now, for F2FS alignment probably matters far less
than for other filesystems, since the only blocks on the entire drive
that may potentially be partially erased are the ones that border two
log regions.  F2FS just writes each block in a region once, and then
trims an entire contiguous region when it fills the previous region
up.  Large contiguous trims with individual blocks written only once
are basically a best case for flash, which is of course why it works
that way.  You should still ensure it is aligned, but I'd think not
much will go wrong if it isn't.
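
You can at least sanity-check the alignment without knowing the real
erase block size.  Assuming a /dev/nvme0n1 layout, something like:

  # modern partitioners default to 1MiB alignment, which evenly
  # divides most plausible erase block sizes
  parted /dev/nvme0n1 align-check optimal 1
  # or check by hand: start is in 512-byte sectors, so start*512
  # should be divisible by whatever erase block size you guess
  cat /sys/block/nvme0n1/nvme0n1p1/start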

For something like ext4, where blocks are constantly overwritten, I'd
think that poor alignment is really going to hurt your performance.
Btrfs might be somewhere in between - it doesn't overwrite data in
place, but it does write all over the disk, so it would constantly be
hitting erase-block borders if not aligned.  That is just a
hand-waving argument - I have no idea how they work in practice.

-- 
Rich
