On Sat, Sep 26, 2015 at 12:36:55AM -0700, Jaegeuk Kim <jaeg...@kernel.org> 
wrote:
> > Care to share why? :)
> 
> Mostly, in flash storages, it is a multiple of 2MB normally. :)

Well, any value of -s gives me a multiple of 2MB, no? :)
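
A minimal sketch of the arithmetic (assuming the default 2MB f2fs segment
size; -s is the number of segments per section):

```shell
# f2fs: -s sets segments per section, and each segment is 2 MB,
# so any -s value yields a section size that is a multiple of 2 MB.
SEGMENT_MB=2
for s in 1 7 64; do
    echo "-s $s -> $((s * SEGMENT_MB)) MB sections"
done
```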

> > Is there anything specially good about powers of two? Or do you just
> > want to reduce the number of changed variables?
> 
> IMO, likewise flash storages, it needs to investigate the raw device
> characteristics.

Keep in mind that I don't use it for flash, but smr drives.

We already know the raw device characteristics, basically, the zones are
between 15 and 40 or so MB in size (on the seagate 8tb drive), and they
likely don't have "even" sizes at all.

It's also far from easy to benchmark these things: the disks can
buffer up to 25GB of random writes (and then might need several hours of
cleanup). Failing a linear write incurs a 0.6-1.6s penalty, to be paid
much later. It's a shame that none of the drive companies actually release
any usable info on their drives.

These researchers drilled a hole into the disk and devised a lot of benchmarks to
find out the characteristics of these drives.

https://www.usenix.org/system/files/conference/fast15/fast15-paper-aghayev.pdf

So, the strategy for a fs would be to write linearly, most of the time,
without any gaps. f2fs (at least in 3.18.x) manages to do that very
nicely, which is why I really try to get it working.

But for writing once, any value of -s would probably suffice. There are
two problems when the disk gets full:

a) ipu (in-place update) writes, which the drive can't do efficiently, so
gc might be cheaper.
b) reuse of sections - if a section gets freed and reused, it should be
reasonably large to guarantee large linear writes again.

b) is the reason behind me trying large values of -s.
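
To make b) concrete, here is a sketch of one possible sizing rule (my
illustration, not an f2fs heuristic): pick the smallest -s whose section
covers at least one worst-case zone, or several of them.

```shell
# Smallest -s whose section spans N worst-case zones; the 40 MB figure is
# the upper end of the zone sizes observed on the Seagate 8 TB drive.
SEGMENT_MB=2
MAX_ZONE_MB=40

span_s() {  # span_s N -> minimal -s value spanning N zones
    echo $(( ($1 * MAX_ZONE_MB + SEGMENT_MB - 1) / SEGMENT_MB ))
}

echo "one zone:   -s $(span_s 1)"   # 20, i.e. 40 MB sections
echo "four zones: -s $(span_s 4)"   # 80, i.e. 160 MB sections
```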

Since f2fs is the only fs I tested that can sustain write performance on
these drives near the physical drive characteristics, all that remains is
to see how f2fs performs after it starts gc'ing.

That's why I am so interested in disk full conditions - writing the disk
linearly once is easy, I can just write a tar to the device. Ensuring that
writes stay large and linear after deleting and cleaning up is harder.
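
A rough sketch of the kind of test I mean (paths and sizes are
hypothetical; point MNT at the f2fs mount under test - it falls back to a
temp dir here just so the sketch runs anywhere):

```shell
# Fill, delete, and rewrite, to see whether post-cleanup writes stay linear.
MNT=${MNT:-$(mktemp -d)}   # hypothetical: mount point of the f2fs under test

# 1) initial linear fill - the easy case, a fresh fs writes sequentially
dd if=/dev/zero of="$MNT/fill.0" bs=1M count=64 conv=fsync 2>/dev/null

# 2) free space and write again; on the real device, blktrace would show
#    whether the new writes are still large and linear after gc
rm "$MNT/fill.0"
sync
dd if=/dev/zero of="$MNT/fill.1" bs=1M count=64 conv=fsync 2>/dev/null
```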

nilfs is a good example - on paper it should fit smr drives perfectly, and
even when nearly full it still matches them, but waiting for 8TB to be
shuffled around just to delete some files can take days. More surprising
is that nilfs fails phenomenally with these drives, performance-wise, for
reasons I haven't investigated (my guess is that nilfs leaves gaps).

> I think this can be used for SMR too.

You can run any blockdevice operation on these drives, but the results
from flashbench will be close to meaningless for them. For example, you
can't distinguish a nonaligned write causing a read-modify-write from an
aligned large write or a partial write by access time, as they will
probably all have similar access times.

> I think there might be some hints for section size at first and performance
> variation as well.

I think you confuse these drives with flash drives - while they share some
characteristics, they are completely unlike flash. There is no translation
layer, there is no need for wear leveling, zones have widely varying
sizes, appending can be expensive or cheap, depending on the write size.

What these drives need is primarily large linear writes without gaps, and
secondarily any optimisations for rotational media apply. (And for that, f2fs
performs unexpectedly well, given it wasn't meant for rotational media).

Now, if f2fs can be made to (mostly) work bug-free, but with the
characteristics of 3.18.21, and the gc can ensure that reasonably big
areas spanning multiple zones will be reused, then f2fs will be the _ONLY_ fs
able to take care of drive managed smr disks efficiently.

Specifically, these filesystems do NOT work well with these drives:

nilfs, zfs, btrfs, ext4, xfs

And modifications for these filesystems are either far away in the
future, or not targeted at drive-managed disks (ext4 already has some
modifications, but they clearly aren't very suitable for actual drives,
since they assume a fast area near the start of the disk, which isn't the
case). But these disks are not uncommon (seagate is shipping them by the
millions), and they will stay with us for quite a while.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schm...@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\

------------------------------------------------------------------------------
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
