On Sat, Sep 26, 2015 at 03:53:53PM +0200, Marc Lehmann wrote:
> On Sat, Sep 26, 2015 at 12:36:55AM -0700, Jaegeuk Kim <jaeg...@kernel.org> 
> wrote:
> > > Care to share why? :)
> > 
> > Mostly, in flash storage, it is normally a multiple of 2MB. :)
> 
> Well, any value of -s gives me a multiple of 2MB, no? :)
> 
> > > Is there anything especially good about powers of two? Or do you just
> > > want to reduce the number of changed variables?
> > 
> > IMO, as with flash storage, we need to investigate the raw device
> > characteristics.
> 
> Keep in mind that I don't use it for flash, but smr drives.
> 
> We already know the raw device characteristics, basically, the zones are
> between 15 and 40 or so MB in size (on the seagate 8tb drive), and they
> likely don't have "even" sizes at all.
> 
> It's also far from easy to benchmark these things: the disks can
> buffer up to 25GB of random writes (and then might need several hours of
> cleanup). Failing to write linearly incurs a 0.6-1.6s penalty, to be paid
> much later. It's a shame that none of the drive companies actually release
> any usable info on their drives.
> 
> These guys drilled a hole in the disk and devised a lot of benchmarks to
> find out the characteristics of these drives.
> 
> https://www.usenix.org/system/files/conference/fast15/fast15-paper-aghayev.pdf
> 
> So, the strategy for a fs would be to write linearly, most of the time,
> without any gaps. f2fs (at least in 3.18.x) manages to do that very
> nicely, which is why I'm really trying to get it working.
> 
> But for writing once, any value of -s would probably suffice. There are
> two problems when the disk gets full:
> 
> a) ipu writes, which the drive can't really do, so gc might be cheaper.
> b) reuse of sections - if sections are reasonably large, then when one gets
> freed and reused, it will be large enough to allow large linear writes again.
> 
> b) is the reason behind me trying large values of -s.

Hmm. It seems that SMR drives have a 20~25GB cache to absorb random writes,
together with a big block map. They then use static allocation, which is a
kind of very early-stage FTL design though.
Compared to flash, it seems that SMR degrades performance significantly
due to internal cleaning overhead, so I can understand that the IO patterns
need to be controlled very carefully.

So, how about testing -s20, which seems reasonable to me?
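
To make the arithmetic explicit (the 2MB figure is the fixed f2fs segment
size, and /dev/sdX below is just a placeholder device):

  # section size = <-s value> x 2MB segments; -s20 gives 40MB,
  # which covers the 15~40MB zone range mentioned above
  mkfs.f2fs -s 20 /dev/sdX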

+ direct IO can break the alignment too.
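
On the ipu-writes point from the quoted list above: if in-place updates turn
out to hurt on SMR, the IPU policy can be inspected and tuned through sysfs.
(The path below is from the f2fs sysfs documentation as I remember it; <disk>
is a placeholder, and whether 0 really disables all in-place updates on 3.18
is an assumption that should be double-checked.)

  cat /sys/fs/f2fs/<disk>/ipu_policy       # show the current in-place-update policy
  echo 0 > /sys/fs/f2fs/<disk>/ipu_policy  # assumption: 0 = never do in-place updates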

> Since f2fs is the only fs I tested whose sustained write performance on
> these drives comes close to the physical drive characteristics, all that
> needs to be done is to see how f2fs performs after it starts gc'ing.
> 
> That's why I am so interested in disk full conditions - writing the disk
> linearly once is easy, I can just write a tar to the device. Ensuring that
> writes are large and linear after deleting and cleaning up is harder.
> 
> nilfs is a good example - it should fit smr drives perfectly, until they
> are nearly full, after which nilfs still matches smr drives perfectly,
> but waiting for 8TB to be shuffled around to delete some files can take
> days.  More surprisingly, nilfs fails phenomenally with these drives,
> performance-wise, for reasons I haven't investigated (my guess is that
> nilfs leaves gaps).
> 
> > I think this can be used for SMR too.
> 
> You can run any blockdevice operation on these drives, but the results
> from flashbench will be close to meaningless for them. For example, you
> can't distinguish between a nonaligned write causing a read-modify-write
> from an aligned large write, or a partial write, by access time, as they
> will probably all have similar access times.
> 
> > I think there might be some hints for section size at first and performance
> > variation as well.
> 
> I think you confuse these drives with flash drives - while they share some
> characteristics, they are completely unlike flash. There is no translation
> layer, there is no need for wear leveling, zones have widely varying
> sizes, and appending can be expensive or cheap, depending on the write size.
> 
> What these drives need is primarily large linear writes without gaps, and
> secondarily any optimisations for rotational media apply. (And for that, f2fs
> performs unexpectedly well, given it wasn't meant for rotational media.)
> 
> Now, if f2fs can be made to (mostly) work bug-free, but with the
> characteristics of 3.18.21, and the gc can ensure that reasonably big
> areas spanning multiple zones will be reused, then f2fs will be the _ONLY_ fs
> able to take care of drive managed smr disks efficiently.

Hmm. f2fs has been deployed on smartphones for a couple of years so far.
The main work here would be about tuning it for SMR drives.
It's time for me to take a look at pretty big partitions. :)

Oh, anyway, have you tried just -s1 for fun?
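
(Assuming the fixed 2MB segment size, -s1 would mean a section of just
1 x 2MB = 2MB, i.e. one segment per section, well below the 15~40MB zone
sizes, so it should mostly show the baseline allocation behaviour.)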

Thanks,

> 
> Specifically, these filesystems do NOT work well with these drives:
> 
> nilfs, zfs, btrfs, ext4, xfs
> 
> And modifications for these filesystems are either far away in the
> future, or not targeted at drive managed disks (ext4 already has some
> modifications, but they are clearly not very suitable for actual drives,
> as they assume these drives have a fast area near the start of the disk,
> which isn't the case). But these disks are not uncommon (seagate is
> shipping them by the millions), and will stay with us for quite a while.
> 
> -- 
>                 The choice of a       Deliantra, the free code+content MORPG
>       -----==-     _GNU_              http://www.deliantra.net
>       ----==-- _       generation
>       ---==---(_)__  __ ____  __      Marc Lehmann
>       --==---/ / _ \/ // /\ \/ /      schm...@schmorp.de
>       -=====/_/_//_/\_,_/ /_/\_\
