On Sat, Sep 26, 2015 at 12:36:55AM -0700, Jaegeuk Kim <jaeg...@kernel.org> wrote:
> > Care to share why? :)
>
> Mostly, in the flash storages, it is multiple 2MB normally. :)

Well, any value of -s gives me a multiple of 2MB, no? :)

> > Is there anything specially good for numbers of two? Or do you just want
> > to reduce the number of changed variables?
>
> IMO, likewise flash storages, it needs to investigate the raw device
> characteristics.

Keep in mind that I don't use it for flash, but for smr drives. We already know the raw device characteristics: basically, the zones are between 15 and 40 or so MB in size (on the Seagate 8TB drive), and they likely don't have "even" sizes at all.

It's also by far not easy to benchmark these things: the disks can buffer up to 25GB of random writes (and then might need several hours of cleanup). Failing a linear write incurs a 0.6-1.6s penalty, to be paid much later. It's a shame that none of the drive companies actually release any usable info on their drives.

These guys made a hole into the disk and devised a lot of benchmarks to find out the characteristics of these drives:

https://www.usenix.org/system/files/conference/fast15/fast15-paper-aghayev.pdf

So, the strategy for a fs would be to write linearly, most of the time, without any gaps. f2fs (at least in 3.18.x) manages to do that very nicely, which is why I really try to get it working.

But for writing once, any value of -s would probably suffice. There are two problems when the disk gets full:

a) ipu (in-place update) writes - the drive can't do those, so gc might be cheaper.
b) reuse of sections - if a section gets freed and reused, it should be large enough to guarantee large linear writes again.

b) is the reason behind me trying large values of -s.

Since I know that f2fs is the only fs that I tested that can have a sustained write performance on these drives that is near the physical drive characteristics, all that needs to be done is to see how f2fs performs after it starts gc'ing.

That's why I am so interested in disk full conditions - writing the disk linearly once is easy, I can just write a tar to the device.
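
As a rough illustration of point b), here is a minimal sketch of how the -s value relates to the zone sizes mentioned above. It relies on the 2MB segment size from this thread and, purely as a simplifying assumption, treats all zones as having one uniform size (the real zones vary). The function names are hypothetical and not part of f2fs or its tools.

```python
SEGMENT_MB = 2  # f2fs segment size; -s counts segments per section


def section_size_mb(segs_per_section: int) -> int:
    """Section size in MB for a given mkfs.f2fs -s value."""
    return segs_per_section * SEGMENT_MB


def worst_case_whole_zones(section_mb: int, zone_mb: int) -> int:
    """Whole zones guaranteed inside a section placed at an arbitrary offset.

    Assumes uniform zones of zone_mb (a simplification). The partial zones
    at the two ends of the section can eat almost two zones' worth of space.
    """
    return max(0, section_mb // zone_mb - 1)


def worst_case_linear_fraction(section_mb: int, zone_mb: int) -> float:
    """Fraction of the section guaranteed to lie in fully covered zones."""
    return worst_case_whole_zones(section_mb, zone_mb) * zone_mb / section_mb


if __name__ == "__main__":
    for s in (1, 16, 64, 256, 1024):
        size = section_size_mb(s)
        for zone in (15, 40):  # rough lower/upper zone sizes from the thread
            zones = worst_case_whole_zones(size, zone)
            frac = worst_case_linear_fraction(size, zone)
            print(f"-s {s:4d}: {size:5d} MB section, {zone} MB zones: "
                  f">= {zones} whole zones, >= {frac:.0%} of the section")
```

Under these assumptions, a small -s guarantees no fully covered zone at all (a 2MB section is smaller than any zone), while a section many times larger than the biggest zone loses only a small share of its space to partially covered zones at its edges.
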
Ensuring that writes are large and linear after deleting and cleaning up is harder. nilfs is a good example - it should fit smr drives perfectly, until they are nearly full, after which nilfs still matches smr drives perfectly, but waiting for 8TB to be shuffled around to delete some files can take days. More surprising is that nilfs phenomenally fails with these drives, performance wise, for reasons I haven't investigated (my guess is that nilfs leaves gaps).

> I think this can be used for SMR too.

You can run any blockdevice operation on these drives, but the results from flashbench will be close to meaningless for them. For example, you can't distinguish a nonaligned write causing a read-modify-write from an aligned large write, or a partial write, by access time, as they will probably all have similar access times.

> I think there might be some hints for section size at first and performance
> variation as well.

I think you confuse these drives with flash drives - while they share some characteristics, they are completely unlike flash. There is no translation layer, there is no need for wear leveling, zones have widely varying sizes, and appending can be expensive or cheap, depending on the write size.

What these drives need is primarily large linear writes without gaps, and secondarily any optimisations for rotational media apply. (And for that, f2fs performs unexpectedly well, given it wasn't meant for rotational media).

Now, if f2fs can be made to (mostly) work bug-free, but with the characteristics of 3.18.21, and the gc can ensure that reasonably big areas spanning multiple zones will be reused, then f2fs will be the _ONLY_ fs able to take care of drive managed smr disks efficiently.
Specifically, these filesystems do NOT work well with these drives:

   nilfs, zfs, btrfs, ext4, xfs

And modifications for these filesystems are either far away in the future, or not targeted at drive managed disks (ext4 already has some modifications, but they are clearly not very suitable for actual drives, as they assume these drives have a fast area near the start of the disk, which isn't the case).

But these disks are not uncommon (Seagate is shipping them by the millions), and will stay with us for quite a while.

--
Marc Lehmann
schm...@schmorp.de
Deliantra, the free code+content MORPG: http://www.deliantra.net

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel