On 2015-11-22 20:43, Mitch Fossen wrote:
Hi all,

I have a btrfs setup of 4x2TB HDDs for /home in btrfs RAID0 on Ubuntu
15.10 (kernel 4.2) and btrfs-progs 4.3.1. Root is on a separate SSD
also running btrfs.

About 6 people use it via ssh and run simulations. One of these
simulations generates a lot of intermediate data that can be discarded
after the run is finished; it usually ends up being around 100GB to
300GB spread across dozens of files of 500MB to 5GB apiece.

The problem is that, when it comes time to do an "rm -rf
~/working_directory", the entire machine locks up and only sporadically
allows other I/O requests to go through, with a 5 to 10 minute delay
before other requests seem to be served. It can end up taking half an
hour or more to fully remove the offending directory, with the hangs
happening frequently enough to be frustrating. This didn't seem to
happen when the system was using ext4 on LVM.
Based on this description, it sounds to me like a fragmentation issue.
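
If you want to check how fragmented those intermediate files actually
are, filefrag from e2fsprogs will report the extent count; the path
below is just an example, and keep in mind that compressed files are
capped at 128KiB per extent, so they always show a fairly high count:

   # show the extent layout of one of the intermediate files; a very
   # high extent count for its size points to heavy fragmentation
   filefrag -v /home/someuser/working_directory/run_0001.dat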

Is there a way to fix this performance issue or at least mitigate it?
Would using ionice and the CFQ scheduler help? As far as I know,
Ubuntu uses deadline by default, which ignores ionice values.
This depends on a number of factors. If you are on a new enough
kernel, you may actually be on the blk-mq code path instead of one of
the traditional I/O schedulers; blk-mq does honor ionice values, and
it is generally a lot better than CFQ or deadline at actual fairness
and performance. If you aren't on that code path, whether deadline or
CFQ is better is hard to say. In general, CFQ needs serious effort and
benchmarking to get reasonable performance out of it. When properly
tuned to the workload, CFQ can beat deadline, unless you have very
small rotational media (smaller than 32G or so) or absolutely need
deterministic scheduling; if you don't take the time to tune it,
deadline is usually better. The exception is SSDs, where CFQ generally
beats deadline even without tuning.
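
If you do want to test CFQ together with ionice, something along these
lines should do it; sda is just a placeholder here, and with four
drives in the RAID0 you would repeat the scheduler change for each one:

   # see which scheduler the drive is currently using
   cat /sys/block/sda/queue/scheduler
   # switch that drive to CFQ so ionice classes are actually honored
   echo cfq > /sys/block/sda/queue/scheduler
   # run the cleanup in the idle I/O class so other requests win
   ionice -c3 rm -rf ~/working_directory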

Alternatively, would balancing and defragging data more often help?
The current mount options are compress=lzo and space_cache, and I will
try it with autodefrag enabled as well to see if that helps.
Balance is not likely to help much, but defragmentation might. I would
suggest running the defrag when nobody else is doing anything on the
filesystem, as it will likely cause a severe drop in performance the
first time it's run. Autodefrag might help, but it may also make
performance worse while the files are being written in the first
place. You might also try compress=none: depending on your storage
hardware, in-line compression can actually make things significantly
slower (I see this a lot with SSDs and with some high-end storage
controllers, especially when dealing with large data sets that aren't
very compressible).
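
As a rough sketch, assuming /home is where the RAID0 filesystem is
mounted, the one-off defrag and the autodefrag change would look
something like this:

   # recursively defragment /home, recompressing the data with lzo as
   # it goes; expect a lot of I/O the first time this runs
   btrfs filesystem defragment -r -clzo /home
   # autodefrag can be enabled with a remount rather than a reboot
   mount -o remount,autodefrag /home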

For now I think I'll recommend that everyone use subvolumes for these
runs and then enable user_subvol_rm_allowed.
As Duncan said, this is probably the best option short term. It is
worth noting, however, that removing a subvolume still has some
overhead, which appears to scale linearly with the amount of data in
the subvolume. That overhead is unlikely to be an issue unless a bunch
of subvolumes get removed in bulk.
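
A minimal sketch of that workflow, assuming /home is the mount point
and the option can simply be added on a remount:

   # allow unprivileged users to delete subvolumes they own
   mount -o remount,user_subvol_rm_allowed /home
   # each user keeps a run's scratch data in its own subvolume...
   btrfs subvolume create ~/working_directory
   # ...and drops the whole thing at once instead of rm -rf'ing it
   btrfs subvolume delete ~/working_directory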

