On 2015-11-22 20:43, Mitch Fossen wrote:
> Hi all,
>
> I have a btrfs setup of 4x2TB HDDs for /home in btrfs RAID0 on Ubuntu
> 15.10 (kernel 4.2) and btrfs-progs 4.3.1. Root is on a separate SSD,
> also running btrfs. About 6 people use it via ssh and run simulations.
> One of these simulations generates a lot of intermediate data that can
> be discarded after it is run; it usually ends up being around 100GB to
> 300GB spread across dozens of files, 500M to 5GB apiece.
>
> The problem is that, when it comes time to do a "rm -rf
> ~/working_directory", the entire machine locks up and only
> sporadically allows other IO requests to go through, with a 5 to 10
> minute delay before other requests seem to be served. It can end up
> taking half an hour or more to fully remove the offending directory,
> with the hangs happening frequently enough to be frustrating. This
> didn't seem to happen when the system was using ext4 on LVM.

Based on this description, this sounds to me like an issue with fragmentation.
> Is there a way to fix this performance issue, or at least mitigate it?
> Would using ionice and the CFQ scheduler help? As far as I know,
> Ubuntu uses deadline by default, which ignores ionice values.

This depends on a number of factors. If you are on a new enough kernel,
you may actually be using the blk-mq code instead of one of the
traditional I/O schedulers; blk-mq does honor ionice values, and is
generally a lot better than CFQ or deadline at actual fairness and
performance. If you aren't on that code path, then whether deadline or
CFQ is better is hard to determine. In general, CFQ needs some serious
effort and benchmarking to get reasonable performance out of it. When
properly tuned to the workload, CFQ can beat deadline (except on really
small rotational media, smaller than 32G or so, or if you absolutely
need deterministic scheduling), but when you don't take the time to
tune it, deadline is usually better (except on SSDs, where CFQ is
generally better than deadline even without tuning).
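As a quick sketch (device names are just whatever happens to exist on
your machine, and the idle class only has an effect under CFQ or
blk-mq), you can check which scheduler is active and run the cleanup at
idle I/O priority:

```shell
# Show the active scheduler for each block device; the name in
# brackets is the one currently in use.
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue
    printf '%s: %s\n' "$f" "$(cat "$f")"
done

# Run the removal in the idle I/O class and at low CPU priority, so it
# only gets disk time when nobody else is asking for it. Note that
# deadline and noop ignore the ionice class entirely.
ionice -c 3 nice -n 19 rm -rf ~/working_directory
```

This won't make the delete itself any faster, but it should keep it
from starving the other users' I/O while it runs.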
> Alternatively, would balancing and defragging data more often help?
> The current mount options are compress=lzo and space_cache, and I
> will try it with autodefrag enabled as well to see if that helps.

Balance is not likely to help much, but defragmentation might. I would
suggest running the defrag when nobody has any other data in flight on
the filesystem, as it will likely cause a severe drop in performance
the first time it's run. Autodefrag might help, but it may also make
performance worse while the files are first being written. You might
also try compress=none; depending on your storage hardware, in-line
compression can actually make things significantly slower (I see this
a lot with SSDs, and also with some high-end storage controllers, and
especially when dealing with large data sets that aren't very
compressible).
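Roughly, that would look like the following (the /home path matches
your setup, but the options are only an example; the defrag needs root,
and the remount only affects data written after it):

```shell
# Recursively defragment /home, recompressing with lzo to match the
# current mount options. Best done while the filesystem is otherwise
# quiet, since the first run can hurt performance badly.
btrfs filesystem defragment -r -clzo /home

# To test whether in-line compression is part of the problem, remount
# without it; existing files stay compressed until rewritten.
mount -o remount,compress=none,space_cache /home
```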
> For now I think I'll recommend that everyone use subvolumes for these
> runs and then enable user_subvol_rm_allowed.

As Duncan said, this is probably the best option short term. It is
worth noting, however, that removing a subvolume still has some
overhead, which appears to scale linearly with the amount of data in
the subvolume. This overhead isn't likely to be an issue unless a
bunch of subvolumes get removed in bulk.
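The per-run workflow would look something like this (the mount point
and directory name are illustrative, and user_subvol_rm_allowed has to
be in the mount options before the unprivileged delete will work):

```shell
# One time, as root: let regular users delete their own subvolumes.
mount -o remount,user_subvol_rm_allowed /home

# Per run, as the user: put the scratch data in its own subvolume...
btrfs subvolume create ~/working_directory

# ...and when the run is done, drop the whole subvolume at once
# instead of unlinking dozens of multi-gigabyte files one by one.
btrfs subvolume delete ~/working_directory
```

The delete returns quickly from the caller's point of view; the actual
space reclamation happens in the background, which is where the
overhead mentioned above shows up.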