On Tue, Jun 16, 2015 at 09:06:56AM +0200, Ingvar Bogdahn wrote:
> Hi again,
>
> Benchmarking over time seems a good idea, but what if I see that a
> particular database does indeed degrade in performance? How can I
> then selectively improve performance for that file, since disabling
> cow only works for new empty files?
   Recreate the file, setting the attribute while the new copy is
still empty (with the database shut down):

# mv file file.bak
# touch file
# chattr +C file
# cat file.bak >file
# rm file.bak

> Is it correct that bundling small random writes into groups of
> writes reduces fragmentation? If so, some form of write-caching
> should help?

   No, that's unlikely to help -- you're still fragmenting the
original file. Imagine a disk with a file (AAAABAAAACAAAADAAAAEAAAA)
on it, and the blocks B, C, D, E being modified. On the disk, you
might then end up with:

...AAAA.AAAA.AAAA.AAAA.AAAA.......................EDCB....

   Reading this file sequentially is going to involve 8 long seeks,
which is the fundamental problem with fragmentation.

   This kind of behaviour in a CoW filesystem is inevitable; the main
question is how to minimise it. autodefrag, as I understand it, looks
for high levels of fragmentation in the few blocks near a pending
write, and reads and rewrites all of those blocks in one go. (I
haven't read the code -- this is based on my understanding of some
passing remarks from josef on IRC a while ago, so I might well be
mischaracterising it).
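   If you want to try autodefrag, it's just a mount option, and you
can see how badly a given file has fragmented with filefrag. Roughly
like this -- the mount point and file name below are only examples,
so substitute your own:

# mount -o remount,autodefrag /mnt/data
# filefrag /mnt/data/mysql/ibdata1

   Putting autodefrag in the options field of the filesystem's fstab
entry makes it persist across reboots. filefrag reports the extent
count, which is a reasonable proxy for how much seeking a sequential
read of that file will cost.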
> I'm still investigating, but one solution might be:
> 1) identify which exact tables do have frequent writes
> 2) decrease the system-wide write-caching (vm.dirty_background_ratio
> and vm.dirty_ratio) to lower levels, because this wastes lots of RAM
> by indiscriminately caching writes of the whole system, and tends to
> cause spikes where suddenly the entire cache gets written to disk
> and blocks the system. Rather use that RAM selectively to cache only
> the critical files.
> 4) create a software RAID-1 made up of a ramdisk and a mounted
> image, using mdadm.
> 5) Setting up mdadm using a rather large value for "write-behind="
> 6) put only those tables on that disk-backed ramdisk which do have
> frequent writes.
>
> What do you think?

   Benchmark it. Also test it for reliability when you pull the power
out in the middle of a bunch of writes -- you're caching so much in an
ad-hoc manner that I think you're unlikely to be achieving the D part
of ACID.
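   For reference, the kind of setup you're describing would look
roughly like the following -- the sysctl values, device names and
sizes are purely illustrative, and none of it changes the durability
problem above (the ramdisk half of the mirror also evaporates on
every reboot):

# sysctl -w vm.dirty_background_ratio=2
# sysctl -w vm.dirty_ratio=5
# modprobe brd rd_nr=1 rd_size=2097152
# losetup /dev/loop0 /srv/db-mirror.img
# mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=8192 \
        /dev/ram0 --write-mostly /dev/loop0

   --write-behind only applies to devices flagged --write-mostly, and
it needs a write-intent bitmap, hence --bitmap=internal. You'd then
put a filesystem on /dev/md0 and move just the busy tables onto it.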
   Hugo.

> Ingvar
>
> On 15.06.15 at 11:57, Hugo Mills wrote:
> > On Mon, Jun 15, 2015 at 11:34:35AM +0200, Ingvar Bogdahn wrote:
> >> Hello there,
> >>
> >> I'm planning to use btrfs for a medium-sized webserver. It is
> >> commonly recommended to set nodatacow for database files to avoid
> >> performance degradation. However, apparently nodatacow disables some
> >> of my main motivations for using btrfs: checksumming and (probably)
> >> incremental backups with send/receive (please correct me if I'm
> >> wrong on this). Also, the databases are among the most important
> >> data on my webserver, so it is particularly there that I would like
> >> those features working.
> >>
> >> My question is, are there strategies to avoid nodatacow of databases
> >> that are suitable and safe in a production server?
> >> I thought about the following:
> >> - in mysql/mariadb: setting "innodb_file_per_table" should avoid
> >> having a few very big database files.
> >
> > It's not so much about the overall size of the files, but about the
> > write patterns, so this probably won't be useful.
> >
> >> - in mysql/mariadb: adapting database schema to store blobs into
> >> dedicated tables.
> >
> > Probably not an issue -- each BLOB is (likely) to be written in a
> > single unit, which won't cause the fragmentation problems.
> >
> >> - btrfs: set autodefrag or some cron job to regularly defrag only
> >> database files to avoid performance degradation due to fragmentation
> >
> > Autodefrag is a good idea, and I would suggest trying that first,
> > before anything else, to see if it gives you good enough performance
> > over time.
> >
> > Running an explicit defrag will break any CoW copies you have (like
> > snapshots), causing them to take up additional space. For example,
> > start with a 10 GB subvolume. Snapshot it, and you will still only
> > have 10 GB of disk usage. Defrag one (or both) copies, and you'll
> > suddenly be using 20 GB.
> >
> >> - turn on compression on either btrfs or mariadb
> >
> > Again, won't help. The issue is not the size of the data, it's the
> > write patterns: small random writes into the middle of existing files
> > will eventually cause those files to fragment, which causes lots of
> > seeks and short reads, which degrades performance.
> >
> >> Is this likely to give me ok-ish performance? What other
> >> possibilities are there?
> >
> > I would recommend benchmarking over time with your workloads, and
> > seeing how your performance degrades.
> >
> > Hugo.

-- 
Hugo Mills             | emacs: Eighty Megabytes And Constantly Swapping.
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4
