On Tue, Jun 16, 2015 at 09:06:56AM +0200, Ingvar Bogdahn wrote:
> Hi again,
> 
> Benchmarking over time seems a good idea, but what if I see that a
> particular database does indeed degrade in performance? How can I
> then selectively improve performance for that file, since disabling
> cow only works for new empty files?

# mv file file.bak
# touch file
# chattr +C file
# cat file.bak >file
# rm file.bak
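
   Do that with the database shut down, obviously, or any writes made
during the copy will be lost. You can check that the attribute took
with something like:

# lsattr file

which should show a "C" among the flags.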

> Is it correct that bundling small random writes into groups of
> writes reduces fragmentation? If so, some form of write-caching
> should help?

   No, that's unlikely to help -- you're still fragmenting the
original file. Imagine a disk with a file (AAAABAAAACAAAADAAAAEAAAA)
on it, and the blocks B, C, D, E being modified. On the disk, you
might then end up with:

...AAAA.AAAA.AAAA.AAAA.AAAA.......................EDCB....

   Reading this file sequentially is going to involve 8 long seeks,
which is the fundamental problem with fragmentation. This kind of
behaviour in a CoW filesystem is inevitable; the main question is how
to minimise it.
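
   If you want to watch that happening on a real file, something like
filefrag (from e2fsprogs) will report the number of extents -- the
path here is just an example:

# filefrag /var/lib/mysql/ibdata1

Bear in mind that on compressed files it tends to over-report, so
treat the numbers as a rough guide rather than gospel.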

   autodefrag, as I understand it, looks for high levels of
fragmentation in the few blocks near a pending write, and reads and
rewrites all of those blocks in one go. (I haven't read the code -- this
is based on my understanding of some passing remarks from josef on IRC
a while ago, so I might well be mischaracterising it).
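
   If you want to try it, it's just a mount option -- something along
the lines of:

# mount -o remount,autodefrag /mountpoint

(or add autodefrag to the options in /etc/fstab to make it persist
across reboots). It only acts on files as they're written to, so it
won't retroactively clean up existing fragmentation.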

> I'm still investigating, but one solution might be:
> 1) identify which exact tables do have frequent writes
> 2) decrease the system-wide write-caching (vm.dirty_background_ratio
> and vm.dirty_ratio) to lower levels, because it wastes lots of RAM
> by indiscriminately caching writes from the whole system, and tends
> to cause spikes where suddenly the entire cache gets written to disk
> and blocks the system. Instead, use that RAM selectively to cache
> only the critical files.
> 3) create a software RAID-1 made up of a ramdisk and a mounted
> image, using mdadm.
> 4) set up mdadm with a rather large value for "write-behind="
> 5) put only those tables that have frequent writes on that
> disk-backed ramdisk.
> 
> What do you think?

   Benchmark it. Also test it for reliability when you pull the power
out in the middle of a bunch of writes -- you're caching so much in an
ad-hoc manner that I think you're unlikely to be achieving the D part
of ACID.
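
   For what it's worth -- and I haven't tested this -- the setup
you're describing would look roughly like the following, with the
device names and numbers as placeholders rather than recommendations:

# sysctl -w vm.dirty_background_ratio=5
# sysctl -w vm.dirty_ratio=10
# mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=8192 \
        /dev/ram0 --write-mostly /dev/loop0

Here /dev/loop0 stands in for the loop device backed by your on-disk
image; mdadm's write-behind only applies to members marked
write-mostly, and needs the write-intent bitmap. The durability
caveat above still stands.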

   Hugo.

> Ingvar
> 
> 
> 
> On 15.06.15 at 11:57, Hugo Mills wrote:
> >On Mon, Jun 15, 2015 at 11:34:35AM +0200, Ingvar Bogdahn wrote:
> >>Hello there,
> >>
> >>I'm planning to use btrfs for a medium-sized webserver. It is
> >>commonly recommended to set nodatacow for database files to avoid
> >>performance degradation. However, nodatacow apparently disables some
> >>of my main reasons for using btrfs: checksumming and (probably)
> >>incremental backups with send/receive (please correct me if I'm
> >>wrong on this). Also, the databases are among the most important
> >>data on my webserver, so it is particularly there that I would like
> >>those features working.
> >>
> >>My question is: are there strategies for avoiding nodatacow on
> >>databases that are suitable and safe for a production server?
> >>I thought about the following:
> >>- in mysql/mariadb: setting "innodb_file_per_table" should avoid
> >>having a few very big database files.
> >    It's not so much about the overall size of the files, but about the
> >write patterns, so this probably won't be useful.
> >
> >>- in mysql/mariadb: adapting the database schema to store blobs in
> >>dedicated tables.
> >    Probably not an issue -- each BLOB is (likely) to be written as a
> >single unit, which won't cause the fragmentation problems.
> >
> >>- btrfs: set autodefrag or some cron job to regularly defrag only
> >>database files, to avoid performance degradation due to fragmentation
> >    Autodefrag is a good idea, and I would suggest trying that first,
> >before anything else, to see if it gives you good enough performance
> >over time.
> >
> >    Running an explicit defrag will break any CoW copies you have (like
> >snapshots), causing them to take up additional space. For example,
> >start with a 10 GB subvolume. Snapshot it, and you will still only
> >have 10 GB of disk usage. Defrag one (or both) copies, and you'll
> >suddenly be using 20 GB.
> >
> >>- turn on compression on either btrfs or mariadb
> >    Again, won't help. The issue is not the size of the data, it's the
> >write patterns: small random writes into the middle of existing files
> >will eventually cause those files to fragment, which causes lots of
> >seeks and short reads, which degrades performance.
> >
> >>Is this likely to give me ok-ish performance? What other
> >>possibilities are there?
> >    I would recommend benchmarking over time with your workloads, and
> >seeing how your performance degrades.
> >
> >    Hugo.
> >
> 

-- 
Hugo Mills             | emacs: Eighty Megabytes And Constantly Swapping.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
