Re: Storage tuning

Dan Ritter Wed, 30 Jul 2025 05:45:15 -0700

Greg wrote: 
> Hi there,
> 
> I have a "server" running 24/7 with a lot of RAM. I would like to speed up
> disk system by giving much higher priority to reads and delaying writes.
> 
> YES, I KNOW THE RISK!
> 
> As I understand, there are two things to tune:
> 
> 1. I/O Scheduler. The default is mq-deadline. Let say I have md_raid array
> /dev/md0 consisting of /dev/sd[ab]. Should I keep it for both, md0 and
> physical disks? Which I/O scheduler parameters should I change in case of
> md0 and which in case of sd[ab]?
> 
> 2. Kernel runtime parameters. As I understand I should focus on vm.dirty_*
> parameters. My Idea is to set vm.dirty_ratio=70 and
> vm.dirty_expire_centisecs to something like 10 to 60min. Should I change
> anything else?
> 
> PS. Among others, I'm trying to learn something about Linux caching. So
> please stick to above questions.


The first thing you should do is establish whether you have a
problem.

grep -E "dirty|writeback" /proc/vmstat

nr_dirty is the number of pages waiting to be written. A page
becomes eligible for writeback once it has been dirty for longer
than vm.dirty_expire_centisecs.
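To see what the writeback knobs are currently set to before
changing anything, something like the sketch below works. The
function name show_dirty_settings and its optional directory
argument are made up for illustration; the four files it reads are
the standard ones under /proc/sys/vm.

```shell
# Print the current writeback-related settings.
# The optional argument (a directory to read from) is only there so
# the function can be pointed at a copy; it defaults to /proc/sys/vm.
show_dirty_settings() {
    vm_dir=${1:-/proc/sys/vm}
    for name in dirty_expire_centisecs dirty_writeback_centisecs \
                dirty_ratio dirty_background_ratio; do
        printf 'vm.%s = %s\n' "$name" "$(cat "$vm_dir/$name")"
    done
}
```

Run it as plain "show_dirty_settings". The two centisecs values are
in hundredths of a second, so 3000 means 30 seconds.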

If this number is low while your server is actually doing
things, then there is no point in trying to delay writes further
- it is not pushing enough data to disk to be worthwhile.
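A page count is awkward to compare against a disk's write speed, so
it can help to convert nr_dirty to kilobytes. This is a rough
sketch, not a polished tool: dirty_kb is a hypothetical helper name,
and the file argument exists only so it can be run against a saved
copy of /proc/vmstat.

```shell
# Print the amount of dirty data in KB.
# Reads nr_dirty from a vmstat-format file (default /proc/vmstat)
# and multiplies by the system page size reported by getconf.
dirty_kb() {
    vmstat_file=${1:-/proc/vmstat}
    page_kb=$(( $(getconf PAGESIZE) / 1024 ))
    awk -v kb="$page_kb" '$1 == "nr_dirty" { print $2 * kb }' "$vmstat_file"
}
```

To watch it while the server is busy, call it in a loop, for
example "while sleep 5; do dirty_kb; done".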

What's low? Well, a page is typically 4KB, and anything that
takes less than a tenth of a second to write is going to be fast.
If you have a RAID that can write at 150MB/s (a typical speed for
a single rotating disk) then less than 15MB is negligible. That
would be an nr_dirty around 3750.

If you are using a SATA SSD, a write speed of 500MB/s is a good
assumption, so an nr_dirty exceeding 12500 would hit a 0.1 second
threshold. If you are using PCIe 4 NVMe SSDs, 1-2 GB/s is
plausible, and the nr_dirty would have to be 25000 to 50000.
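The arithmetic behind these thresholds is just write speed times
0.1 seconds, divided by the page size. A small sketch, assuming
4KB pages and 1MB = 1000KB; threshold_pages is a made-up name and
the speeds passed to it are the ballpark figures from above, not
measurements.

```shell
# Number of dirty pages that take about 0.1 s to flush at a given
# sequential write speed.
#   $1: write speed in MB/s
#   $2: page size in KB (optional, defaults to 4)
threshold_pages() {
    mb_per_s=$1
    page_kb=${2:-4}
    # MB/s * 1000 KB/MB * 0.1 s, then divided by KB per page
    echo $(( mb_per_s * 1000 / 10 / page_kb ))
}

threshold_pages 150    # single rotating disk: 3750
threshold_pages 500    # SATA SSD: 12500
threshold_pages 2000   # fast NVMe: 50000
```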

All this assumes a fairly linear write pattern. If you are using
rotating disks and all your writes are small and to random
locations - the worst case - then an nr_dirty of 100 might be
interesting.

If you want to gather stats for a longer period of time while
you run a typical workload, the command you want is called sar.
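As a starting point, sar -r reports memory statistics at a fixed
interval, and on reasonably recent sysstat versions its output
includes a kbdirty column. The wrapper below is only a sketch:
sample_dirty_memory is a hypothetical name, the 5 second interval
and 12 samples are arbitrary defaults, and it assumes the sysstat
package is installed.

```shell
# Sample memory statistics with sar.
#   $1: seconds between samples (optional, defaults to 5)
#   $2: number of samples (optional, defaults to 12)
sample_dirty_memory() {
    interval=${1:-5}
    count=${2:-12}
    sar -r "$interval" "$count"
}
```

Calling "sample_dirty_memory 60 60" would cover an hour of a
typical workload at one sample per minute.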

First establish that you have a problem.

-dsr-
