On 18. 07. 25 at 9:13, Henry, Andrew wrote:
I’m trying to improve the performance of a virtual machine guest running RHEL
9.6. I’ve assigned 3 separate SAS controllers to the VM, and 3 vmdk disks each
connected to their own controller, so there is no kernel bottleneck in the VM
on the controller side when writing through to each vmdk under high load.
In the guest, I’m creating an LV as follows:
lvcreate -n swap -i 3 -I 64k --type=striped -L 16G vg00
mkfs -t xfs -b size=4K -d su=64k,sw=3 -f /dev/vg00/swap
After mounting, I test write speeds using:
time dd if=/dev/zero of=/mnt/testfile1 bs=8192 count=65536 oflag=direct
536870912 bytes (537 MB, 512 MiB) copied, 41.2043 s, 13.0 MB/s
536870912 bytes (537 MB, 512 MiB) copied, 27.3986 s, 19.6 MB/s
536870912 bytes (537 MB, 512 MiB) copied, 22.8359 s, 23.5 MB/s
time dd if=/dev/zero of=/mnt/testfile bs=64k count=163840 oflag=direct
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 10.7506 s, 99.9 MB/s ## only did one test here
time dd if=/dev/zero of=/mnt/testfile bs=1M count=10240 oflag=direct
10737418240 bytes (11 GB, 10 GiB) copied, 19.158 s, 560 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 23.0384 s, 466 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 18.3788 s, 584 MB/s
Hi
This is kind of expected - the bigger the chunk that gets flushed to your
drive at once, the better the performance gets.
Note - striping is useful when you have relatively 'slow' individual drives and
very fast controllers capable of streaming 'parallel' IO operations.
That is where striping gives you an advantage.
If you have very fast storage - and your system can't gain any more by
streaming in parallel - just stick with linearly joined storage....
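For comparison - sticking with the vg00 from your example - a linear LV needs
no stripe options at all, since linear is lvcreate's default segment type:

lvcreate -n data -L 16G vg00   # hypothetical LV name, plain linear allocation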
The second part is - the faster the storage, the bigger the individual stripe
likely needs to be to utilize it well - i.e. 64K can be too small - maybe use
256K, giving you a total stripe width of 3 * 256K = 768K. Then there is the
alignment of writes - where the optimal IO size can be a multiple of this number....
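As a sketch - reusing the names and sizes from your own commands, just with a
bigger stripe - that would be something like:

lvcreate -n swap -i 3 -I 256k --type=striped -L 16G vg00
mkfs -t xfs -b size=4K -d su=256k,sw=3 -f /dev/vg00/swap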
But you need to benchmark this - starting with the 'raw' device.
Then add the 'DM' layer and benchmark again to see which stripe size works best.
Then add the filesystem on top - and see which stripe configuration gives
the best result with the whole stack.
Each layer may 'drop' some amount of speed - but it should be fairly minimal
when the right parameters are selected....
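A rough sketch of that layer-by-layer testing, assuming the device names from
your mail (and note that writing to the raw device or the bare LV is
destructive - it wipes whatever is on it):

# 1) raw device, no lvm/dm involved (destroys the PV on it!)
dd if=/dev/zero of=/dev/xvda bs=1M count=10240 oflag=direct
# 2) the striped LV itself, no filesystem (destroys the filesystem on it!)
dd if=/dev/zero of=/dev/vg00/swap bs=1M count=10240 oflag=direct
# 3) a file on the mounted filesystem
dd if=/dev/zero of=/mnt/testfile bs=1M count=10240 oflag=direct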
I monitor speeds to the individual discs and the LV itself using:
iostat -xdmt 2
My question(s):
When I create the LV using just one disk (linear LVM), I am seeing the same
write speed results as above. In iostat, I can see that e.g. with a 1MB block
size in dd, it's ~500MB/s to the dm-1 device and also ~500MB/s to the physical
disk xvda. When I create the LV as striped, I'm getting the same throughput on
the dm-1 device (~500MB/s), but it is splitting that throughput evenly across
all three PVs, so about ~170MB/s to each xvd disk, totalling ~500MB/s at the
LV side. This was not my expectation. I expected ~500MB/s to each device with
the striped LV, totalling ~1500MB/s on the dm-1 device. What am I missing?
Here we don't know whether your controller is even capable of handling a
parallel workload - i.e. if you just run 'dd' on the storage itself - no
lvm/dm involved - do 3 'dd' commands streaming to each drive in parallel give
you the total bandwidth of 1500MB/s? You need to check this bottleneck first.
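Something like this (a sketch with the xvd[abc] names from your setup - again
destructive to the raw devices):

# one dd per disk, all running at once; watch iostat in another terminal
for d in xvda xvdb xvdc; do
    dd if=/dev/zero of=/dev/$d bs=1M count=10240 oflag=direct &
done
wait

If the three throughputs added together still stay around ~500MB/s, the
bottleneck is below LVM and no stripe size will change that.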
Also note - there are two ways to do striping - one is the 'old DM' striped
target - and the other is the slightly more modern --type raid0 striping - see
whichever works better for you...
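E.g., keeping the names and sizes from your original command, the raid0
variant would be roughly:

lvcreate -n swap -i 3 -I 64k --type raid0 -L 16G vg00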
Second question is:
I’m testing database throughput. I have the following setup:
DB → 8KB block size
XFS → 4KB block size, limited by the page size of the Linux kernel on x86
LVM → 64KB stripe size, and I have also tested 4KB, 8KB, 256KB, 512KB, 1MB, and 4MB
Xvd[abc] → 4KB block size on these disks provided by the virtual host
The only part of that equation I can modify is the stripe size on the LVM.
Irrespective of which stripe size I choose, I’m getting similar results with dd
(and fio for that matter) when testing different block sizes with my write
tests. In other words, it doesn’t seem to make any difference whatsoever which
stripe size I choose in the LVM. Is this a limitation caused by my
virtualisation layer? Would I have seen different (better) results if I’d used
3 physical disks connected to a physical host?
I suppose it’s normal to get low speeds with small block sizes and higher
speeds with larger block sizes, but I’m still concerned that 8KB block sizes
are writing to a fast disk system at only 13-20MB/s. Is this common?
Lots of small writes will never be fast - and striped storage will likely chop
them into even smaller pieces. But you should first figure out where the
bottlenecks in your system are.
Maybe your virtual system itself has 'a single write pipe' - so any
parallelization is stopped right there - so maybe you should start first with
bare metal....
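One more thing worth checking there - 'dd ... oflag=direct' issues one IO at a
time, so with 8K blocks you are mostly measuring the per-IO latency of the
whole virtual stack, not its bandwidth. Since you already have fio, a sketch
like this (hypothetical file/job names, adjust sizes and depths to taste) will
show whether the 8K case scales at all when several IOs are kept in flight:

fio --name=db8k --filename=/mnt/fiotest --size=2G --rw=write --bs=8k \
    --direct=1 --ioengine=libaio --iodepth=32 --numjobs=3 --group_reporting

If a higher iodepth/numjobs does not move the numbers either, the 'single
write pipe' in the virtualization layer is a more likely suspect than the LVM
stripe size.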
Regards
Zdenek