On 18. 07. 2025 at 9:13, Henry, Andrew wrote:
I’m trying to improve the performance of a virtual machine guest running RHEL 
9.6.  I’ve assigned 3 separate SAS controllers to the VM, and 3 vmdk disks, each 
connected to its own controller, so there is no kernel bottleneck in the VM 
on the controller side when writing through to each vmdk under high load.

In the guest, I’m creating an LV as follows:

lvcreate -n swap -i 3 -I 64k --type=striped -L 16G vg00
mkfs -t xfs -b size=4K -d su=64k,sw=3 -f /dev/vg00/swap
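
For reference, the resulting geometry on both layers can be cross-checked with something like the following (a quick sanity check, reusing the VG/LV names from above; in xfs_info output sunit/swidth are shown in filesystem blocks):

lvs -o lv_name,stripes,stripesize vg00/swap
xfs_info /mnt    # once the filesystem is mounted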

After mounting, I test write speeds using:

time dd if=/dev/zero of=/mnt/testfile1 bs=8192 count=65536 oflag=direct

536870912 bytes (537 MB, 512 MiB) copied, 41.2043 s, 13.0 MB/s
536870912 bytes (537 MB, 512 MiB) copied, 27.3986 s, 19.6 MB/s
536870912 bytes (537 MB, 512 MiB) copied, 22.8359 s, 23.5 MB/s

time dd if=/dev/zero of=/mnt/testfile bs=64k count=163840 oflag=direct

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 10.7506 s, 99.9 MB/s  ## only did one test here

time dd if=/dev/zero of=/mnt/testfile bs=1M count=10240 oflag=direct

10737418240 bytes (11 GB, 10 GiB) copied, 19.158 s, 560 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 23.0384 s, 466 MB/s
10737418240 bytes (11 GB, 10 GiB) copied, 18.3788 s, 584 MB/s

Hi

This is more or less expected - the bigger the chunk that is flushed to your drive at once, the better the performance gets.
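
(To put rough numbers on it: with oflag=direct, dd issues one synchronous write at a time, so throughput is approximately block size / per-IO latency.  From the figures above, 13 MB/s at 8K blocks is about 1600-1700 writes/s, i.e. roughly 0.6 ms per write, while 560 MB/s at 1M blocks is only ~560 writes/s at ~1.8 ms each - the per-IO latency grows ~3x while the payload grows 128x, which is why the large blocks win so clearly.)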

Note - striping is useful when you have relatively 'slow' individual drives and very fast controllers capable of streaming 'parallel' IO operations.
That is how you gain the advantage.
If you have very fast storage and your system can't gain any more by streaming in parallel - just stick with linearly joined storage....


The 2nd part is - the faster the storage is, the bigger the individual stripe likely needs to be for it to be well utilized - i.e. 64k can be too small - maybe use 256K, giving you a total stripe width of 3 * 256K = 768K. Then there is alignment of writes, where the optimal IO size can be a multiple of this number....
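
Just to illustrate the syntax for trying a bigger stripe - a sketch reusing the VG from above, where 'testlv' is only a placeholder name:

lvcreate --type striped -i 3 -I 256k -L 16G -n testlv vg00
mkfs.xfs -b size=4K -d su=256k,sw=3 -f /dev/vg00/testlv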

But you need to benchmark this - starting with the 'raw' device.
Then add the 'DM' layer. Benchmark again to see which stripe size is best.
Then add the filesystem on top - and see which stripe configuration again gives the best result with the whole stack. Each layer may 'drop' some amount of speed - but it should be fairly minimal when the right parameters are selected....
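
A rough sketch of that sequence (device and mount names are just examples, and the first two dd runs overwrite their targets, so only use disks/LVs that hold no data):

# 1) raw device only - no lvm/dm involved
dd if=/dev/zero of=/dev/xvdb bs=1M count=10240 oflag=direct

# 2) the striped LV directly - no filesystem yet
dd if=/dev/zero of=/dev/vg00/testlv bs=1M count=10240 oflag=direct

# 3) a file on the XFS filesystem created on that LV
dd if=/dev/zero of=/mnt/testfile bs=1M count=10240 oflag=direct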

I monitor speeds to the individual disks and the LV itself using:

iostat -xdmt 2


My question(s):

When I create the LV using just one disk (linear LVM), I am seeing the same 
write speed results as above.  In iostat, I can see that e.g. with 1MB block 
size in dd, it’s ~500MB/s to the dm-1 device and also ~500MB/s to the physical 
disk xvda.  When I create the LV as striped, I’m getting the same throughput on 
the dm-1 device (~500MB/s), but it is splitting that throughput evenly across 
all three PVs, so about ~170MB/s to each xvd disk, totalling ~500MB/s at the 
LV side.  This was not my expectation.  I expected ~500MB/s to each device with 
the striped LV, totalling ~1500MB/s on the dm-1 device.  What am I missing?


Here we don't know whether your controller is even capable of handling a parallel workload - i.e. if you use just 'dd' on the storage itself - no lvm/dm involved - can you 'stream' 3 'dd' commands, one on each drive, in parallel and get a total bandwidth of 1500MB/s?  You need to check this bottleneck first.
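
Something along these lines (a sketch - the device names are assumed, and this overwrites the raw disks, so run it before creating any PVs/LVs on them, while watching the totals in iostat -xdmt 2):

dd if=/dev/zero of=/dev/xvda bs=1M count=10240 oflag=direct &
dd if=/dev/zero of=/dev/xvdb bs=1M count=10240 oflag=direct &
dd if=/dev/zero of=/dev/xvdc bs=1M count=10240 oflag=direct &
wait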

Also note - there are 2 ways to do striping - one is the 'old dm' striped target, and the other is the slightly more modern --type raid0 striping - see whichever works better for you...
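
The raid0 variant would be created roughly like this (same placeholder LV name as before):

lvcreate --type raid0 -i 3 -I 256k -L 16G -n testlv vg00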


Second question is:

I’m testing database throughput.  I have the following setup:

DB  → 8KB block size
XFS  → 4KB block size, limited by the page size in the Linux kernel on x86 architecture.
LVM  → 64KB stripe size, and have also tested with 4KB, 8KB, 256KB, 512KB, 1MB, 4MB
xvd[abc]  → 4KB block size on these disks provided by the virtual host
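
One way to see what each layer of that stack actually advertises is 'lsblk -t' (the topology view), which prints the minimum/optimal IO sizes per device, including the dm devices - the stripe geometry of the LV should be reflected in its MIN-IO/OPT-IO columns:

lsblk -t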

The only part of that equation I can modify is the stripe size on the LVM.  
Irrespective of which stripe size I choose, I’m getting similar results with dd 
(and fio for that matter) when testing different block sizes with my write 
tests.  In other words, it doesn’t seem to make any difference whatsoever which 
stripe size I choose in the LVM.  Is this a limitation caused by my 
virtualisation layer?  Would I have seen different (better) results if I’d used 
3 physical disks connected to a physical host?

I suppose it’s normal to get low speeds with small block sizes and higher 
speeds with larger block sizes, but I’m still concerned that 8KB block sizes 
are writing to a fast disk system at only 13-20MB/s.  Is this common?


Lots of small writes will never be fast - and will likely be even slower with striped storage. But you should first figure out where the bottlenecks in your system are.

Maybe your virtual system itself has a 'single write pipe' - so any parallelization is stopped right there - so maybe you should start first with bare metal....
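
One way to separate per-IO latency from a bandwidth limit is to keep more writes in flight than dd ever does - e.g. with fio, which is already being used here; a rough sketch, the path and size are just examples:

fio --name=8k-qd32 --filename=/mnt/testfile --rw=write --bs=8k --size=2G --direct=1 --ioengine=libaio --iodepth=32

If this gets much closer to the large-block numbers, the 13-20MB/s with dd is per-IO latency (dd waits for each 8K write to complete before issuing the next); if it barely moves, the limit is further down the stack.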


Regards

Zdenek


