On 2/28/24 13:31, Vitaliy Gusev wrote:
Hi, Matthew.
Hi Vitaliy,
Thanks for the pointers.
I still do not know what command line was used for bhyve. I couldn't
find it in the thread, sorry. I also couldn't find the virtual disk
size that you used.
Sorry about that. I'll try to get you the exact command line invocation
used to launch the guest process once I have test hardware again.
Could you please simplify the bonnie++ output? It is hard to decode
due to the alignment. Please use exact numbers for:
READ seq - I see you had 1.6GB/s for the good case and ~500MB/s for
the worst.
WRITE seq - ...
I summarized the output for you. Here it is again:
Fast: ~ 1.6g/s seq write and 1.3g/s seq read
Slow: ~ 451m/s seq write and 402m/s seq read
If you have slow results for both read and write operations, you
should probably test _only_ READs and not move on to anything else
until READs are fine.
Again, if you see slow performance for an Ext4 filesystem in a guest
VM placed on the passed disk image, you should try testing against the
raw disk image, i.e. without Ext4, because the filesystem could be a
factor. If you run the test inside the VM on a filesystem, you may be
dealing with filesystem bottlenecks, bugs, fragmentation, etc. Do you
want to fix them all? I don't think so.
For example, if you pass a 40G disk image and create an Ext4
filesystem on it, and during testing the filesystem becomes more than
80% full, I/O may not perform well. You should probably rule out that
guest filesystem behaviour when you hit an I/O performance slowdown.
Also, please look at TRIM operations when you perform WRITE testing.
They could also be related to the slow write I/O.
The virtual disks were provisioned with either a 128G disk image or a
1TB raw partition, so I don't think space was an issue.
Trim is definitely not an issue. I'm using a tiny fraction of the 32TB
array and have tried both heavily under-provisioned HW RAID10 and SW
RAID10 using GEOM. The latter was tested after sending full TRIM
resets to all drives individually.
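In case it's useful, the per-drive resets were done along the lines of
the following, using FreeBSD's trim(8); the device names here are
illustrative:

# Erase the entire device via TRIM (destructive). If memory serves,
# trim(8) refuses to run unless -f (or -N for a dry run) is given.
trim -f /dev/da0
trim -f /dev/da1
# ...repeated for each drive in the array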
I will try to incorporate the rest of your feedback into my next round
of testing. If I can find a benchmark tool that works with a raw block
device, that would be ideal.
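fio looks like a good candidate, since it can target a raw block
device directly. A minimal sketch of what I have in mind, assuming the
disk appears as /dev/vdb inside the guest (note the write test
overwrites the device):

# Sequential read directly against the raw device, bypassing the page cache
fio --name=seqread --filename=/dev/vdb --direct=1 --rw=read \
    --bs=1M --ioengine=libaio --iodepth=16 --runtime=60 --time_based

# Sequential write (destructive: overwrites the device contents)
fio --name=seqwrite --filename=/dev/vdb --direct=1 --rw=write \
    --bs=1M --ioengine=libaio --iodepth=16 --runtime=60 --time_based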
Thanks,
-Matthew
——
Vitaliy
On 28 Feb 2024, at 21:29, Matthew Grooms <mgro...@shrew.net> wrote:
On 2/27/24 04:21, Vitaliy Gusev wrote:
Hi,
On 23 Feb 2024, at 18:37, Matthew Grooms <mgro...@shrew.net> wrote:
...
The problem occurs when an image file is used on either ZFS or UFS.
The problem also occurs when the virtual disk is backed by a raw
disk partition or a ZVOL. This issue isn't related to a specific
underlying filesystem.
Do I understand correctly that you ran the tests inside the guest VM
on an ext4 filesystem? If so, you should be aware of the additional
overhead compared to running the tests on the host.
Hi Vitaliy,
I appreciate you providing the feedback and suggestions. I spent over
a week trying as many combinations of host and guest options as
possible to narrow this issue down to a specific host storage or a
guest device model option. Unfortunately the problem occurred with
every combination I tested while running Linux as the guest. Note, I
only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ).
The problem did not occur when I ran FreeBSD as the guest. The
problem did not occur when I ran KVM in the host and Linux as the guest.
I would suggest running fio (or even dd) on the raw disk device inside
the VM, i.e. without a filesystem at all. Just do not forget to do
“echo 3 > /proc/sys/vm/drop_caches” in the Linux guest VM before you
run the tests.
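For example, a simple raw-device read test could look like this
(assuming the disk shows up as /dev/vdb in the guest):

# Drop the guest page cache so reads hit the virtual disk, not RAM
echo 3 > /proc/sys/vm/drop_caches

# Sequential read from the raw device; iflag=direct bypasses the cache as well
dd if=/dev/vdb of=/dev/null bs=1M count=4096 iflag=direct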
The two servers I was using to test with are no longer available.
However, I'll have two more identical servers arriving in the next
week or so. I'll try to run additional tests and report back here. I
used bonnie++ as that was easily installed from the package repos on
all the systems I tested.
Could you also give more information about:
1. What results did you get (decode bonnie++ output)?
If you look back at this email thread, there are many examples of
running bonnie++ on the guest. I first ran the tests on the host
system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a
baseline of performance. Then I ran bonnie++ tests using bhyve as the
hypervisor and Linux & FreeBSD as the guests. The combinations of host
and guest storage options included ...
1) block device + virtio blk
2) block device + nvme
3) UFS disk image + virtio blk
4) UFS disk image + nvme
5) ZFS disk image + virtio blk
6) ZFS disk image + nvme
7) ZVOL + virtio blk
8) ZVOL + nvme
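For concreteness, the backing stores were variations on the following;
the paths and names here are illustrative rather than the exact
commands used:

# File-backed disk image on a UFS or ZFS filesystem
truncate -s 128G /vm/disk0.img

# ZVOL-backed virtual disk
zfs create -V 128G tank/vmdisk0

Each backing store was then attached to the guest as either a
virtio-blk or an nvme device.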
In every instance, I observed that the Linux guest disk IO often
performed very well for some time after the guest was first booted.
Then the guest's performance would drop to a fraction of its original
level. The benchmark test was run every 5 or 10 minutes in a
cron job. Sometimes the guest would perform well for up to an hour
before performance would drop off. Most of the time it would only
perform well for a few cycles ( 10 - 30 mins ) before performance
would drop off. The only way to restore the performance was to reboot
the guest. Once I determined that the problem was not specific to a
particular host or guest storage option, I switched my testing to
only use a block device as backing storage on the host to avoid
hitting any system disk caches.
Here is the test script I used in the cron job ...
#!/bin/sh

FNAME='output.txt'

# Mark the start of each run so results can be matched to a time window
echo ================================================================================ >> $FNAME
echo Begin @ `/usr/bin/date` >> $FNAME
echo >> $FNAME

# Run bonnie++ with the system defaults; filter out the progress
# indicator lines and the trailing CSV summary (lines containing commas)
/usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME

echo >> $FNAME
echo End @ `/usr/bin/date` >> $FNAME
As you can see, I'm calling bonnie++ with the system defaults. That
uses a data set size that's 2x the guest RAM in an attempt to
minimize the effect of filesystem cache on results. Here is an
example of the output that bonnie++ produces ...
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency             11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              7620us     126us    1648us     151us      15us     633us
--------------------------------- speed drop ---------------------------------
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency             11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency               343us     165us    1636us     113us      55us    1836us
In the example above, the benchmark test repeated about 20 times with
results similar to the performance shown above the dotted line
( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the
performance dropped to what's shown below the dotted line, which is
less than 1/4 the original speed ( ~ 451m/s seq write and 402m/s seq
read ).
2. What results were you expecting?
What I expect is that, when I perform the same test with the same
parameters, the results will stay more or less consistent over time.
This is true when KVM is used as the hypervisor on the same hardware
with the same guest options. That said, I'm not worried about bhyve
being consistently slower than KVM or a FreeBSD guest being
consistently slower than a Linux guest. I'm concerned that the
performance drop over time is indicative of an issue with how bhyve
interacts with non-FreeBSD guests.
3. VM configuration, virtio-blk disk size, etc.
4. Full command for tests (including size of test-set), bhyve, etc.
I believe this was answered above. Please let me know if you have
additional questions.
5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably
try 4K.
The testing performed was not exclusively with virtio-blk.
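For reference, my understanding is that the sector size can be set per
device on the bhyve command line. A hypothetical fragment follows; the
slot number and device path are placeholders, not my actual
invocation:

# Present the backing store to the guest with a 4K logical sector size
-s 4,virtio-blk,/dev/da0p1,sectorsize=4096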
6. Linux has several read-ahead options for the I/O scheduler, and
they could be related too.
I suppose it's possible that bhyve could be somehow causing the disk
scheduler in the Linux guest to act differently. I'll see if I can
figure out how to disable that in future tests.
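From what I can tell, both the scheduler and the read-ahead window are
exposed under sysfs in the guest, so something like this should work
(assuming the disk shows up as vda):

# Show the available I/O schedulers; the bracketed entry is active
cat /sys/block/vda/queue/scheduler

# Switch to the no-op scheduler and shrink the read-ahead window
echo none > /sys/block/vda/queue/scheduler
echo 0 > /sys/block/vda/queue/read_ahead_kb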
Additionally, could you also play with the “sync=disabled” volume/zvol
option? Of course, that is only for write testing.
The testing performed was not exclusively with zvols.
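For the ZVOL-backed runs, I take it that would be something like the
following; the pool/volume name is a placeholder:

# Disable synchronous write semantics for the duration of the write tests
zfs set sync=disabled tank/vmdisk0
# ...run the write tests, then restore the default...
zfs set sync=standard tank/vmdisk0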
Once I have more hardware available, I'll try to report back with
more testing. It may be interesting to also see how a Windows guest
performs compared to Linux & FreeBSD. I suspect that this issue may
only be triggered when a fast disk array is in use on the host. My
tests use a 16x SSD RAID 10 array. It's also quite possible that the
disk IO slowdown is only a symptom of another issue that's triggered
by the disk IO test ( please see end of my last post related to
scheduler priority observations ). All I can say for sure is that ...
1) There is a problem and it's reproducible across multiple hosts
2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
3) It is not specific to any host or guest storage option
Thanks,
-Matthew