Re: bhyve disk performance issue

Matthew Grooms Wed, 28 Feb 2024 12:03:16 -0800

On 2/28/24 13:31, Vitaliy Gusev wrote:

Hi, Matthew.

Hi Vitaliy,

Thanks for the pointers.

I still do not know what command line was used for bhyve. I couldn't find it in the thread, sorry. And I couldn't find the virtual disk size that you used.

Sorry about that. I'll try to get you the exact command line invocation used to launch the guest process once I have test hardware again.

Could you please simplify the bonnie++ output? It is hard to decode due to alignment. Please use exact numbers for:
READ seq - I see you had 1.6GB/s for the good time and ~500MB/s for the worst.
WRITE seq - ...

I summarized the output for you. Here it is again:

Fast: ~ 1.6g/s seq write and 1.3g/s seq read
Slow: ~ 451m/s seq write and 402m/s seq read

If you have slow results for both read and write operations, you should probably test _only_ READs and not do anything else until READs are fine.
Again, if you see slow performance for an Ext4 filesystem in the guest VM placed on the passed disk image, you should try testing on the raw disk image, i.e. without Ext4, because the filesystem could be related.
If you run the test inside the VM on a filesystem, you may have to deal with filesystem bottlenecks, bugs, fragmentation, etc. Do you want to fix them all? I don’t think so.
For example, if you pass a 40G disk image and create an Ext4 filesystem, and during testing the filesystem becomes more than 80% full, I/O may not perform as well.
You should probably eliminate the guest filesystem behaviour when you hit an I/O performance slowdown.
Also, please look at the TRIM operations when you perform WRITE testing. They could also be related to the slow write I/O.

The virtual disks were provisioned with either a 128G disk image or a 1TB raw partition, so I don't think space was an issue.

Trim is definitely not an issue. I'm using a tiny fraction of the 32TB array and have tried both heavily under-provisioned HW RAID10 and SW RAID10 using GEOM. The latter was tested after sending full trim resets to all drives individually.

I will try to incorporate the rest of your feedback into my next round of testing. If I can find a benchmark tool that works with a raw block device, that would be ideal.
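
For reference, a sequential read against a raw block device can be done with plain dd, with no filesystem involved. The following is only a minimal sketch for a Linux guest. The device name /dev/vdb, the function names and the sizes are hypothetical, and the cache-drop step is the "echo 3 > /proc/sys/vm/drop_caches" suggested elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: function names and the example device are assumptions.

# Flush dirty pages and drop the guest page cache (Linux only, needs root).
drop_guest_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Sequentially read COUNT MiB from DEVICE and discard the data.
# dd prints the elapsed time and throughput on stderr when it finishes.
# Usage: raw_seq_read DEVICE [COUNT_MIB]
raw_seq_read() {
    dev="$1"
    count="${2:-1024}"
    dd if="$dev" of=/dev/null bs=1M count="$count"
}

# Example, run as root inside the guest (hypothetical device name):
#   drop_guest_caches
#   raw_seq_read /dev/vdb 4096
```

Reading only is deliberate here: it cannot damage data on the device, and it matches the advice to get READs right before looking at WRITEs.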


Thanks,

-Matthew

——
Vitaliy
On 28 Feb 2024, at 21:29, Matthew Grooms <mgro...@shrew.net> wrote:

On 2/27/24 04:21, Vitaliy Gusev wrote:
Hi,
On 23 Feb 2024, at 18:37, Matthew Grooms <mgro...@shrew.net> wrote:
...
The problem occurs when an image file is used on either ZFS or UFS. The problem also occurs when the virtual disk is backed by a raw disk partition or a ZVOL. This issue isn't related to a specific underlying filesystem.
Do I understand correctly that you ran the tests inside the guest VM on an ext4 filesystem? If so, you should be aware of the additional overhead compared to running the tests on the host.
Hi Vitaliy,
I appreciate you providing the feedback and suggestions. I spent over a week trying as many combinations of host and guest options as possible to narrow this issue down to a specific host storage or a guest device model option. Unfortunately the problem occurred with every combination I tested while running Linux as the guest. Note, I only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ). The problem did not occur when I ran FreeBSD as the guest. The problem did not occur when I ran KVM in the host and Linux as the guest.
I would suggest running fio (or even dd) on the raw disk device inside the VM, i.e. without a filesystem at all. Just do not forget to do “echo 3 > /proc/sys/vm/drop_caches” in the Linux guest VM before you run the tests.
The two servers I was using to test with are no longer available. However, I'll have two more identical servers arriving in the next week or so. I'll try to run additional tests and report back here. I used bonnie++ as that was easily installed from the package repos on all the systems I tested.
Could you also give more information about:

 1. What results did you get (decode bonnie++ output)?
If you look back at this email thread, there are many examples of running bonnie++ on the guest. I first ran the tests on the host system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a baseline of performance. Then I ran bonnie++ tests using bhyve as the hypervisor and Linux & FreeBSD as the guest. The combination of host and guest storage options included ...
1) block device + virtio blk
2) block device + nvme
3) UFS disk image + virtio blk
4) UFS disk image + nvme
5) ZFS disk image + virtio blk
6) ZFS disk image + nvme
7) ZVOL + virtio blk
8) ZVOL + nvme
In every instance, I observed the Linux guest disk IO often performing very well for some time after the guest was first booted. Then the performance of the guest would drop to a fraction of the original performance. The benchmark test was run every 5 or 10 minutes in a cron job. Sometimes the guest would perform well for up to an hour before performance would drop off. Most of the time it would only perform well for a few cycles ( 10 - 30 mins ) before performance would drop off. The only way to restore the performance was to reboot the guest. Once I determined that the problem was not specific to a particular host or guest storage option, I switched my testing to only use a block device as backing storage on the host to avoid hitting any system disk caches.
Here is the test script I used in the cron job ...

#!/bin/sh
FNAME='output.txt'
echo ================================================================================ >> $FNAME
echo Begin @ `/usr/bin/date` >> $FNAME
echo >> $FNAME
/usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
echo >> $FNAME
echo End @ `/usr/bin/date` >> $FNAME
As you can see, I'm calling bonnie++ with the system defaults. That uses a data set size that's 2x the guest RAM in an attempt to minimize the effect of filesystem cache on results. Here is an example of the output that bonnie++ produces ...

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency               11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency                7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency               11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency                 343us     165us    1636us     113us      55us    1836us

In the example above, the benchmark test repeated about 20 times with results that were similar to the performance shown above the dotted line ( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the performance dropped to what's shown below the dotted line which is less than 1/3 the original speed ( ~ 451m/s seq write and 402m/s seq read ).
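
The size of the drop can be worked out directly from the summarized figures. This is a small sketch, not part of the original test script. The helper names are hypothetical, and it assumes the bonnie++ suffixes are binary multiples (k = KiB, m = MiB, g = GiB).

```shell
#!/bin/sh
# Convert a rate such as "1.6g", "451m" or "694k" to MiB/s.
# Assumption: the last character is always one of k, m or g.
to_mib() {
    echo "$1" | awk '{
        unit  = substr($0, length($0), 1)
        value = substr($0, 1, length($0) - 1) + 0
        if (unit == "g") value *= 1024
        else if (unit == "k") value /= 1024
        printf "%.1f\n", value
    }'
}

# Print the slow rate as a whole-number percentage of the fast rate.
# Usage: ratio_pct FAST SLOW
ratio_pct() {
    fast=$(to_mib "$1")
    slow=$(to_mib "$2")
    awk -v f="$fast" -v s="$slow" 'BEGIN { printf "%.0f\n", 100 * s / f }'
}

echo "seq write: $(ratio_pct 1.6g 451m)% of the fast run"
echo "seq read:  $(ratio_pct 1.3g 402m)% of the fast run"
```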
 2. What results are you expecting?
What I expect is that, when I perform the same test with the same parameters, the results would stay more or less consistent over time. This is true when KVM is used as the hypervisor on the same hardware and guest options. That said, I'm not worried about bhyve being consistently slower than KVM or a FreeBSD guest being consistently slower than a Linux guest. I'm concerned that the performance drop over time is indicative of an issue with how bhyve interacts with non-FreeBSD guests.
 3. VM configuration, virtio-blk disk size, etc.
 4. Full command for tests (including size of test-set), bhyve, etc.
I believe this was answered above. Please let me know if you have additional questions.
5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably try 4K.
The testing performed was not exclusively with virtio-blk.
6. Linux has several read-ahead options for the IO scheduler, and it could be related too.
I suppose it's possible that bhyve could be somehow causing the disk scheduler in the Linux guest to act differently. I'll see if I can figure out how to disable that in future tests.
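
On a Linux guest, the active IO scheduler and the read-ahead size are exposed per device under /sys/block/<dev>/queue. The sketch below only reads those two files. The function name is hypothetical, and the optional argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print the IO scheduler and read-ahead size for every block device.
# The scheduler file lists all available schedulers, with the active
# one shown in square brackets.
show_queue_settings() {
    root="${1:-/sys/block}"
    for q in "$root"/*/queue; do
        [ -d "$q" ] || continue
        dev=$(basename "$(dirname "$q")")
        sched=$(cat "$q/scheduler" 2>/dev/null)
        ra=$(cat "$q/read_ahead_kb" 2>/dev/null)
        echo "$dev: scheduler: $sched, read_ahead_kb: $ra"
    done
}

show_queue_settings

# To change the settings for one device (as root; vda is an assumed name,
# and "none" must appear in that device's scheduler list):
#   echo none > /sys/block/vda/queue/scheduler
#   echo 0 > /sys/block/vda/queue/read_ahead_kb
```

Recording this output at the start of each benchmark cycle would show whether the settings differ between fast and slow runs.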
Additionally, could you also play with the “sync=disabled” volume/zvol option? Of course, it is only for write testing.
The testing performed was not exclusively with zvols.
Once I have more hardware available, I'll try to report back with more testing. It may be interesting to also see how a Windows guest performs compared to Linux & FreeBSD. I suspect that this issue may only be triggered when a fast disk array is in use on the host. My tests use a 16x SSD RAID 10 array. It's also quite possible that the disk IO slowdown is only a symptom of another issue that's triggered by the disk IO test ( please see end of my last post related to scheduler priority observations ). All I can say for sure is that ...
1) There is a problem and it's reproducible across multiple hosts
2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
3) It is not specific to any host or guest storage option

Thanks,

-Matthew
