Summary: FreeBSD 10.1/amd64 under Xen 4.2.5 is much slower than FreeBSD 9.3 in the same environment, especially at fork()
I recently installed a FreeBSD-10.1 VM under Xen, and was pleased to see the XENHVM stuff is now integrated into GENERIC. However, the system seemed a little slow and lacking in "snappiness" -- the first fetch/extraction of portsnap was particularly bad, taking at least 20 minutes. It had been a while since I'd done that (as opposed to 'portsnap fetch update'), so I wasn't sure how abnormal that was. Then I noticed that building stuff from ports, especially stuff using libtool, like security/sssd, was extremely slow compared to physical hardware, so I tested a 9.3 VM, which was much faster.

Importantly, it was not a typical case of a slow/overloaded CPU but more like slow context switching/forking. I would see high (40%) system CPU percentage but low user, and usually the process at the top of the list was sh. It would take a long time between compiling files, but when cc finally ran it was quite fast, compiling each file in a second or two. The system was not swapping, and iostat (also xentop on the host) showed minimal I/O load. Tracing the sh process (which was libtool-related) with truss, I would see it do some stuff, fork, wait several seconds, then do some more stuff, rinse and repeat. Using 'truss -f' to follow the child processes, there was a noticeable delay associated with each fork() call.

This led me to do some benchmarking. I found a fork() benchmark at https://github.com/mondalaci/fork-benchmark and ran it on various systems. Notably, on FreeBSD 10.1 (also 10.0) under Xen, it was reasonably fast shortly after bootup (though still slower than 9.3), but would get slower on repeated runs, and significantly slower after compiling some ports. It would also run slowly if the system had booted and then sat idle for a while. The speed was inconsistent: occasionally, after a period of idleness, it would run somewhat faster again without rebooting. Configure and compilation times of sssd were also inconsistent, but generally "slow", sometimes drastically so.
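For anyone who wants to reproduce the measurement without building the benchmark, here is a minimal sketch of the same idea in Python. It is not the fork-benchmark program: the function name time_forks is made up for illustration, and it only forks and reaps children without exec'ing anything, so its absolute numbers are not comparable to the figures in this post. It should still show whether per-fork cost drifts between runs.

```python
import os
import time


def time_forks(count):
    """Fork `count` children one at a time, reap each, return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python cleanup handlers.
            os._exit(0)
        # Parent: wait for the child before forking the next one.
        os.waitpid(pid, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1000  # arbitrary count chosen for illustration
    elapsed = time_forks(n)
    print(f"Forked and reaped {n} processes in {elapsed:.6f} seconds "
          f"({elapsed / n * 1000:.3f} ms each)")
```

Running it a few times right after boot and again after a port build would be one way to see whether the slowdown described above shows up.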
FreeBSD 9.3 (with xenhvm_load="YES" in loader.conf) on the same Xen host does not have this problem -- it fork()s more quickly and consistently. FreeBSD 10.1 on KVM (unfortunately not on the same hardware) also appears normal, as does 8.4 on physical hardware (different but of similar vintage), and a Linux VM on the same Xen host.

Using one or two virtual CPUs does not make much difference, and the host machine is otherwise idle, so it does not appear to be an SMP issue. I was using ZFS, but I have ruled that out as a factor, as the problem occurs even without zfs.ko loaded (/ is ufs). Varying the memory between 1 and 8 GB did not seem to affect anything either. I also built a "NOHVM" 10.1 kernel to see if the Xen drivers were at issue, but that did not help (it was actually a bit slower), so it appears to be something deeper in the kernel or scheduler.

The Xen host is running Xen 4.2.5_02-0.7.1 with SLES 11 SP3 as the Dom0, on a Dell 2950 with 8 physical CPU cores (dual socket, quad-core Xeon E5420). I have not experienced performance problems with any other guest OS.

As FreeBSD 9.3 runs fine, I am using that for my FreeBSD VMs for now, but hopefully 10.x can be fixed before 9-STABLE goes EOL!

Following are the VM config, dmesg, and some benchmarks.

-Andrew

Fork benchmark used: https://github.com/mondalaci/fork-benchmark

Xen DomU config:
========
name="fbsd10"
description="FreeBSD 10.1 - testing"
uuid="ed88195c-dee4-0e44-5943-3deceac8a56c"
#memory=4096
memory=1024
maxmem=1024
vcpus=2
on_poweroff="destroy"
on_reboot="restart"
on_crash="preserve"
localtime=0
keymap="en-us"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'phy:/dev/xc-test/fbsd10,hda,w', 'file:/root/FreeBSD-10.1-RELEASE-amd64-dvd1.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:16:3e:3a:57:7a,bridge=br0,type=netfront', ]
stdvga=0
vnc=1
vncunused=1
viridian=0
acpi=1
pae=1
serial="pty"
========

dmesg:
========
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015
 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
XEN: Hypervisor version 4.2 detected.
CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2493.90-MHz K8-class CPU)
 Origin = "GenuineIntel" Id = 0x10676 Family = 0x6 Model = 0x17 Stepping = 6
 Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT>
 Features2=0x81282201<SSE3,SSSE3,CX16,SSE4.1,x2APIC,TSCDLT,HV>
 AMD Features=0x20100800<SYSCALL,NX,LM>
 AMD Features2=0x1<LAHF>
real memory = 1073741824 (1024 MB)
avail memory = 1010737152 (963 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <Xen HVM>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 2
ioapic0: Changing APIC ID to 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-47 on motherboard
kbd1 at kbdmux0
random: <Software, Yarrow> initialized
xen_et0: <Xen PV Clock> on motherboard
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
acpi0: <Xen> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc100-0xc10f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf0000000-0xf1ffffff,0xf3000000-0xf3000fff at device 2.0 on pci0
vgapci0: Boot video device
xenpci0: <Xen Platform Device> port 0xc000-0xc0ff mem 0xf2000000-0xf2ffffff irq 28 at device 3.0 on pci0
xenstore0: <XenStore> on xenpci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
xctrl0: <Xen Control Device> on xenstore0
xenbusb_front0: <Xen Frontend Devices> on xenstore0
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 0.10> Removable CD-ROM SCSI-0 device
cd0: Serial Number QM00003
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: cd present [1262221 x 2048 byte records]
xbd0: 6144MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
xbd0: attaching as ada0
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xn0: Ethernet address: 00:16:3e:3a:57:7a
xenbusb_back0: <Xen Backend Devices> on xenstore0
xn0: backend features: feature-sg feature-gso-tcp4
random: unblocking device.
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ada0p2 [rw]...
xn0: 2 link states coalesced
========

"NOHVM" kernel config (not the dmesg above, but presented for completeness):
========
include GENERIC
ident NOHVM

# NOTE: XENHVM depends on xenpci. They must be added or removed together.
nooptions XENHVM # Xen HVM kernel infrastructure
nodevice xenpci # Xen HVM Hypervisor services driver
========

Benchmarks:
===========

Fork benchmark -- ./fork-benchmark <numprocs>:

10.1, 2 CPU, fresh boot:
Forked, executed and destroyed 100 processes in 0.268835 seconds.
Forked, executed and destroyed 1000 processes in 2.362202 seconds.
Forked, executed and destroyed 1000 processes in 2.642716 seconds.
Forked, executed and destroyed 10000 processes in 28.75984 seconds.
Forked, executed and destroyed 10000 processes in 34.568837 seconds.
Forked, executed and destroyed 10000 processes in 52.69006 seconds.
Forked, executed and destroyed 10000 processes in 53.41585 seconds.

10.1, 1 CPU, after compiling sssd:
Forked, executed and destroyed 100 processes in 5.684971 seconds.
Forked, executed and destroyed 1000 processes in 60.330680 seconds.

10.1, 2 CPU, NOHVM kernel, after compiling sssd:
Forked, executed and destroyed 5000 processes in 102.849662 seconds.
Forked, executed and destroyed 5000 processes in 107.160831 seconds.
Forked, executed and destroyed 100 processes in 2.524160 seconds.
Forked, executed and destroyed 1000 processes in 19.592753 seconds.

9.3, 1 CPU:
Forked, executed and destroyed 5000 processes in 8.416964 seconds.

9.3, 2 CPU:
1: Forked, executed and destroyed 5000 processes in 9.951971 seconds.
2: Forked, executed and destroyed 5000 processes in 10.185864 seconds.
3: Forked, executed and destroyed 5000 processes in 10.124263 seconds.
(remains consistent)

Compilation times -- cd /usr/ports/security/sssd; make clean; time make configure; time make build

configure:
9.3, 1 CPU:   22.804u 10.764s 0:40.19 83.5% 1400+2497k 816+7885io 456pf+0w
9.3, 2 CPU:   25.732u 14.651s 0:42.38 95.2% 1326+2432k 164+7885io 30pf+0w
10.1, 1 CPU:  148.992u 68.372s 3:38.52 99.4% 2325+197k 0+294io 3pf+0w
10.1, 2 CPU:  1.156u 29.289s 1:02.47 96.7% 4602+225k 774+300io 654pf+0w
(again):      35.229u 21.117s 0:49.30 114.2% 4667+221k 0+291io 0pf+0w
10.1 NOHVM:   80.236u 51.313s 1:51.45 118.0% 2930+200k 0+296io 30pf+0w

build:
9.3, 1 CPU:   233.998u 145.352s 6:22.51 99.1% 1360+2777k 287+3966io 32pf+0w
9.3, 2 CPU:   280.641u 230.728s 4:24.23 193.5% 1157+2675k 0+3968io 0pf+0w
10.1, 1 CPU:  3199.849u 764.871s 1:06:26.72 99.4% 753+182k 203+28io 86pf+0w
10.1, 2 CPU:  744.318u 549.327s 11:02.38 195.3% 2388+193k 235+28io 86pf+0w
(again):      1072.863u 747.565s 15:30.05 195.7% 2119+192k 3+29io 0pf+0w
10.1 NOHVM:   1173.692u 823.116s 17:06.46 194.5% 1725+188k 0+28io 0pf+0w

Note the 10.1/1 CPU build took over an hour! I'm fairly certain I had a 10.1/2 CPU build also take around an hour, but I didn't manage to capture it with time(1).
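To put the raw figures on a common scale, the short sketch below converts a few of the fork-benchmark results above into milliseconds per process and compares the 1-CPU build wall-clock times. The helper names per_process_ms and wall_seconds are made up for illustration; the input numbers are copied from the results above.

```python
def per_process_ms(processes, seconds):
    """Average cost of one fork/exec/destroy cycle, in milliseconds."""
    return seconds / processes * 1000.0


def wall_seconds(clock):
    """Convert time(1)-style elapsed time ('m:ss.ss' or 'h:mm:ss.ss') to seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


# (label, process count, elapsed seconds), taken from the benchmark output
runs = [
    ("10.1, 2 CPU, fresh boot (first 10000 run)", 10000, 28.75984),
    ("10.1, 1 CPU, after compiling sssd", 1000, 60.330680),
    ("9.3, 1 CPU", 5000, 8.416964),
]
for label, count, secs in runs:
    print(f"{label}: {per_process_ms(count, secs):.2f} ms per process")

# Wall-clock build times on 1 CPU: 10.1 versus 9.3
slow = wall_seconds("1:06:26.72")
fast = wall_seconds("6:22.51")
print(f"build, 1 CPU: 10.1 took {slow / fast:.1f}x as long as 9.3")
```

By this arithmetic the per-process cost is roughly 2.9 ms on a freshly booted 10.1, about 60 ms on 10.1 after compiling sssd, and about 1.7 ms on 9.3, while the 1-CPU build on 10.1 took roughly ten times as long as on 9.3.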