Summary: FreeBSD 10.1/amd64 under Xen 4.2.5 is much slower than FreeBSD 9.3 on 
the same environment, especially at fork()

I recently installed a FreeBSD-10.1 VM under Xen, and was pleased to see the 
XENHVM stuff is now integrated into GENERIC.  However, the system seemed a 
little slow and lacking in "snappiness" -- the initial portsnap fetch and 
extract was particularly bad, taking at least 20 minutes.  It had been a while 
since I'd done that (as opposed to 'portsnap fetch update'), so I wasn't sure 
how abnormal that was.  But then I noticed that building ports, especially 
those using libtool such as security/sssd, was extremely slow compared to 
physical hardware, so I tested a 9.3 VM, which was much faster.

Importantly, it did not look like a typical slow/overloaded CPU but more like 
slow context switching/forking.  I would see high system CPU usage (40%) but 
low user CPU, and usually the process at the top of the list was sh.  There 
were long pauses between compiling files, but when cc finally ran it was quite 
fast, compiling each file in a second or two.  The system was not swapping, and 
iostat (and xentop on the host) showed minimal I/O load.

Tracing the sh process (which was libtool-related) with truss, I would see it 
do some stuff, fork, wait several seconds, then do some more stuff, rinse and 
repeat.  Using 'truss -f' to follow the child processes, there was a noticeable 
delay associated with each fork() call.

This led me to do some benchmarking.  I found a fork() benchmark at [1] and ran 
it on various systems.  Notably, on FreeBSD 10.1 (also 10.0) under Xen, it was 
reasonably fast shortly after bootup (though still slower than 9.3), but would 
get slower on repeated runs, and significantly slower after compiling some 
ports.  It would also run slowly if the system had booted and then sat idle for 
a while.  The speed was inconsistent: occasionally, after a period of idleness, 
it would run somewhat faster again without a reboot.  The configure and 
compilation times of sssd were likewise inconsistent, but generally "slow", 
sometimes drastically so.

FreeBSD 9.3 (with xenhvm_load="YES" in loader.conf) on the same Xen host does 
not have this problem -- it fork()s more quickly and consistently; FreeBSD 10.1 
on KVM (unfortunately not on the same hardware) also appears normal, as does 
8.4 on (different but similar vintage) physical hardware, and a Linux VM on the 
same Xen host.  Using one or two virtual CPUs does not make much difference, 
and the host machine is otherwise idle, so it does not appear to be an SMP 
issue.  I was using ZFS, but I have ruled that out as a factor, as the problem 
occurs even without zfs.ko loaded (/ is ufs).  Varying the memory between 1 and 
8 GB did not seem to affect anything either.  I also built a "NOHVM" 10.1 
kernel to see if the Xen drivers were at issue, but that did not help (it was 
actually a bit slower), so it appears to be something deeper in the kernel or 

The Xen host is running Xen 4.2.5_02-0.7.1 with SLES 11 SP3 as the Dom0, on a 
Dell 2950 with 8 physical CPU cores (dual socket, quad-core Xeon E5420).  I 
have not experienced performance problems with any other guest OS.

As FreeBSD 9.3 runs fine, I am using that for my FreeBSD VMs for now, but 
hopefully 10.x can be fixed before 9-STABLE goes EOL!  Following are the VM 
config, dmesg, and some benchmarks.



Xen DomU config:
description="FreeBSD 10.1 - testing"

disk=[ 'phy:/dev/xc-test/fbsd10,hda,w', 
'file:/root/FreeBSD-10.1-RELEASE-amd64-dvd1.iso,hdc:cdrom,r', ]
vif=[ 'mac=00:16:3e:3a:57:7a,bridge=br0,type=netfront', ]


Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015 amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
XEN: Hypervisor version 4.2 detected.
CPU: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz (2493.90-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 0x6  Model = 0x17  Stepping = 
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
real memory  = 1073741824 (1024 MB)
avail memory = 1010737152 (963 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <Xen HVM>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  2
ioapic0: Changing APIC ID to 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-47 on motherboard
kbd1 at kbdmux0
random: <Software, Yarrow> initialized
xen_et0: <Xen PV Clock> on motherboard
Event timer "XENTIMER" frequency 1000000000 Hz quality 950
Timecounter "XENTIMER" frequency 1000000000 Hz quality 950
acpi0: <Xen> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xb008-0xb00b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX3 WDMA2 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xc100-0xc10f at device 1.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 1.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf0000000-0xf1ffffff,0xf3000000-0xf3000fff at device 2.0 on pci0
vgapci0: Boot video device
xenpci0: <Xen Platform Device> port 0xc000-0xc0ff mem 0xf2000000-0xf2ffffff irq 28 at device 3.0 on pci0
xenstore0: <XenStore> on xenpci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (9600,n,8,1)
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fdc0: No FDOUT register!
Timecounters tick every 10.000 msec
xctrl0: <Xen Control Device> on xenstore0
xenbusb_front0: <Xen Frontend Devices> on xenstore0
cd0 at ata1 bus 0 scbus1 target 0 lun 0
cd0: <QEMU QEMU DVD-ROM 0.10> Removable CD-ROM SCSI-0 device
cd0: Serial Number QM00003
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: cd present [1262221 x 2048 byte records]
xbd0: 6144MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
xbd0: attaching as ada0
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
xn0: <Virtual Network Interface> at device/vif/0 on xenbusb_front0
xn0: Ethernet address: 00:16:3e:3a:57:7a
xenbusb_back0: <Xen Backend Devices> on xenstore0
xn0: backend features: feature-sg feature-gso-tcp4
random: unblocking device.
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ada0p2 [rw]...
xn0: 2 link states coalesced

"NOHVM" kernel config (not the dmesg above, but presented for completeness):
include GENERIC
ident NOHVM

# NOTE: XENHVM depends on xenpci.  They must be added or removed together.
nooptions       XENHVM                  # Xen HVM kernel infrastructure
nodevice        xenpci                  # Xen HVM Hypervisor services driver

Fork benchmark -- ./fork-benchmark <numprocs>:

10.1, 2 CPU, fresh boot:
Forked, executed and destroyed 100 processes in 0.268835 seconds.
Forked, executed and destroyed 1000 processes in 2.362202 seconds.
Forked, executed and destroyed 1000 processes in 2.642716 seconds.
Forked, executed and destroyed 10000 processes in 28.75984 seconds.
Forked, executed and destroyed 10000 processes in 34.568837 seconds.
Forked, executed and destroyed 10000 processes in 52.69006 seconds.
Forked, executed and destroyed 10000 processes in 53.41585 seconds.

10.1, 1 CPU, after compiling sssd:
Forked, executed and destroyed 100 processes in 5.684971 seconds.
Forked, executed and destroyed 1000 processes in 60.330680 seconds.

10.1, 2 CPU, NOHVM kernel, after compiling sssd:
Forked, executed and destroyed 5000 processes in 102.849662 seconds.
Forked, executed and destroyed 5000 processes in 107.160831 seconds.
Forked, executed and destroyed 100 processes in 2.524160 seconds.
Forked, executed and destroyed 1000 processes in 19.592753 seconds.

9.3, 1 CPU:
Forked, executed and destroyed 5000 processes in 8.416964 seconds.

9.3, 2 CPU:
1: Forked, executed and destroyed 5000 processes in 9.951971 seconds.
2: Forked, executed and destroyed 5000 processes in 10.185864 seconds.
3: Forked, executed and destroyed 5000 processes in 10.124263 seconds.
(remains consistent)

Compilation times -- cd /usr/ports/security/sssd; make clean; time make configure; time make build

time make configure:
9.3, 1 CPU:     22.804u 10.764s 0:40.19 83.5% 1400+2497k 816+7885io 456pf+0w
9.3, 2 CPU:     25.732u 14.651s 0:42.38 95.2% 1326+2432k 164+7885io 30pf+0w
10.1, 1 CPU:    148.992u 68.372s 3:38.52 99.4% 2325+197k 0+294io 3pf+0w
10.1, 2 CPU:    1.156u 29.289s 1:02.47 96.7% 4602+225k 774+300io 654pf+0w
(again):        35.229u 21.117s 0:49.30 114.2% 4667+221k 0+291io 0pf+0w
10.1 NOHVM:     80.236u 51.313s 1:51.45 118.0% 2930+200k 0+296io 30pf+0w

time make build:
9.3, 1 CPU:     233.998u 145.352s 6:22.51 99.1% 1360+2777k 287+3966io 32pf+0w
9.3, 2 CPU:     280.641u 230.728s 4:24.23 193.5% 1157+2675k 0+3968io 0pf+0w
10.1, 1 CPU:    3199.849u 764.871s 1:06:26.72 99.4% 753+182k 203+28io 86pf+0w
10.1, 2 CPU:    744.318u 549.327s 11:02.38 195.3% 2388+193k 235+28io 86pf+0w
(again):        1072.863u 747.565s 15:30.05 195.7% 2119+192k 3+29io 0pf+0w
10.1 NOHVM:     1173.692u 823.116s 17:06.46 194.5% 1725+188k 0+28io 0pf+0w

Note the 10.1/1 CPU build took over an hour!  I'm fairly certain I had a 10.1/2 
CPU build also take around an hour, but I didn't manage to capture it with 