Hello,
While running AIM7 (workfile.high_systime) in a single 40-way (or a single
60-way) KVM guest, I noticed pretty bad performance when the guest was booted
with a 3.3.1 kernel compared to the same guest booted with a 2.6.32-220
(RHEL6.2) kernel.
I'm still trying to dig into the details here. I'm wondering if some changes in
the upstream kernel (i.e. since 2.6.32-220) might be causing this to show up in
a guest environment (especially for this system-intensive workload).
Has anyone else observed this kind of behavior? Is it a known issue with a fix
in the pipeline? If not, are there any special knobs/tunables that need to be
explicitly set or cleared when using newer kernels like 3.3.1 in a guest?
I have included some information below; any pointers on what else would be
helpful to capture are also welcome.
Thanks!
Vinod
---
Platform used:
DL980 G7 (80 cores + 128G RAM). Hyper-threading is turned off.
Workload used:
AIM7 (workfile.high_systime) using RAM disks. This is
primarily a CPU-intensive workload; there is not much I/O.
Software used :
qemu-system-x86_64 : 1.0.50 (i.e. the latest as of about a week ago).
Native/Host OS : 3.3.1 (SLUB allocator explicitly enabled)
Guest-RunA OS : 2.6.32-220 (i.e. RHEL6.2 kernel)
Guest-RunB OS : 3.3.1
Guest was pinned on (see the sketch just below):
numa nodes 4,5,6,7 -> 40 VCPUs + 64G (i.e. the 40-way guest)
numa nodes 2,3,4,5,7 -> 60 VCPUs + 96G (i.e. the 60-way guest)
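For concreteness, the node binding amounts to something like the libnuma sketch
below (illustration only; numactl or libvirt's vcpupin achieve the same thing,
and the file/variable names here are just placeholders):

/*
 * Illustration only: bind a launcher process to host NUMA nodes 4-7
 * (the 40-way guest layout) before exec'ing qemu-system-x86_64, so the
 * guest's VCPU threads and memory stay on those nodes.
 * Build with: gcc -O2 -o pin_guest pin_guest.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct bitmask *nodes;

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA not available on this host\n");
        return 1;
    }

    /* "4-7" matches the 40-way guest; use "2-5,7" for the 60-way run. */
    nodes = numa_parse_nodestring("4-7");
    if (!nodes) {
        fprintf(stderr, "bad node string\n");
        return 1;
    }

    numa_run_on_node_mask(nodes);   /* restrict CPU placement to these nodes */
    numa_set_membind(nodes);        /* restrict memory allocation as well    */
    numa_free_nodemask(nodes);

    /* argv[1..] is the qemu command line, e.g. qemu-system-x86_64 ... */
    if (argc > 1)
        execvp(argv[1], &argv[1]);

    return 0;
}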
For the 40-way guest, Guest-RunA (the 2.6.32-220 kernel) performed nearly 9x
better than Guest-RunB (the 3.3.1 kernel). In the case of the 60-way guest, the
older guest kernel was nearly 12x better!
For the Guest-RunB (3.3.1) case I ran "mpstat -P ALL 1" on the host and
observed that a very high percentage of CPU time was being spent outside guest
mode, mostly in the host kernel (i.e. %sys). Looking at the "perf" traces, it
seemed like there were long pauses in the guest, perhaps waiting for the
zone->lru_lock as part of release_pages(), and this caused VT-x's PLE
(Pause-Loop Exiting) handling to kick in on the host.
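To make the suspected pattern concrete, here is a toy user-space model (an
illustration only, not kernel code): every "VCPU" thread serializes on a single
zone->lru_lock-style spinlock while releasing a batch of pages, so if the lock
holder's VCPU is preempted on the host, the waiters spin in the guest, which is
exactly the situation PLE is meant to detect.

/*
 * Toy model of the suspected contention pattern; not kernel code.
 * All "VCPU" threads take one shared spinlock per batch of released
 * pages, standing in for zone->lru_lock in release_pages().
 * It does not model the host-side preemption itself.
 * Build with: gcc -O2 -pthread -o lru_contention lru_contention.c
 */
#include <pthread.h>
#include <stdio.h>

#define NR_VCPUS 40          /* matches the 40-way guest */
#define BATCHES  10000       /* "release_pages()" calls per thread */
#define BATCH_SZ 32          /* pages handled per lock hold */

static pthread_spinlock_t lru_lock;      /* stands in for zone->lru_lock */
static volatile unsigned long pages_released;

static void *vcpu_thread(void *arg)
{
    (void)arg;
    for (int b = 0; b < BATCHES; b++) {
        pthread_spin_lock(&lru_lock);
        for (int i = 0; i < BATCH_SZ; i++)
            pages_released++;            /* "del page from LRU" stand-in */
        pthread_spin_unlock(&lru_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NR_VCPUS];

    pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < NR_VCPUS; i++)
        pthread_create(&tids[i], NULL, vcpu_thread, NULL);
    for (int i = 0; i < NR_VCPUS; i++)
        pthread_join(tids[i], NULL);

    printf("released %lu pages\n", pages_released);
    return 0;
}

This is only meant to show why all 40 VCPUs end up queued behind one lock; the
amplification in the guest comes from a preempted holder keeping everyone else
spinning until PLE yields them.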
I also turned on function tracing (the ftrace function tracer; a minimal sketch
of the setup follows) and found that more time appears to be spent around the
lock code in the 3.3.1 guest than in the 2.6.32-220 guest. A small sampling of
these traces is included further below; notice the timestamp jumps around
"_raw_spin_lock_irqsave <-release_pages" in the Guest-RunB case.
1) 40-way Guest-RunA (2.6.32-220 kernel):
-----------------------------------------
# TASK-PID     CPU#  TIMESTAMP       FUNCTION
<...>-32147 [020] 145783.127452: native_flush_tlb <-flush_tlb_mm
<...>-32147 [020] 145783.127452: free_pages_and_swap_cache <-unmap_region
<...>-32147 [020] 145783.127452: lru_add_drain <-free_pages_and_swap_cache
<...>-32147 [020] 145783.127452: release_pages <-free_pages_and_swap_cache
<...>-32147 [020] 145783.127452: _spin_lock_irqsave <-release_pages
<...>-32147 [020] 145783.127452: __mod_zone_page_state <-release_pages
<...>-32147 [020] 145783.127452: mem_cgroup_del_lru_list <-release_pages
...
<...>-32147 [022] 145783.133536: release_pages <-free_pages_and_swap_cache
<...>-32147 [022] 145783.133536: _spin_lock_irqsave <-release_pages
<...>-32147 [022] 145783.133536: __mod_zone_page_state <-release_pages
<...>-32147 [022] 145783.133536: mem_cgroup_del_lru_list <-release_pages
<...>-32147 [022] 145783.133537: lookup_page_cgroup <-mem_cgroup_del_lru_list
2) 40-way Guest-RunB (3.3.1):
-----------------------------
# TASK-PID     CPU#  ||||  TIMESTAMP       FUNCTION
<...>-16459 [009] .... 101757.383125: free_pages_and_swap_cache <-tlb_flush_mmu
<...>-16459 [009] .... 101757.383125: lru_add_drain <-free_pages_and_swap_cache
<...>-16459 [009] .... 101757.383125: release_pages <-free_pages_and_swap_cache
<...>-16459 [009] .... 101757.383125: _raw_spin_lock_irqsave <-release_pages
<...>-16459 [009] d... 101757.384861: mem_cgroup_lru_del_list <-release_pages
<...>-16459 [009] d... 101757.384861: lookup_page_cgroup <-mem_cgroup_lru_del_list
....
<...>-16459 [009] .N.. 101757.390385: release_pages <-free_pages_and_swap_cache
<...>-16459 [009] .N.. 101757.390385: _raw_spin_lock_irqsave <-release_pages
<...>-16459 [009] dN.. 101757.392983: mem_cgroup_lru_del_list <-release_pages
<...>-16459 [009] dN.. 101757.392983: lookup_page_cgroup <-mem_cgroup_lru_del_list
<...>-16459 [009] dN.. 101757.392983: __mod_zone_page_state <-release_pages