Another tidbit for you guys as I make my way through various permutations:
I installed the RHEL3 hugemem kernel and the guest behavior is *much* better.
System time still has some regular hiccups that are higher than under xen and
esx (e.g., about 1 out of every 5 one-minute samples shows system time between
10 and 15%), but overall guest behavior is good with the hugemem kernel.

One side effect I've noticed is that I cannot restart the RHEL3 guest running
the hugemem kernel immediately after a shutdown. The guest has 2 vcpus, and qemu
shows one thread at 100% cpu. If I recall correctly, kvm_stat shows a large
number of tlb_flushes (millions in a 5-second sample). The scenario is:
1. start guest running hugemem kernel,
2. shutdown,
3. restart guest.

During step 3 it hangs, but at random points. Removing and reloading the
kvm/kvm-intel modules has no effect - the guest still hangs on the restart.
Rebooting the host clears the problem.

Alternatively, during the hang on a restart I can kill the guest, and then on
the next restart choose the normal 32-bit smp kernel, and the guest boots just
fine. At that point I can shut down the guest, restart it with the hugemem
kernel, and it also boots fine.

david


David S. Ahern wrote:
> Hi Marcelo:
> 
> mmu_recycled is always 0 for this guest -- even after almost 4 hours of 
> uptime.
> 
> Here is a kvm_stat sample where guest system time was very high and qemu had
> two threads at 100% cpu on the host. I removed counters where both columns
> are 0, for brevity.
> 
>  exits                 45937979  758051
>  fpu_reload             1416831      87
>  halt_exits              112911       0
>  halt_wakeup              31771       0
>  host_state_reload      2068602     263
>  insn_emulation        21601480  365493
>  io_exits               1827374    2705
>  irq_exits              8934818  285196
>  mmio_exits              421674     147
>  mmu_cache_miss         4817689   93680
>  mmu_flooded            4815273   93680
>  mmu_pde_zapped           51344       0
>  mmu_prefetch           4817625   93680
>  mmu_pte_updated       14803298  270104
>  mmu_pte_write         19859863  363785
>  mmu_shadow_zapped      4832106   93679
>  pf_fixed              32184355  468398
>  pf_guest                264138       0
>  remote_tlb_flush      10697762  280522
>  tlb_flush             10301338  176424
> 
> (NOTE: This is for a *5* second sample interval instead of 1 to allow me to
> capture the data).
> 
> Here's a sample when the guest is "well-behaved" (system time <10%):
>  exits                 51502194   97453
>  fpu_reload             1421736     227
>  halt_exits              138361    1927
>  halt_wakeup              33047     117
>  host_state_reload      2110190    3740
>  insn_emulation        24367441   47260
>  io_exits               1874075    2576
>  irq_exits             10224702   13333
>  mmio_exits              435154    1726
>  mmu_cache_miss         5414097   11258
>  mmu_flooded            5411548   11243
>  mmu_pde_zapped           52851      44
>  mmu_prefetch           5414031   11258
>  mmu_pte_updated       16854686   29901
>  mmu_pte_write         22526765   42285
>  mmu_shadow_zapped      5430025   11313
>  pf_fixed              36144578   67666
>  pf_guest                282794     430
>  remote_tlb_flush      12126268   14619
>  tlb_flush             11753162   21460
> 
> 
> There is definitely a strong correlation between the mmu counters and high
> system times in the guest. I am still trying to find out what in the guest is
> stimulating it when running on RHEL3; I do not see this same behavior for an
> equivalent setup running on RHEL4.
> 
> By the way, I added an mmu_prefetch stat in prefetch_page() to count the number
> of times the for() loop is hit with PTTYPE == 64; i.e., the number of times
> paging64_prefetch_page() is invoked. (I wanted an explicit counter for this
> loop, though the info seems to duplicate other entries.) That counter is listed
> above. As I mentioned in a prior post, when kscand kicks in the mmu_prefetch
> counter increases at 20,000+/sec, with each trip through that function taking
> 45k+ cycles.
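> 
> For reference, the counter change is roughly of this shape (sketched here,
> not the literal diff; the exact files, field names, and macros vary a bit
> between kvm versions):
> 
>   /* kvm_host.h: new field in the per-vcpu stats (sketch) */
>   struct kvm_vcpu_stat {
>           ...
>           u32 mmu_prefetch;
>   };
> 
>   /* x86.c: export it so kvm_stat picks it up from debugfs (sketch) */
>   static struct kvm_stats_debugfs_item debugfs_entries[] = {
>           ...
>           { "mmu_prefetch", VCPU_STAT(mmu_prefetch) },
>           ...
>   };
> 
>   /* paging_tmpl.h, FNAME(prefetch_page)(): bump the counter once per
>    * invocation; only the PTTYPE == 64 instantiation is counted */
>   #if PTTYPE == 64
>           ++vcpu->stat.mmu_prefetch;
>   #endif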
> 
> kscand is an instigator shortly after boot; however, kscand is *not* the
> culprit once the system has been up for 30-45 minutes. I have started
> instrumenting the RHEL3U8 kernel, and for the load I am running, kscand does
> not walk the active lists very often once the system is up.
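> 
> The instrumentation is nothing fancy -- essentially a counter plus a
> rate-limited printk in the routine that walks an active list, along the lines
> of the sketch below (the function and argument names here are illustrative,
> not the exact RHEL3 identifiers):
> 
>   /* illustrative only: count active-list walks, report about once a second */
>   static unsigned long kscand_list_walks;
>   static unsigned long kscand_last_report;
> 
>   static void scan_active_list(struct zone_struct *zone,   /* name illustrative */
>                                struct list_head *list)
>   {
>           kscand_list_walks++;
>           if (time_after(jiffies, kscand_last_report + HZ)) {
>                   printk(KERN_DEBUG "kscand: %lu active list walks\n",
>                          kscand_list_walks);
>                   kscand_last_report = jiffies;
>           }
>           /* ... existing page-aging loop ... */
>   }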
> 
> So, to dig deeper into what in the guest is stimulating the mmu, I collected
> kvmtrace data for about a 2-minute interval, which caught a roughly 30-second
> period where guest system time was steady in the 25-30% range. Summarizing the
> number of times a RIP appears in a VMEXIT shows the following high runners:
> 
>   count      RIP       RHEL3-symbol
>   82549   0xc0140e42  follow_page [kernel] c0140d90 offset b2
>   42532   0xc0144760  handle_mm_fault [kernel] c01446d0 offset 90
>   36826   0xc013da4a  futex_wait [kernel] c013d870 offset 1da
>   29987   0xc0145cd0  zap_pte_range [kernel] c0145c10 offset c0
>   27451   0xc0144018  do_no_page [kernel] c0143e20 offset 1f8
> 
> (The halt entry was removed from the list since that is the ideal scenario for
> an exit.)
> 
> So for a large percentage of the VMEXITs the RIP falls within follow_page().
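> 
> (The symbol/offset column above is just the usual System.map lookup: take the
> largest symbol address that is <= the rip and report the difference as the
> offset. A rough standalone sketch of that lookup, assuming the guest's
> System.map has been copied to the host -- the file name and output format are
> my own convention, not from any existing tool:)
> 
>   /* resolve_rip.c: map a guest RIP to "symbol base offset" using the
>    * guest's System.map (rough sketch) */
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
> 
>   int main(int argc, char **argv)
>   {
>           unsigned long rip, addr, best = 0;
>           char line[256], name[128], best_name[128] = "?", type;
>           FILE *fp;
> 
>           if (argc != 3) {
>                   fprintf(stderr, "usage: %s System.map rip\n", argv[0]);
>                   return 1;
>           }
>           rip = strtoul(argv[2], NULL, 16);
>           fp = fopen(argv[1], "r");
>           if (!fp) {
>                   perror(argv[1]);
>                   return 1;
>           }
>           while (fgets(line, sizeof(line), fp)) {
>                   if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3)
>                           continue;
>                   /* keep the highest symbol address that is still <= rip */
>                   if (addr <= rip && addr > best) {
>                           best = addr;
>                           strcpy(best_name, name);
>                   }
>           }
>           fclose(fp);
>           printf("%lx  %s %lx offset %lx\n", rip, best_name, best, rip - best);
>           return 0;
>   }
> 
> e.g. "./resolve_rip System.map c0140e42" against the matching RHEL3 System.map
> should report follow_page at offset b2, as in the table above.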
> 
> I wrote an awk script to summarize (histogram style) the TSC cycles between
> VMEXIT and VMENTRY for an address. For the first rip, 0xc0140e42, 82,271 times
> (i.e., almost 100% of the time) the trace shows a delta between 50k and 100k
> cycles between the VMEXIT and the subsequent VMENTRY. Similarly for the second
> one, 0xc0144760, 42,403 times (again almost 100% of the occurrences) the trace
> shows a delta between 50k and 100k cycles between VMEXIT and VMENTRY. These
> seem to correlate with the prefetch_page function in kvm, though I am not 100%
> positive on that.
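> 
> The gist of that script, written out here in C for clarity (it assumes the
> formatted trace has already been reduced to one event per line of the form
> "<tsc> VMEXIT <rip>" or "<tsc> VMENTRY", in per-vcpu order -- that reduction
> step and the line format are my own convention, not kvmtrace's native output):
> 
>   /* exit_latency.c: histogram the TSC delta between a VMEXIT at a given rip
>    * and the following VMENTRY (rough sketch, see assumptions above) */
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
> 
>   int main(int argc, char **argv)
>   {
>           /* bucket limits in cycles: <10k, 10k-50k, 50k-100k, 100k-500k, >=500k */
>           static const unsigned long long limit[] = { 10000, 50000, 100000, 500000 };
>           unsigned long count[5] = { 0, 0, 0, 0, 0 };
>           unsigned long long tsc, exit_tsc = 0, delta;
>           unsigned long target, rip;
>           char line[256], event[32];
>           int pending = 0, n, i;
> 
>           if (argc != 2) {
>                   fprintf(stderr, "usage: %s rip < trace.txt\n", argv[0]);
>                   return 1;
>           }
>           target = strtoul(argv[1], NULL, 16);
> 
>           while (fgets(line, sizeof(line), stdin)) {
>                   n = sscanf(line, "%llu %31s %lx", &tsc, event, &rip);
>                   if (n == 3 && !strcmp(event, "VMEXIT")) {
>                           /* remember the exit time if it is the rip of interest */
>                           pending = (rip == target);
>                           exit_tsc = tsc;
>                   } else if (n == 2 && !strcmp(event, "VMENTRY") && pending) {
>                           delta = tsc - exit_tsc;
>                           for (i = 0; i < 4 && delta >= limit[i]; i++)
>                                   ;
>                           count[i]++;
>                           pending = 0;
>                   }
>           }
>           printf("rip %lx: <10k=%lu 10k-50k=%lu 50k-100k=%lu 100k-500k=%lu >=500k=%lu\n",
>                  target, count[0], count[1], count[2], count[3], count[4]);
>           return 0;
>   }
> 
> e.g. "./exit_latency c0140e42 < trace.txt" for the follow_page rip above; with
> the data described here, nearly all of the 82,271 hits land in the 50k-100k
> bucket.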
> 
> I am now investigating the kernel paths leading to those functions. Any
> insights would definitely be appreciated.
> 
> david
> 
> 
> Marcelo Tosatti wrote:
>> On Fri, Apr 25, 2008 at 11:33:18AM -0600, David S. Ahern wrote:
>>> Most of the cycles (~80% of that 54k+) are spent in 
>>> paging64_prefetch_page():
>>>
>>>         for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
>>>                 gpa_t pte_gpa = gfn_to_gpa(sp->gfn);
>>>                 pte_gpa += (i+offset) * sizeof(pt_element_t);
>>>
>>>                 r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &pt,
>>>                                           sizeof(pt_element_t));
>>>                 if (r || is_present_pte(pt))
>>>                         sp->spt[i] = shadow_trap_nonpresent_pte;
>>>                 else
>>>                         sp->spt[i] = shadow_notrap_nonpresent_pte;
>>>         }
>>>
>>> This loop is run 512 times and takes a total of ~45k cycles, or ~88 cycles
>>> per loop.
>>>
>>> This function gets run >20,000/sec during some of the kscand loops.
>> Hi David,
>>
>> Do you see the mmu_recycled counter increase?
>>
> 
