>> Avi Kivity wrote:
>>
>>> David S. Ahern wrote:
>>>
>>>> I added the traces and captured data over another apparent lockup of
>>>> the guest.
>>>> This seems to be representative of the sequence (pid/vcpu removed).
>>>>
>>>> (+4776)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>>> (+3632)  VMENTRY
>>>> (+4552)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>>>> (+   0)  PAGE_FAULT     [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
>>>> (+   54928)  VMENTRY
>>>>
>>> Can you oprofile the host to see where the 54K cycles are spent?
>>>
>>>

I've continued drilling down with the tracers to answer that question. I have
done runs with tracers in paging64_page_fault(), and they showed the overhead is
in the fetch() function. On my last run the tracers are in paging64_fetch(),
placed as follows:

1. after is_present_pte() check
2. before kvm_mmu_get_page()
3. after kvm_mmu_get_page()
4. after if (!metaphysical) {}

The delta between 2 and 3 shows the cycles spent in kvm_mmu_get_page(). The delta
between 3 and 4 shows the cycles spent in kvm_read_guest_atomic(), if it is run.
Tracer 1 dumps vcpu->arch.last_pt_write_count (a carryover from when the new
tracers were in paging64_page_fault); tracer 2 dumps the level, metaphysical and
access variables; tracer 5 dumps the value in shadow_ent.

A representative trace sample is:

(+    4576)  VMEXIT        [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+       0)  PAGE_FAULT    [ errorcode = 0x0000000b, virt = 0x00000000 fffb6950 ]
(+    2664)  PAGE_FAULT1   [ write_count = 0 ]
(+     472)  PAGE_FAULT2   [ level = 2 metaphysical = 0 access 0x00000007 ]
(+   50416)  PAGE_FAULT3
(+     472)  PAGE_FAULT4
(+     856)  PAGE_FAULT5   [ shadow_ent_val = 0x80000000 9276d043 ]
(+    1528)  VMENTRY
(+    4992)  VMEXIT        [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
(+       0)  PAGE_FAULT    [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+    2296)  PAGE_FAULT1   [ write_count = 0 ]
(+     816)  PAGE_FAULT5   [ shadow_ent_val = 0x00000000 8a809041 ]
(+       0)  PTE_WRITE     [ gpa = 0x00000000 00009db4 gpte = 0x00000000 4176d363 ]
(+    6424)  VMENTRY
(+    3864)  VMEXIT        [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
(+       0)  PAGE_FAULT    [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
(+    2496)  PAGE_FAULT1   [ write_count = 1 ]
(+     856)  PAGE_FAULT5   [ shadow_ent_val = 0x00000000 8a809041 ]
(+       0)  PTE_WRITE     [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
(+   10248)  VMENTRY
(+    4744)  VMEXIT        [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
(+       0)  PAGE_FAULT    [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
(+    2408)  PAGE_FAULT1   [ write_count = 2 ]
(+     760)  PAGE_FAULT5   [ shadow_ent_val = 0x00000000 8a809043 ]
(+    1240)  VMENTRY
(+    4624)  VMEXIT        [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
(+       0)  PAGE_FAULT    [ errorcode = 0x0000000b, virt = 0x00000000 fffb6950 ]
(+    2512)  PAGE_FAULT1   [ write_count = 0 ]
(+     496)  PAGE_FAULT2   [ level = 2 metaphysical = 0 access 0x00000007 ]
(+   48664)  PAGE_FAULT3
(+     472)  PAGE_FAULT4
(+     856)  PAGE_FAULT5   [ shadow_ent_val = 0x80000000 9272d043 ]
(+    1576)  VMENTRY

So roughly every fourth trip through the fetch function ends up calling
kvm_mmu_get_page(). A summary of the entire trace file shows that this call
rarely completes in less than 50,000 cycles. Also, vcpu->arch.last_pt_write_count
is always 0 when the high cycle counts are hit.
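
For anyone who wants to reproduce that kind of summary, a minimal sketch (an
illustration only, not the exact script used here): it scans the formatted trace
text and, per the tracer placement above, treats the delta printed on each
PAGE_FAULT3 line as the kvm_mmu_get_page() cost.

/* Sketch of a trace summarizer: collect the PAGE_FAULT3 records from the
 * formatted trace and report how the 2->3 delta (i.e. the cycles spent in
 * kvm_mmu_get_page()) is distributed.  Illustration only. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        FILE *f = argc > 1 ? fopen(argv[1], "r") : stdin;
        char line[256];
        unsigned long delta, n = 0, below_50k = 0, max = 0;
        unsigned long long total = 0;

        if (!f) {
                perror("fopen");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                /* lines look like: "(+   50416)  PAGE_FAULT3" */
                if (!strstr(line, "PAGE_FAULT3"))
                        continue;
                if (sscanf(line, " (+%lu)", &delta) != 1)
                        continue;
                total += delta;
                n++;
                if (delta < 50000)
                        below_50k++;
                if (delta > max)
                        max = delta;
        }
        if (n)
                printf("%lu samples: avg %llu cycles, max %lu, %lu below 50000\n",
                       n, total / n, max, below_50k);
        return 0;
}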


More tidbits:
- The hugepage option seems to have no effect -- the system time spikes and
overhead occur with and without the hugepage option (the above data is with it).

- As the guest runs for hours, the intensity of the spikes drops, though they
still occur regularly and kscand remains the primary suspect. qemu's RSS tends
toward the guest's memory allotment of 2GB. Internally, guest memory usage runs
at ~1GB page cache, 57MB buffers, 24MB swap, and ~800MB for processes.

- I have looked at process creation and do not see a strong correlation between
system time spikes and the number of new processes. So far the only correlations
seem to be with kscand and the amount of memory used; i.e., stock RHEL3 with few
processes shows tiny spikes, whereas in my tests with 90+ processes using about
800MB plus a continually updated page cache (i.e., moderate IO levels), the
spikes are strong and last for seconds.

- Time runs really fast in the guest, gaining several minutes over 24 hours.

I'll download your kvm_stat update and give it a try. When I started this
investigation I was using Christian's kvmstat script, which dumped stats to a
file. Plots of that data did not show a strong correlation with guest system
time.

david
