I added tracers to kvm_mmu_page_fault() that collect TSC cycle counts at four points (rough sketch of the instrumentation below):

1. before vcpu->arch.mmu.page_fault()
2. after vcpu->arch.mmu.page_fault()
3. after mmu_topup_memory_caches()
4. after emulate_instruction()

So the deltas in the trace reports show:
- cycles required for arch.mmu.page_fault() (tracer 2)
- cycles required for mmu_topup_memory_caches() (tracer 3)
- cycles required for emulate_instruction() (tracer 4)
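
Roughly where the timestamps sit -- a minimal sketch only, not the actual patch:
get_cycles() and trace_printk() stand in for whatever tracer mechanism is used,
the argument lists approximate the 2008-era code, and error handling is elided.

/*
 * Sketch only: approximate shape of kvm_mmu_page_fault() with the four
 * timestamps.  get_cycles() (the TSC on x86) and trace_printk() are
 * stand-ins for the real tracer hooks; argument lists are approximate.
 */
#include <linux/timex.h>	/* get_cycles(), cycles_t */

int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
{
	cycles_t t1, t2, t3, t4;
	int r;

	t1 = get_cycles();				/* tracer 1 */

	r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
	t2 = get_cycles();				/* tracer 2 */
	trace_printk("mmu.page_fault: %llu cycles\n",
		     (unsigned long long)(t2 - t1));
	if (r <= 0)
		goto out;

	r = mmu_topup_memory_caches(vcpu);
	t3 = get_cycles();				/* tracer 3 */
	trace_printk("topup_memory_caches: %llu cycles\n",
		     (unsigned long long)(t3 - t2));
	if (r)
		goto out;

	emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
	t4 = get_cycles();				/* tracer 4 */
	trace_printk("emulate_instruction: %llu cycles\n",
		     (unsigned long long)(t4 - t3));
out:
	return r;
}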

I captured trace data for ~5 seconds during one of the usual events (again, this
time it was due to kscand in the guest). I ran the formatted trace data through
an awk script to summarize the deltas into buckets (a sketch follows the table):

    TSC cycles      tracer2   tracer3   tracer4
      0 -  10,000:   295067    213251    115873
 10,001 -  25,000:     7682      1004     98336
 25,001 -  50,000:      201        15        36
 50,001 - 100,000:   100655         0        10
        > 100,000:      117         0        15
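
The bucketing script was along these lines -- a sketch, not the exact script;
it assumes each formatted trace line carries the tracer name in field 1 and the
cycle delta in field 2, which will not match the real trace format exactly:

#!/usr/bin/awk -f
# Sketch: bucket per-tracer cycle deltas into the ranges shown above.
# Input format is assumed, not the real formatted trace output.
{
	tag = $1; d = $2
	if      (d <= 10000)  bin = 1
	else if (d <= 25000)  bin = 2
	else if (d <= 50000)  bin = 3
	else if (d <= 100000) bin = 4
	else                  bin = 5
	count[tag, bin]++
}
END {
	split("0-10,000 10,001-25,000 25,001-50,000 50,001-100,000 >100,000", label, " ")
	printf "%-18s %9s %9s %9s\n", "TSC cycles", "tracer2", "tracer3", "tracer4"
	for (b = 1; b <= 5; b++)
		printf "%-18s %9d %9d %9d\n", label[b],
		       count["tracer2", b], count["tracer3", b], count["tracer4", b]
}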

This means vcpu->arch.mmu.page_fault() was called 403,722 times in the roughly
5-second interval: 295,067 of those calls took < 10,000 cycles, but 100,772
(100,655 + 117) took longer than 50,000 cycles. The page_fault handler being
invoked is paging64_page_fault().

mmu_topup_memory_caches() and emulate_instruction() were each run 214,270 times,
with most of those calls completing relatively quickly.

Note: I bumped the scheduling priority of the qemu threads to SCHED_RR priority 1
so that few host processes could preempt them.
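
For reference, setting SCHED_RR priority 1 on a running thread boils down to
something like this (equivalent to "chrt -r -p 1 <tid>" from util-linux; just
an illustration of the mechanism, not necessarily how it was done here):

/* Illustration: put an existing thread (e.g. a qemu vcpu thread) into
 * SCHED_RR at priority 1.  Equivalent to "chrt -r -p 1 <tid>". */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
	struct sched_param sp = { .sched_priority = 1 };
	pid_t tid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <tid>\n", argv[0]);
		return 1;
	}
	tid = atoi(argv[1]);

	if (sched_setscheduler(tid, SCHED_RR, &sp) < 0) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}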

david


Avi Kivity wrote:
> David S. Ahern wrote:
>> I added the traces and captured data over another apparent lockup of
>> the guest.
>> This seems to be representative of the sequence (pid/vcpu removed).
>>
>> (+4776)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+3632)  VMENTRY
>> (+4552)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
>> (+   54928)  VMENTRY
>>   
> 
> Can you oprofile the host to see where the 54K cycles are spent?
> 
>> (+4568)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ]
>> (+8432)  VMENTRY
>> (+3936)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
>> (+   13832)  VMENTRY
>>
>>
>> (+5768)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+3712)  VMENTRY
>> (+4576)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ]
>>   
> 
> This indeed has the accessed bit clear.
> 
>> (+   65216)  VMENTRY
>> (+4232)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ]
>>   
> 
> This has the accessed bit set and the user bit clear, and the pte
> pointing at the previous pte_write gpa.  Looks like a kmap_atomic().
> 
>> (+8640)  VMENTRY
>> (+3936)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
>> (+   14160)  VMENTRY
>>
>> I can forward a more complete time snippet if you'd like. The vcpu0 and
>> corresponding vcpu1 files have 85,000 total lines; compressed, the files
>> total ~500k.
>>
>> I did not see the FLOODED trace come out during this sample, though I did
>> bump the count from 3 to 4 as you suggested.
>>
>>
>>   
> 
> Bumping the count was supposed to remove the flooding...
> 
>> Correlating rip addresses to the 2.4 kernel:
>>
>> c0160d00-c0161290 = page_referenced
>>
>> It looks like the event is kscand running through the pages. I suspected
>> this some time ago, and tried tweaking the kscand_work_percent sysctl
>> variable. It appeared to lower the peak of the spikes, but maybe I
>> imagined it. I believe lowering that value makes kscand wake up more
>> often but do less work (page scanning) each time it is awakened.
>>
>>   
> 
> What does 'top' in the guest show (perhaps sorted by total cpu time
> rather than instantaneous usage)?
> 
> What host kernel are you running?  How many host cpus?
> 
