David S. Ahern wrote:
> I added tracers to kvm_mmu_page_fault() that collect tsc cycles:
>
> 1. before vcpu->arch.mmu.page_fault()
> 2. after vcpu->arch.mmu.page_fault()
> 3. after mmu_topup_memory_caches()
> 4. after emulate_instruction()
>
> So the deltas in the trace reports show:
> - cycles required for arch.mmu.page_fault (tracer 2)
> - cycles required for mmu_topup_memory_caches(tracer 3)
> - cycles required for emulate_instruction() (tracer 4)
>
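For reference, those four tracers bracket the calls inside
kvm_mmu_page_fault() roughly like this (a simplified sketch; the actual
trace/tsc plumbing and the return-value handling are not shown):

    int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code)
    {
            int r;
            enum emulation_result er;

            /* tracer 1: read the tsc */
            r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
            /* tracer 2: delta = cost of the shadow page-fault handler */
            if (r < 0)
                    return r;
            if (!r)
                    return 1;       /* handled entirely by the shadow mmu */

            r = mmu_topup_memory_caches(vcpu);
            /* tracer 3: delta = cost of topping up the mmu memory caches */
            if (r)
                    return r;

            er = emulate_instruction(vcpu, vcpu->run, cr2, error_code, 0);
            /* tracer 4: delta = cost of emulating the faulting instruction */

            /* ... map er (EMULATE_DONE / EMULATE_DO_MMIO / ...) to a return value ... */
            return 1;
    }
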
> I captured trace data for ~5 seconds during one of the usual events (again
> this time it was due to kscand in the guest). I ran the formatted trace data
> through an awk script to summarize:
>
>     TSC cycles      tracer2   tracer3   tracer4   (event counts)
>       0 -  10,000:   295067    213251    115873
>  10,001 -  25,000:     7682      1004     98336
>  25,001 -  50,000:      201        15        36
>  50,001 - 100,000:   100655         0        10
>         > 100,000:      117         0        15
>
> This means vcpu->arch.mmu.page_fault() was called 403,722 times in the
> roughly 5-second interval: 295,067 times it took < 10,000 cycles, but 100,772
> times it took longer than 50,000 cycles. The page_fault function being run is
> paging64_page_fault.
>
>   

This does look like the fork detector.  Once in every four faults, it 
triggers and the fault becomes slow.  100K floods == 100K page tables == 
200GB of virtual memory, which seems excessive.
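
For reference, the detection lives in kvm_mmu_pte_write(); simplified (the
field and helper names here are approximate), it looks like:

    /*
     * Repeated emulated writes to the same shadowed guest page table,
     * where the newly written gptes do not yet have the accessed bit
     * set, look like fork() populating fresh page tables.  Past the
     * threshold the shadow page is zapped ("unshadowed") so further
     * writes to it are no longer trapped.
     */
    if (gfn == vcpu->arch.last_pt_write_gfn
        && !last_updated_pte_accessed(vcpu)) {
            ++vcpu->arch.last_pt_write_count;
            if (vcpu->arch.last_pt_write_count >= 3)    /* the count you bumped to 4 */
                    flooded = 1;
    } else {
            vcpu->arch.last_pt_write_gfn = gfn;
            vcpu->arch.last_pt_write_count = 1;
    }

Each flood zaps a shadowed guest page table, which then has to be rebuilt
through the page_fault path; with PAE each such page table maps 512 * 4K =
2MB, which is where 100K floods == 200GB comes from.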

Is this running a forked load like apache, with many processes?  How 
much memory is on the guest, and is there any memory pressure?

> mmu_topup_memory_caches() and emulate_instruction() were both run 214,270
> times, most of them relatively quickly.
> Note: I bumped the scheduling priority of the qemu threads to RR 1 so that
> few host processes could preempt them.
>
> david
>
>
> Avi Kivity wrote:
>   
>> David S. Ahern wrote:
>>     
>>> I added the traces and captured data over another apparent lockup of
>>> the guest.
>>> This seems to be representative of the sequence (pid/vcpu removed).
>>>
>>> (+4776)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>> (+3632)  VMENTRY
>>> (+4552)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
>>> (+   54928)  VMENTRY
>>>   
>>>       
>> Can you oprofile the host to see where the 54K cycles are spent?
>>
>>     
>>> (+4568)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db4 gpte = 0x00000000 41c5d363 ]
>>> (+8432)  VMENTRY
>>> (+3936)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
>>> (+   13832)  VMENTRY
>>>
>>>
>>> (+5768)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>> (+3712)  VMENTRY
>>> (+4576)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
>>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 3d5981d0 gpte = 0x00000000 3d55d047 ]
>>>   
>>>       
>> This indeed has the accessed bit clear.
>>
>>     
>>> (+   65216)  VMENTRY
>>> (+4232)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db4 gpte = 0x00000000 3d598363 ]
>>>   
>>>       
>> This has the accessed bit set and the user bit clear, and the pte
>> pointing at the previous pte_write gpa.  Looks like a kmap_atomic().
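>> (Decoding the two gptes: 0x3d55d047 is present and writable but has the
>> accessed bit, bit 5, clear; 0x3d598363 has the accessed bit set, the U/S
>> bit, bit 2, clear, and pfn 0x3d598 -- the same page the previous PTE_WRITE
>> hit at gpa 0x3d5981d0.)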
>>
>>     
>>> (+8640)  VMENTRY
>>> (+3936)  VMEXIT         [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>>> (+   0)  PAGE_FAULT     [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>>> (+   0)  PTE_WRITE      [ gpa = 0x00000000 00009db0 gpte = 0x00000000 00000000 ]
>>> (+   14160)  VMENTRY
>>>
>>> I can forward a more complete time snippet if you'd like. The vcpu0 and
>>> corresponding vcpu1 files have 85,000 total lines; compressed, they total
>>> ~500k.
>>>
>>> I did not see the FLOODED trace come out during this sample, though I did
>>> bump the count from 3 to 4 as you suggested.
>>>
>>>
>>>   
>>>       
>> Bumping the count was supposed to remove the flooding...
>>
>>     
>>> Correlating rip addresses to the 2.4 kernel:
>>>
>>> c0160d00-c0161290 = page_referenced
>>>
>>> It looks like the event is kscand running through the pages. I suspected
>>> this some time ago, and tried tweaking the kscand_work_percent sysctl
>>> variable. It appeared to lower the peak of the spikes, but maybe I imagined
>>> it. I believe lowering that value makes kscand wake up more often but do
>>> less work (page scanning) each time it is awakened.
>>>
>>>   
>>>       
>> What does 'top' in the guest show (perhaps sorted by total cpu time
>> rather than instantaneous usage)?
>>
>> What host kernel are you running?  How many host cpus?
>>
>>     


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

