I added tracers to kvm_mmu_page_fault() that collect TSC cycles:
1. before vcpu->arch.mmu.page_fault()
2. after vcpu->arch.mmu.page_fault()
3. after mmu_topup_memory_caches()
4. after emulate_instruction()

So the deltas in the trace reports show:

- cycles required for arch.mmu.page_fault()       (tracer 2)
- cycles required for mmu_topup_memory_caches()   (tracer 3)
- cycles required for emulate_instruction()       (tracer 4)

I captured trace data for ~5 seconds during one of the usual events (again,
this time it was due to kscand in the guest). I ran the formatted trace data
through an awk script to summarize:

TSC cycles            tracer2    tracer3    tracer4
0 - 10,000:            295067     213251     115873
10,001 - 25,000:         7682       1004      98336
25,001 - 50,000:          201         15         36
50,001 - 100,000:      100655          0         10
> 100,000:                117          0         15

This means vcpu->arch.mmu.page_fault() was called 403,722 times in the roughly
5-second interval: 295,067 times it took < 10,000 cycles, but 100,772 times it
took longer than 50,000 cycles. The page_fault function being run is
paging64_page_fault.

mmu_topup_memory_caches() and emulate_instruction() were each run 214,270
times, most of them completing relatively quickly.

Note: I bumped the scheduling priority of the qemu threads to RR 1 so that few
host processes could interrupt them.

david

Avi Kivity wrote:
> David S. Ahern wrote:
>> I added the traces and captured data over another apparent lockup of
>> the guest. This seems to be representative of the sequence (pid/vcpu
>> removed):
>>
>> (+ 4776)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+ 3632)  VMENTRY
>> (+ 4552)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61c8 ]
>> (+54928)  VMENTRY
>>
> Can you oprofile the host to see where the 54K cycles are spent?
>
>> (+ 4568)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+    0)  PTE_WRITE   [ gpa = 0x00000000 00009db4  gpte = 0x00000000 41c5d363 ]
>> (+ 8432)  VMENTRY
>> (+ 3936)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>> (+    0)  PTE_WRITE   [ gpa = 0x00000000 00009db0  gpte = 0x00000000 00000000 ]
>> (+13832)  VMENTRY
>>
>>
>> (+ 5768)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016127c ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+ 3712)  VMENTRY
>> (+ 4576)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c016104a ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x0000000b, virt = 0x00000000 fffb61d0 ]
>> (+    0)  PTE_WRITE   [ gpa = 0x00000000 3d5981d0  gpte = 0x00000000 3d55d047 ]
>>
> This indeed has the accessed bit clear.
>
>> (+65216)  VMENTRY
>> (+ 4232)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610e7 ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db4 ]
>> (+    0)  PTE_WRITE   [ gpa = 0x00000000 00009db4  gpte = 0x00000000 3d598363 ]
>>
> This has the accessed bit set and the user bit clear, and the pte
> pointing at the previous pte_write gpa. Looks like a kmap_atomic().
>
>> (+ 8640)  VMENTRY
>> (+ 3936)  VMEXIT      [ exitcode = 0x00000000, rip = 0x00000000 c01610ee ]
>> (+    0)  PAGE_FAULT  [ errorcode = 0x00000003, virt = 0x00000000 c0009db0 ]
>> (+    0)  PTE_WRITE   [ gpa = 0x00000000 00009db0  gpte = 0x00000000 00000000 ]
>> (+14160)  VMENTRY
>>
>> I can forward a more complete time snippet if you'd like. The vcpu0 +
>> corresponding vcpu1 files have 85,000 total lines and compressed the
>> files total ~500k.
>>
>> I did not see the FLOODED trace come out during this sample, though I
>> did bump the count from 3 to 4 as you suggested.
>>
> Bumping the count was supposed to remove the flooding...
>
>> Correlating rip addresses to the 2.4 kernel:
>>
>> c0160d00-c0161290 = page_referenced
>>
>> It looks like the event is kscand running through the pages. I
>> suspected this some time ago and tried tweaking the kscand_work_percent
>> sysctl variable. It appeared to lower the peak of the spikes, but maybe
>> I imagined it. I believe lowering that value makes kscand wake up more
>> often but do less work (page scanning) each time it is awakened.
>>
> What does 'top' in the guest show (perhaps sorted by total cpu time
> rather than instantaneous usage)?
>
> What host kernel are you running? How many host cpus?

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel