* François Godin ([email protected]) wrote: > > > > > > > > > > > > I try to understand why on one trace, is there still > > "page_fault_entry" > > > > in > > > > > TRAP mode. I checked the previous correction, and while it find the > > right > > > > > states for all events except those that LTTv put in "TRAP" mode. Am I > > > > > missing something else or is it a glitch in LTTv? > > > > > > > > No sure what you mean. Can you try to explain a little more ? > > > > > > > > > > I've made tests to verify my assumption. I load textdump in a custom > > parser > > > and it give me the mode of the "page_fault_entry" events. I then compare > > the > > > total of each mode to the number of "page_fault_entry" in each mode in > > the > > > statistics view. For one trace, the result was perfect but for another > > > trace, my parser didn't find any "page_fault_entry" events in the "TRAP" > > > mode like LTTv stats did. The others mode had the right total except one > > > whose total given by the parser was the one given by LTTv + the "page > > > fault_entry" in TRAP. > > > > > > Page_fault_entry by mode: > > > LTTv Parser > > > 1 1 MODE_UNKNOWN > > > 971 999 USER_MODE 999(parser) = 971(LTTv) > > > +28(LTTv-TRAP) > > > 26 26 SYSCALL > > > 28 0 TRAP > > > > Hrm hrm. I'd guess that the nested trap (a trap within a trap) case, or > > the major page faults are not handled correctly. A major page fault > > does: > > > > userspace or kernel > > - trap (page fault entry) > > - reenable interrupts > > - allow to be scheduled out while waiting for I/ > > ... > > - rescheduled > > - disable interrupts > > - iret (page fault exit) > > > > So you could look into those corner-cases to see if we did something > > fishy in there. > > > > I've found something strange related : > Here is a part of the event list in the textdump (the first number being the > event number): > 197217:kernel.page_fault_entry: 7013.274341280 (kernel_0), 1061, 1061, > nautilus, , 971, 0x0, TRAP { 0xb6ad9124, 0xb3e89000, 14, 0 } > 197218:mm.add_to_page_cache: 7013.274351397 (mm_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 299160, 8388609 } > 197219:mm.add_to_page_cache: 7013.274358043 (mm_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 299160, 8388609 } > 197220:block.remap: 7013.274360846 (block_0), 1061, 1061, nautilus, , 971, > 0x0, TRAP { 8388608, 9871488, 8388609, 8192, 0, 0 } > 197221:block.bio_queue: 7013.274362766 (block_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 9873536, 8192, 0, 0 } > 197222:block.getrq_bio: 7013.274367878 (15megs/block_0), 1061, 1061, > nautilus, , 971, 0x0, TRAP { 9873536, 8192, 0, 0 } > 197223:block.rq_insert_fs: 7013.274371287 (Traces/15megs/block_0), 1061, > 1061, nautilus, , 971, 0x0, TRAP { 9873536, 0, 0 } > 197224:block.rq_issue_fs: 7013.274380821 (block_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 9873536, 0, 0 } > 197225:block.unplug_io: 7013.274476634 (block_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 2 } > 198050:kernel.sched_schedule: 7013.306179290 > (/home/francois/workspace/Traces/15megs/kernel_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 0, 1061, 0 } > 198054:kernel.irq_exit: 7013.306261518 > (/home/francois/workspace/Traces/15megs/kernel_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 1 } > 198057:kernel.softirq_exit: 7013.306304768 > (/home/francois/workspace/Traces/15megs/kernel_0), 1061, 1061, nautilus, , > 971, 0x0, TRAP { 1 [run_timer_softirq+0x0/0x2c0] } > 198058:kernel.page_fault_exit: 7013.306351291 > (/home/francois/workspace/Traces/15megs/kernel_0), 1061, 1061, nautilus, , > 971, 0x0, USER_MODE { 516 } > > Or, LTTv stats show 10 event unplug_io in SYSCALL, 1 in USER_MODE and 0 in > TRAP. (All the other events listed here are absent of TRAP too). It appear > that all these event take the USER_MODE which is the mode before and after.
I'm afraid you'll have to dig more by yourself. There is probably something wrong, but I don't see what immediately. I'm going on vacation for 2 weeks starting this evening. Good luck! Mathieu > > > > > > > > > I've also made another discovery. It seems that the time between the > > > "page_fault_entry" or "page_fault_exit" and the previous event is never > > > added in the CPU_TIME. Is this intended? > > > > This is why we have the "cumulative cpu time", which keeps track of the > > amount of cpu time used by a context _and_ its nested contexts. I'm not > > 100% sure it work well though. > > > > Oops, I misexplained myself. > > Suppose the sequence: > ... > A:syscall_exit > B:page_fault_entry > C:unplug_io > D:page_fault_exit > E:syscall_entry > ... > > Then the time from A to B and from C to D seems to not be used for > calculation of cpu_time. The time from B to C and from D to E are used. > I've got those result experimently by excluding those time of my program and > remarquing the result where closer. Many lower sections have seen there > results becoming totally egal between my program and LTTv. > Is this behavior normal? It seems strange to me because it would meen that > the cpu_time will not be equal to the time between the first and the last > event when the trace occur only on one core/cpu. (LTTv result is not equal > to this time.) > > Thanks for taking the time to answer all my questions. > > > -- > François Godin -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com _______________________________________________ ltt-dev mailing list [email protected] http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
