Hi Folks,

I have been trying to use LTTng to track down some scheduling issues in a 
program that I writing. The issue I am dealing with is basically deadline 
misses in a user space program that uses isochronous FireWire I/O.

In any case, the results I am getting using LLTng are puzzling, and I am not 
sure whether they are indicating a severe problem with interrupts being 
disabled for too, or if it is a problem with some implementation details on the 
platform I running on.

The platform is TI816x -- a relatively new ARM Cortex-A8 SOC; I have applied 
the LTTng patchset to the patched 2.6.37 kernel that TI has provided to support 
this SOC.

There are two things that are happening in traces that I don't understand; they 
maybe related:

1) I am seeing softirq_entry and softirq_exit events for ids that are much 
larger the 32 softirqs the could exist, and definitely larger than the 10 that 
are defined in my kernel. For example, I see a softirq_raise for softirq_id 1 
(TIMER), followed by a softirq_entry for softirq_id 255, and then softirq_exit 
for softirq_id 112.

2) I am also seeing an enormous amount of time being spent in the the softirq 
that fires after softirq_raise for softirq_id 1 is logged.

I think that what may be happening is that the timer interrupt is adjusting the 
trace clock in some way. The sequence I see is:

kernel.irq_entry: 28.180053710 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.180054931 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { jiffies = 4294977392, xtime_sec = 1304810558, xtime_nsec = 
329927105, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.180056152 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.180057373 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.180058593 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SYSCALL { handled = 1 }
kernel.softirq_entry: 28.180059814 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }

//****** note jump of approximately 10ms

kernel.timer_set: 28.190001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SOFTIRQ { expires = 10097, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.190002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SOFTIRQ { expires = 10097, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.190003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.190004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.190006103 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { jiffies = 4294977393, xtime_sec = 1304810558, xtime_nsec = 
339927009, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.190007324 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.190008544 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.190009765 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SYSCALL { handled = 1 }
kernel.softirq_entry: 28.190010986 (/targetfs/test1/kernel_0), 0, 0, , , 0, 
0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }

//****** note jump of approximately 10ms

kernel.timer_set: 28.200001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SOFTIRQ { expires = 10098, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.200002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SOFTIRQ { expires = 10098, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.200003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.200004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0, 
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }

(I added the notes about the time jumps to the trace output).

So, my two main questions are:

(1) Why is the softirq_id being reported incorrectly for kernel.softirq_entry 
and kernel.softirq_exit. It is passed as a simple int in trace_softirq_entry, 
so there doesn't seem to be a lot of places for it to go wrong, but the probe 
function in the ltt kernel modules has some complexity to it that I don't 
understand.

(2) Why does the timestamp jump by approximately 10ms between the 
kernel.softirq_entry for the timer, and the kernel.timer_set? This is what 
looks like a huge usage of cpu time, but it is not clear to me that it is 
accurate. 

In any case, any guidance as to how I might go about resolving these issues so 
that I can get accurate traces would be greatly appreciated.

One final question, that doesn't seem to be fully addressed by docs I have 
read. If I want to add custom trace points to the kernel and have them 
supported by the full toolchain (e.g. in the kernel, enabled for tracing, into 
the trace files and then interpreted by lttv), what steps do I need to take. If 
this is documented somewhere, and I am just missing it, please point me at the 
docs.

Best regards,

B.J. Buchalter
        



_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev

Reply via email to