Hi Folks,
I have been trying to use LTTng to track down some scheduling issues in a
program that I writing. The issue I am dealing with is basically deadline
misses in a user space program that uses isochronous FireWire I/O.
In any case, the results I am getting using LLTng are puzzling, and I am not
sure whether they are indicating a severe problem with interrupts being
disabled for too, or if it is a problem with some implementation details on the
platform I running on.
The platform is TI816x -- a relatively new ARM Cortex-A8 SOC; I have applied
the LTTng patchset to the patched 2.6.37 kernel that TI has provided to support
this SOC.
There are two things that are happening in traces that I don't understand; they
maybe related:
1) I am seeing softirq_entry and softirq_exit events for ids that are much
larger the 32 softirqs the could exist, and definitely larger than the 10 that
are defined in my kernel. For example, I see a softirq_raise for softirq_id 1
(TIMER), followed by a softirq_entry for softirq_id 255, and then softirq_exit
for softirq_id 112.
2) I am also seeing an enormous amount of time being spent in the the softirq
that fires after softirq_raise for softirq_id 1 is logged.
I think that what may be happening is that the timer interrupt is adjusting the
trace clock in some way. The sequence I see is:
kernel.irq_entry: 28.180053710 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.180054931 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { jiffies = 4294977392, xtime_sec = 1304810558, xtime_nsec =
329927105, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.180056152 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.180057373 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.180058593 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SYSCALL { handled = 1 }
kernel.softirq_entry: 28.180059814 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }
//****** note jump of approximately 10ms
kernel.timer_set: 28.190001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SOFTIRQ { expires = 10097, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.190002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SOFTIRQ { expires = 10097, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.190003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.190004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
kernel.timer_update_time: 28.190006103 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { jiffies = 4294977393, xtime_sec = 1304810558, xtime_nsec =
339927009, walltomonotonic_sec = -1304810158, walltomonotonic_nsec = 622485308 }
kernel.softirq_raise: 28.190007324 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.softirq_raise: 28.190008544 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, IRQ { softirq_id = 1 [softirq 1] }
kernel.irq_exit: 28.190009765 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SYSCALL { handled = 1 }
kernel.softirq_entry: 28.190010986 (/targetfs/test1/kernel_0), 0, 0, , , 0,
0x0, SOFTIRQ { softirq_id = 255 [softirq 255] }
//****** note jump of approximately 10ms
kernel.timer_set: 28.200001220 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SOFTIRQ { expires = 10098, function = 0xc00af198, data = 0 }
kernel.timer_set: 28.200002441 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SOFTIRQ { expires = 10098, function = 0xbf006210, data = 0 }
kernel.softirq_exit: 28.200003662 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
SYSCALL { softirq_id = 112 [softirq 112] }
kernel.irq_entry: 28.200004882 (/targetfs/test1/kernel_0), 0, 0, , , 0, 0x0,
IRQ { ip = 3221504228, handler = 0xc0054874, irq_id = 67, kernel_mode = 1 }
(I added the notes about the time jumps to the trace output).
So, my two main questions are:
(1) Why is the softirq_id being reported incorrectly for kernel.softirq_entry
and kernel.softirq_exit. It is passed as a simple int in trace_softirq_entry,
so there doesn't seem to be a lot of places for it to go wrong, but the probe
function in the ltt kernel modules has some complexity to it that I don't
understand.
(2) Why does the timestamp jump by approximately 10ms between the
kernel.softirq_entry for the timer, and the kernel.timer_set? This is what
looks like a huge usage of cpu time, but it is not clear to me that it is
accurate.
In any case, any guidance as to how I might go about resolving these issues so
that I can get accurate traces would be greatly appreciated.
One final question, that doesn't seem to be fully addressed by docs I have
read. If I want to add custom trace points to the kernel and have them
supported by the full toolchain (e.g. in the kernel, enabled for tracing, into
the trace files and then interpreted by lttv), what steps do I need to take. If
this is documented somewhere, and I am just missing it, please point me at the
docs.
Best regards,
B.J. Buchalter
_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev