http://www.redhat.com/magazine/005mar05/features/kprobes/

Gaining insight into the Linux® kernel with Kprobes

by William Cohen

Introduction

Kernel developers have often resorted to "diagnostic print statements" to understand what is occurring in the Linux kernel. This technique is painful because a new kernel must be built and installed on the machine, and the machine must then be rebooted with the new kernel. Each new experiment requires another reboot, which can take minutes on some machines.

Developers have found the ability to inspect the operation of unmodified executables to be very useful. For userspace applications, developers can use debuggers to set breakpoints at specific locations in the unmodified executable. When the processor encounters a breakpoint, the developer uses the debugger to inspect program state and gain insight into how the program is operating (or failing). There are advantages to this method of examining program operation over the traditional technique of compiling "diagnostic print statements" into the program:
Due to interrupt handling, it is not feasible to completely stop the Linux kernel and wait for the developer to type in commands. However, it is possible to place snippets of instrumentation code in the kernel to collect information at specific locations, such as whether a specific function is being executed and the state of variables. The recent 2.6 Linux kernels, including the x86 kernel in the upcoming Fedora Core 4, allow developers to gather information about the Linux kernel's operation without compiling or booting a new kernel. This is implemented with Kprobes, a dynamic instrumentation system. This article describes how Kprobes operate and provides kernel instrumentation examples.

Kprobes

Kprobes is a dynamic instrumentation system in the mainline 2.6 Linux kernel and will be enabled in the soon-to-be-released x86 Fedora Core 4 kernels. Kprobes allows one to gather additional information about kernel operation without recompiling or rebooting a kernel. Kprobes enables locations in the kernel to be instrumented with code, and the instrumentation code runs when the processor encounters that probe point. Once the instrumentation code completes execution, the kernel continues normal execution.

The Kprobes instrumentation is built as a kernel module. Thus, rather than recompiling and rebooting the system with an instrumented kernel, a kprobe instrumentation module can be written, compiled, and loaded on the running system. Once the instrumentation module has served its purpose, it can be unloaded, and the kernel returns to its normal operation.

There are two types of kernel probes available: kprobes and jprobes. A kprobe inserts a probe at a specific instruction. The instrumentation provided by a kprobe can be inserted anywhere in a function, so the kprobe code cannot make assumptions about local variables or arguments passed into the function being probed.
A jprobe instruments the entry of a function and allows the probe to examine the arguments passed into the probed function.

The kprobe support in the kernel provides simple data structures
and a set of functions to allow the insertion and removal of kernel
probes. A data structure is filled out and registered with a call to
either the [...]

Table 1. Kernel probes management functions
Listing 1, kprobe data structure, shows the fields of struct kprobe.

    struct kprobe {
        /* elided fields for internal state information */
        kprobe_opcode_t *addr;
        kprobe_pre_handler_t pre_handler;
        kprobe_post_handler_t post_handler;
        kprobe_fault_handler_t fault_handler;
        /* elided fields for internal state information */
    };

Listing 1. kprobe data structure
The jprobe is built on top of the basic kprobe. Jprobes
simplify the instrumentation of function entries and allow one to
inspect the arguments passed to the function. The struct jprobe wraps a struct kprobe and adds an entry pointer:

    struct jprobe {
        struct kprobe kp;
        kprobe_opcode_t *entry; /* probe handling code to jump to */
    };

Listing 2. jprobe data structure
The execution of a kprobe has similarities to the execution of a
breakpoint set by a debugger. The instruction at the kernel probe
location is saved in a buffer, and the instruction at that location is
replaced by a breakpoint instruction. When the processor encounters
the breakpoint, the trap handler is invoked. A check is made to
determine whether there is a kprobe registered at this location. If
there is no probe registered for that location, the breakpoint is
passed on to the normal handler. If a probe is found, the [...]

Examples

This article contains two examples: one example using a kprobe and
the other example using a jprobe. Most block device I/O goes
through the function generic_make_request(). You need to have the [...] Assuming that the [...]

Kprobe example

The kprobe example [...] The include for [...] The [...] The function [...]

The instrumentation is started as root with the following command:
The instrumentation is shut down as root with the following command:
When the module is unloaded, the data is written to /var/log/messages.

    Feb 23 12:09:20 slingshot kernel: kprobe registered
    Feb 23 12:09:31 slingshot kernel: kprobe unregistered
    Feb 23 12:09:31 slingshot kernel: generic_make_request() called 52 times.

Listing 5. Output of kprobebio module in /var/log/messages

Jprobe example

Another useful mechanism provided by Kprobes support is Jprobes.
Jprobes allow instrumentation of the function entry and access to the
arguments passed into the instrumented function. Listing 6, jprobebio.c, shows
the code to generate instrumentation that counts the number of times
that [...] The [...]

Another significant difference between kprobes and jprobes is how
the instrumentation function is exited. In a jprobe there needs to be
an explicit [...] A jprobe uses a struct [...]

When the module is removed from the kernel, [...]

    Feb 23 13:55:01 slingshot kernel: plant jprobe at c024f900, handler addr e09e4000
    Feb 23 13:55:02 slingshot crond(pam_unix)[5969]: session closed for user root
    Feb 23 13:55:21 slingshot kernel: jprobe unregistered
    Feb 23 13:55:21 slingshot kernel: generic_make_request() called 119 times for 952 sectors.
    Feb 23 13:55:21 slingshot kernel: bdev 0xcb199da8 (3,5) 26 208 sectors.
    Feb 23 13:55:21 slingshot kernel: bdev 0xdf00eda8 (3,2) 93 744 sectors.

Listing 7. Output of the jprobebio module in /var/log

The future

The examples in this article show how to write simple instrumentation using the Kprobes support in the Fedora Core 4 kernels. However, one might notice that the instrumentation is written in raw C code, and it is quite possible to crash the machine if the instrumentation code has a flaw in it. The Kprobes mechanism is also a very low-level interface that simply places individual probes where directed. There is no predefined library that selects groups of probe points to measure things that a regular user might be interested in. Thus, Kprobes currently requires a good understanding of the kernel, both to know which locations to instrument and to analyze the collected data into a meaningful result.

An effort has started to address these deficiencies in the current kprobe instrumentation: SystemTap. SystemTap will provide a safer language for writing the instrumentation and a library of useful instrumentation.
