http://www.tglx.de/hrtimers.html

hrtimers

hrtimers is the rework of the former ktimers project. After extensive discussions on LKML about the name "ktimer", Andrew Morton suggested "hrtimer" and we picked it up.
hrtimers - High-resolution timer subsystem

This patch introduces a new subsystem for high-resolution kernel timers. One might ask: we already have a timer subsystem (kernel/timers.c), so why do we need two timer subsystems? After a lot of back and forth trying to integrate high-resolution and high-precision features into the existing timer framework, and after testing various such high-resolution timer implementations in practice, we came to the conclusion that the timer wheel code is fundamentally not suitable for such an approach. We initially didn't believe this ('there must be a way to solve this'), and spent considerable effort trying to integrate things into the timer wheel, but we failed. In hindsight, there are several reasons why such integration is hard/impossible:
While this subsystem does not offer high-resolution clock sources just yet, the hrtimer subsystem can be easily extended with high-resolution clock capabilities, and patches for that exist and are maturing quickly. The increasing demand for realtime and multimedia applications, along with other potential users of precise timers, gives another reason to separate the "timeout" and "precise timer" subsystems.

Another potential benefit is that such a separation allows even more special-purpose optimization of the existing timer wheel for the low-resolution and low-precision use cases - once the precision-sensitive APIs are separated from the timer wheel and are migrated over to hrtimers. E.g. we could decrease the frequency of the timeout subsystem from 250 Hz to 100 Hz (or even lower).

hrtimer subsystem implementation details

The basic design considerations were:
(This separate list is also useful for later when we'll introduce high-resolution clocks, where we need separate pending and expired queues while keeping the time-order intact.)

Time-ordered enqueueing is not purely for the purposes of high-resolution clocks though, it also simplifies the handling of absolute timers based on a low-resolution CLOCK_REALTIME. The existing implementation needed to keep an extra list of all armed absolute CLOCK_REALTIME timers along with complex locking. In case of settimeofday and NTP, all the timers (!) had to be dequeued, the time-changing code had to fix them up one by one, and all of them had to be enqueued again. The time-ordered enqueueing and the storage of the expiry time in absolute time units removes all this complex and poorly scaling code from the posix-timer implementation - the clock can simply be set without having to touch the rbtree. This also makes the handling of posix-timers simpler in general.

The locking and per-CPU behavior of hrtimers was mostly taken from the existing timer wheel code, as it is mature and well suited. Sharing code was not really a win, due to the different data structures. Also, the hrtimer functions now have clearer behavior and clearer names - such as hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly equivalent to del_timer() and del_timer_sync()] - so there's no direct 1:1 mapping between them on the algorithmic level, and thus no real potential for code sharing either.

Basic data types: every time value, absolute or relative, is in a special nanosecond-resolution type: ktime_t. The kernel-internal representation of ktime_t values and operations is implemented via macros and inline functions, and can be switched between a "hybrid union" type and a plain "scalar" 64bit nanoseconds representation (at compile time). The hybrid union type optimizes time conversions on 32bit CPUs.
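The difference between the two representations can be illustrated with a small userspace sketch. This is not the kernel's ktime_t code (see include/linux/ktime.h for that): the demo_* names are hypothetical, the "hybrid" form is modeled as a plain struct rather than a union, and only non-negative time values are handled. It is meant to show why keeping seconds and nanoseconds in separate 32-bit fields avoids 64-bit multiplication and division when converting to and from a timespec.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Scalar form: a single signed 64-bit nanosecond count. */
typedef struct { int64_t ns; } demo_ktime_scalar;

/* Hybrid-style form (simplified): seconds and nanoseconds in 32-bit fields. */
typedef struct { int32_t sec; int32_t nsec; } demo_ktime_hybrid;

/* Scalar conversion from a timespec needs a 64-bit multiplication. */
static inline demo_ktime_scalar demo_scalar_from_timespec(struct timespec ts)
{
    demo_ktime_scalar kt = { (int64_t)ts.tv_sec * DEMO_NSEC_PER_SEC + ts.tv_nsec };
    return kt;
}

/* Converting back needs a 64-bit division and a 64-bit modulo. */
static inline struct timespec demo_scalar_to_timespec(demo_ktime_scalar kt)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(kt.ns / DEMO_NSEC_PER_SEC);
    ts.tv_nsec = (long)(kt.ns % DEMO_NSEC_PER_SEC);
    return ts;
}

/* Hybrid conversion is two plain stores, no multiplication. */
static inline demo_ktime_hybrid demo_hybrid_from_timespec(struct timespec ts)
{
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < DEMO_NSEC_PER_SEC);
    demo_ktime_hybrid kt = { (int32_t)ts.tv_sec, (int32_t)ts.tv_nsec };
    return kt;
}

/* Addition is two 32-bit adds plus one compare-and-carry. */
static inline demo_ktime_hybrid demo_hybrid_add(demo_ktime_hybrid a,
                                                demo_ktime_hybrid b)
{
    demo_ktime_hybrid r = { a.sec + b.sec, a.nsec + b.nsec };
    if (r.nsec >= DEMO_NSEC_PER_SEC) {
        r.nsec -= DEMO_NSEC_PER_SEC;
        r.sec += 1;
    }
    return r;
}

/* Ordering test, as needed to keep timers sorted by absolute expiry. */
static inline int demo_hybrid_before(demo_ktime_hybrid a, demo_ktime_hybrid b)
{
    return a.sec < b.sec || (a.sec == b.sec && a.nsec < b.nsec);
}
```

With a timespec of 2 s and 500000000 ns, the scalar form stores 2500000000 and the hybrid-style form stores the pair (2, 500000000); adding the hybrid value to itself carries into the seconds field and yields (5, 0).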
This build-time-selectable ktime_t storage format was implemented to avoid the performance impact of 64-bit multiplications and divisions on 32bit CPUs. Such operations are frequently necessary to convert between the storage formats provided by kernel and userspace interfaces and the internal time format. (See include/linux/ktime.h for further details.)

hrtimers - rounding of timer values

The hrtimer code will round timer events to lower-resolution clocks because it has to. Otherwise it will do no artificial rounding at all.

One question is what resolution value should be returned to the user by the clock_getres() interface. This will return whatever real resolution a given clock has - be it low-res, high-res, or artificially-low-res.

hrtimers - testing and verification

We used the high-resolution clock subsystem on top of hrtimers to verify the hrtimer implementation details in practice, and we also ran the posix timer tests in order to ensure specification compliance. We also ran tests on low-resolution clocks.

The hrtimer patch converts the following kernel functionality to use hrtimers:
The conversion of nanosleep and posix-timers enabled the unification of nanosleep and clock_nanosleep.
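
From userspace, the interfaces mentioned above can be exercised with a short POSIX sketch. It assumes a Linux/glibc system that provides clock_getres(), clock_gettime() and clock_nanosleep(); the demo_* helper names are hypothetical and not part of any kernel or libc API. The sketch queries the resolution a clock reports and then sleeps to absolute deadlines with TIMER_ABSTIME, which mirrors the absolute expiry times that hrtimers store internally.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Resolution reported by clock_getres() in nanoseconds, or -1 on error. */
static inline int64_t demo_clock_resolution_ns(clockid_t clk)
{
    struct timespec res;
    if (clock_getres(clk, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * DEMO_NSEC_PER_SEC + res.tv_nsec;
}

/* Add an offset of less than one second to t, keeping tv_nsec normalized. */
static inline struct timespec demo_timespec_add_ns(struct timespec t, long ns)
{
    assert(ns >= 0 && ns < DEMO_NSEC_PER_SEC);
    t.tv_nsec += ns;
    if (t.tv_nsec >= DEMO_NSEC_PER_SEC) {
        t.tv_nsec -= DEMO_NSEC_PER_SEC;
        t.tv_sec += 1;
    }
    return t;
}

/* Sleep until an absolute deadline; retry if a signal interrupts the sleep.
 * clock_nanosleep() returns an error number directly instead of setting errno. */
static inline int demo_sleep_until(clockid_t clk, struct timespec deadline)
{
    int err;
    do {
        err = clock_nanosleep(clk, TIMER_ABSTIME, &deadline, NULL);
    } while (err == EINTR);
    return err;
}

/* Wait for `count` periods using absolute deadlines, so that wakeup latency
 * in one period does not shift the later deadlines. Returns the elapsed time
 * in nanoseconds, or -1 on error. */
static inline int64_t demo_run_periodic(int count, long period_ns)
{
    struct timespec start, next, end;
    int i;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    next = start;
    for (i = 0; i < count; i++) {
        next = demo_timespec_add_ns(next, period_ns);
        if (demo_sleep_until(CLOCK_MONOTONIC, next) != 0)
            return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;
    return (int64_t)(end.tv_sec - start.tv_sec) * DEMO_NSEC_PER_SEC
           + (end.tv_nsec - start.tv_nsec);
}
```

Calling demo_run_periodic(3, 1000000L) should take at least 3 ms. How far the actual wakeups lag behind each deadline depends on the clock resolution the kernel provides, which is the value demo_clock_resolution_ns() reports.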
The code was successfully compiled for the following platforms:
The code was run-tested on the following platforms:

hrtimers were also integrated into the -rt tree, along with an hrtimers-based high-resolution clock implementation, so the hrtimers code got a healthy amount of testing and use in practice.

Thomas Gleixner, Ingo Molnar

hrtimers high resolution patches

An hrtimers high-resolution timer implementation is available which introduces a new abstraction layer for clock event sources to make high-resolution timers less intrusive.
Latest source is here