Thomas Gleixner and Ingo Molnar posted an update of their high-res
timers kernel patches for the 2.6.17 kernel, "upon which we based a
tickless kernel (dyntick) implementation and a 'dynamic HZ' feature as
well". The patches currently work on x86, with ports to x86_64, PPC and
ARM in the works. Thomas explains, "the
high-res timers feature (CONFIG_HIGH_RES_TIMERS) enables POSIX timers
and nanosleep() to be as accurate as the hardware allows (around 1usec
on typical hardware). This feature is transparent - if enabled it just
makes these timers much more accurate than the current HZ resolution."
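One way to observe this from user space is to time a short nanosleep() against CLOCK_MONOTONIC and see how far the wakeup overshoots the request. The following is a minimal sketch and not part of the patchset; the helper names ts_to_ns() and sleep_overshoot_ns() are made up for illustration. On a kernel limited to HZ resolution the overshoot would be expected to be on the order of a jiffy, and much smaller with high-res timers enabled, though the actual numbers depend on hardware and load.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a plain nanosecond count. */
static int64_t ts_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Sleep for request_ns nanoseconds (less than one second) and return
 * how much longer than requested the sleep took, in nanoseconds.
 * Returns -1 if one of the system calls fails.
 */
int64_t sleep_overshoot_ns(long request_ns)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = request_ns };
    struct timespec before, after;

    assert(request_ns >= 0 && request_ns < 1000000000L);

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;

    /* If a signal interrupts the sleep, continue with the remainder. */
    while (nanosleep(&req, &req) == -1) {
        if (errno != EINTR)
            return -1;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    return (ts_to_ns(after) - ts_to_ns(before)) - request_ns;
}
```

Calling sleep_overshoot_ns(1000000L) requests a 1 msec sleep; comparing the returned overshoot on kernels with and without CONFIG_HIGH_RES_TIMERS gives a rough feel for the difference.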
He goes on to describe the tickless kernel:
"The tickless kernel feature (CONFIG_NO_HZ) enables
'on-demand' timer interrupts: if there is no timer to be expired for
say 1.5 seconds when the system goes idle, then the system will stay
totally idle for 1.5 seconds. This should bring cooler CPUs and power
savings: on our (x86) testboxes we have measured the effective IRQ rate
to go from HZ to 1-2 timer interrupts per second.
"This feature is implemented by driving 'low res timer wheel'
processing via special per-CPU high-res timers, which timers are
reprogrammed to the next-low-res-timer-expires interval. This
tickless-kernel design is SMP-safe in a natural way and has been
developed on SMP systems from the beginning."
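The arithmetic behind the 1.5 second example can be sketched in a few lines. This is a toy model under simplifying assumptions, not the patch's code: the function names are hypothetical, pending timers are given as millisecond offsets from the moment the CPU goes idle, and each pending timer is assumed to have a distinct expiry time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timer interrupts taken while idle with a fixed periodic tick. */
uint64_t periodic_irqs(uint64_t idle_ms, uint64_t hz)
{
    return idle_ms * hz / 1000;
}

/* Earliest pending expiry in ms, or UINT64_MAX if nothing is pending. */
uint64_t next_expiry_ms(const uint64_t *expiries_ms, size_t n)
{
    uint64_t next = UINT64_MAX;

    for (size_t i = 0; i < n; i++) {
        if (expiries_ms[i] < next)
            next = expiries_ms[i];
    }
    return next;
}

/*
 * Timer interrupts taken while idle when the hardware is only
 * programmed for the next pending expiry: one per timer that
 * expires inside the idle window.
 */
uint64_t tickless_irqs(const uint64_t *expiries_ms, size_t n,
                       uint64_t idle_ms)
{
    uint64_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (expiries_ms[i] <= idle_ms)
            count++;
    }
    return count;
}
```

With HZ=1000 and a single timer due in 1500 msecs, the periodic model takes 1500 interrupts over the idle period while the on-demand model takes one.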
From: Thomas Gleixner [email blocked]
To: LKML [email blocked]
Subject: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Sun, 18 Jun 2006 17:10:26 +0200
We are pleased to announce the 2.6.17 based release of our high-res
timers kernel feature, upon which we based a tickless kernel (dyntick)
implementation and a 'dynamic HZ' feature as well:
http://www.tglx.de/projects/hrtimers/2.6.17/
The easiest way to try these features is to apply the combo patch to
vanilla 2.6.17. The patching order is:
http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.17.tar.bz2
http://www.tglx.de/projects/hrtimers/2.6.17/patch-2.6.17-hrt-dyntick1.patch
A broken out patch series is available too:
http://www.tglx.de/projects/hrtimers/2.6.17/patch-2.6.17-hrt-dyntick1.patches.tar.bz2
The high-res timers feature (CONFIG_HIGH_RES_TIMERS) enables POSIX
timers and nanosleep() to be as accurate as the hardware allows (around
1usec on typical hardware). This feature is transparent - if enabled it
just makes these timers much more accurate than the current HZ
resolution. It is based on the Generic Time Of Day patchset from John
Stultz and it in essence finishes what we started with the
kernel/hrtimers.c code in 2.6.16.
The tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer
interrupts: if there is no timer to be expired for say 1.5 seconds when
the system goes idle, then the system will stay totally idle for 1.5
seconds. This should bring cooler CPUs and power savings: on our (x86)
testboxes we have measured the effective IRQ rate to go from HZ to 1-2
timer interrupts per second.
This feature is implemented by driving 'low res timer wheel' processing
via special per-CPU high-res timers, which timers are reprogrammed to
the next-low-res-timer-expires interval. This tickless-kernel design is
SMP-safe in a natural way and has been developed on SMP systems from
the beginning.
Note: while our code should be similar in behavior to the existing
dynticks kernel patch from Con, it is a fundamentally different design
(being based on the high-res timers support and APIs) and is thus a
different implementation. We reused one area of dynticks: we integrated
and improved the 'timer top' profiling tool (CONFIG_TIMER_INFO).
When running the kernel then there's a 'timeout granularity'
runtime tunable parameter as well, under:
/proc/sys/kernel/timeout_granularity
it defaults to 1, meaning that CONFIG_HZ is the granularity of timers.
For example, if CONFIG_HZ is 1000 and timeout_granularity is set to 10,
then low-res timers will be expired every 10 jiffies (every 10 msecs),
thus the effective granularity of low-res timers is 100 HZ. Thus this
feature implements nonintrusive dynamic HZ in essence, without touching
the HZ macro itself.
Supported platforms: high-res timers and tickless works on x86 (x86_64,
PPC and ARM port are in the works). Other platforms should still work
fine with the usual HZ frequency timer tick.
Naturally, we'd like these features to be integrated into the upstream
kernel as well.
Bugreports and suggestions are welcome,
Thomas, Ingo
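The timeout_granularity arithmetic from the announcement can be checked with a short sketch. The function names below are hypothetical and only restate the relationship described in the message (low-res timers expire every timeout_granularity jiffies); they are not taken from the patch.

```c
#include <assert.h>

/* Effective low-res timer frequency for a given granularity setting. */
unsigned int effective_hz(unsigned int config_hz, unsigned int granularity)
{
    assert(config_hz > 0 && granularity > 0);
    return config_hz / granularity;
}

/* Interval between low-res timer expiry runs, in microseconds. */
unsigned int expiry_interval_usecs(unsigned int config_hz,
                                   unsigned int granularity)
{
    assert(config_hz > 0 && granularity > 0);
    /* One jiffy lasts 1000000 / CONFIG_HZ microseconds. */
    return granularity * (1000000u / config_hz);
}
```

For CONFIG_HZ=1000 and timeout_granularity=10 this gives an effective 100 HZ and a 10000 usec (10 msec) expiry interval, matching the example in the announcement; the default of 1 leaves the granularity at CONFIG_HZ.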
From: Roman Zippel <[EMAIL PROTECTED]>
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Mon, 19 Jun 2006 01:47:22 +0200 (CEST)
Hi,
On Sun, 18 Jun 2006, Thomas Gleixner wrote:
> Bugreports and suggestions are welcome,
Could you please document the patches? I know it sucks compared to
hacking, but it would make a review a lot simpler.
bye, Roman
From: Ingo Molnar [email blocked]
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Mon, 19 Jun 2006 14:50:18 +0200
* Roman Zippel <[EMAIL PROTECTED]> wrote:
> > Bugreports and suggestions are welcome,
>
> Could you please document the patches? I know it sucks compared to
> hacking, but it would make a review a lot simpler.
yeah, we'll add some description to the patches themselves, but
otherwise i'm afraid it will be like with almost all patch submissions
on lkml: 99% of the details are in the code and people have to ask
specifically if one area or another is unclear :-|
Meanwhile the patch names should provide you with some initial info
(also, we reuse GTOD which is documented in -mm) and the splitup is
pretty clean too - but in any case please feel free to ask pointed
questions! (we happily accept documentation patches as well.)
Ingo
From: Roman Zippel <[EMAIL PROTECTED]>
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Mon, 19 Jun 2006 15:47:45 +0200 (CEST)
Hi,
On Mon, 19 Jun 2006, Ingo Molnar wrote:
> > > Bugreports and suggestions are welcome,
> >
> > Could you please document the patches? I know it sucks compared to
> > hacking, but it would make a review a lot simpler.
>
> yeah, we'll add some description to the patches themselves, but
The problem is this is not the first time I mentioned this and some
patches still have no descriptions at all! :-(
> otherwise i'm afraid it will be like with almost all patch submissions
> on lkml: 99% of the details are in the code and people have to ask
> specifically if one area or another is unclear :-|
For a lot of things this acceptable, but if patches (e.g. clockevents) add
new generic infrastructure which effect all archs, they need
documentation (unless you also provide all the arch specific changes).
> Meanwhile the patch names should provide you with some initial info
> (also, we reuse GTOD which is documented in -mm) and the splitup is
> pretty clean too - but in any case please feel free to ask pointed
> questions! (we happily accept documentation patches as well.)
I can't do this without documentation. Without any information I'm only
wondering why it has to be this complex.
For example clockevents, I think all the special event handlers are
overkill, a simple list would do just fine. This way it may also possible
to treat a clock as virtual interrupt source and we could share code with
interrupt code and a callback can simply be requested via request_irq().
More information about what this code actually intends to do and what it
is required to do, would help a great deal to judge alternative solutions,
but only the author of this code can really provide this information and
IMO it's really sad that this information is still lacking after being
requested multiple times.
bye, Roman
From: Con Kolivas [email blocked]
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Mon, 19 Jun 2006 15:21:05 +1000
On Monday 19 June 2006 01:10, Thomas Gleixner wrote:
> We are pleased to announce the 2.6.17 based release of our high-res
> timers kernel feature, upon which we based a tickless kernel (dyntick)
> implementation and a 'dynamic HZ' feature as well:
>
> http://www.tglx.de/projects/hrtimers/2.6.17/
>
> The easiest way to try these features is to apply the combo patch to
> vanilla 2.6.17. The patching order is:
>
> http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.17.tar.bz2
> http://www.tglx.de/projects/hrtimers/2.6.17/patch-2.6.17-hrt-dyntick1.patch
>
>
> A broken out patch series is available too:
>
> http://www.tglx.de/projects/hrtimers/2.6.17/patch-2.6.17-hrt-dyntick1.patches.tar.bz2
>
>
> The high-res timers feature (CONFIG_HIGH_RES_TIMERS) enables POSIX
> timers and nanosleep() to be as accurate as the hardware allows (around
> 1usec on typical hardware). This feature is transparent - if enabled it
> just makes these timers much more accurate than the current HZ
> resolution. It is based on the Generic Time Of Day patchset from John
> Stultz and it in essence finishes what we started with the
> kernel/hrtimers.c code in 2.6.16.
>
> The tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer
> interrupts: if there is no timer to be expired for say 1.5 seconds when
> the system goes idle, then the system will stay totally idle for 1.5
> seconds. This should bring cooler CPUs and power savings: on our (x86)
> testboxes we have measured the effective IRQ rate to go from HZ to 1-2
> timer interrupts per second.
>
> This feature is implemented by driving 'low res timer wheel' processing
> via special per-CPU high-res timers, which timers are reprogrammed to
> the next-low-res-timer-expires interval. This tickless-kernel design is
> SMP-safe in a natural way and has been developed on SMP systems from
> the beginning.
>
> Note: while our code should be similar in behavior to the existing
> dynticks kernel patch from Con, it is a fundamentally different design
> (being based on the high-res timers support and APIs) and is thus a
> different implementation. We reused one area of dynticks: we integrated
> and improved the 'timer top' profiling tool (CONFIG_TIMER_INFO).
>
> When running the kernel then there's a 'timeout granularity'
> runtime tunable parameter as well, under:
>
> /proc/sys/kernel/timeout_granularity
>
> it defaults to 1, meaning that CONFIG_HZ is the granularity of timers.
>
> For example, if CONFIG_HZ is 1000 and timeout_granularity is set to 10,
> then low-res timers will be expired every 10 jiffies (every 10 msecs),
> thus the effective granularity of low-res timers is 100 HZ. Thus this
> feature implements nonintrusive dynamic HZ in essence, without touching
> the HZ macro itself.
>
> Supported platforms: high-res timers and tickless works on x86 (x86_64,
> PPC and ARM port are in the works). Other platforms should still work
> fine with the usual HZ frequency timer tick.
>
> Naturally, we'd like these features to be integrated into the upstream
> kernel as well.
>
> Bugreports and suggestions are welcome,
>
> Thomas, Ingo
Nice work Thomas and Ingo.
The approach to previous dynticks that I was working on had some nasty issues
with scalability that were not addressable without a complete rewrite which
is why I abandoned the previous implementation. Your approach for using the
hires timer events is ultimately a better solution and the code base is
cleaner so I'm very pleased to see it.
A couple of comments.
One of the problems we enountered with dynticks was that using the higher
resolution timers such as TSC and HPET to adjust for timer ticks over longer
periods when skipping ticks made the overall clock drift when run for many
days and only the PM Timer was not prone to this happening. ie the timers
were very accurate for short periods but over days it would drift. It could
well have been a design flaw in the dynticks I was maintaining rather than
the timers themselves but have you checked that this isn't a problem?
The other thing I note is that there is a reasonable amount of indirection in
fairly hot paths. It looks like there is scope for more local variable
storage of these indirect calls. Also if set_next_event is separated from
struct clock_event, the whole struct looks like a suitable candidate for
__read_mostly.
--
-ck
From: Ingo Molnar [email blocked]
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Mon, 19 Jun 2006 14:26:07 +0200
* Con Kolivas [email blocked] wrote:
> Nice work Thomas and Ingo.
>
> The approach to previous dynticks that I was working on had some nasty
> issues with scalability that were not addressable without a complete
> rewrite which is why I abandoned the previous implementation. Your
> approach for using the hires timer events is ultimately a better
> solution and the code base is cleaner so I'm very pleased to see it.
thanks!
> A couple of comments.
>
> One of the problems we enountered with dynticks was that using the
> higher resolution timers such as TSC and HPET to adjust for timer
> ticks over longer periods when skipping ticks made the overall clock
> drift when run for many days and only the PM Timer was not prone to
> this happening. ie the timers were very accurate for short periods but
> over days it would drift. It could well have been a design flaw in the
> dynticks I was maintaining rather than the timers themselves but have
> you checked that this isn't a problem?
not yet. If it's a real problem we could introduce a 'make clock events
more reliable' framework by doing something like always programming
clock event sources into periodic mode and reading their current time
offset [if possible] when the event is processesed (thus compensating
for most of the drift caused by irq processing latency). But if it's not
needed it would be nice to avoid that complexity. I'm also wondering why
the PM timer was the most accurate in that regard - it's almost as slow
to program as the PIT, so i'd have expected it to to show the biggest
drift.
(another technique to reduce drift: we could increase the APIC-priority
of the lapic timer, making it less suspect to drift when there are lots
of other IRQs going on.)
can you think of any other similar 'weird cases' that you saw happen
with dynticks? For example there's the 'APIC stops timer irqs when
entering C3 mode' bug - any similar weirdness we should be careful
about? [right now the patch doesnt handle the C3 mode bug, but it should
be relatively straightforward to blacklist lapic events in that case]
i'm looking at dynticks-060227.patch right now, and there seem to be a
fair amount of dyntick specific changes to ACPI's processor_idle.c code.
Do you remember what those changes were about and should we pick them up
in one way or another?
> The other thing I note is that there is a reasonable amount of
> indirection in fairly hot paths. It looks like there is scope for more
> local variable storage of these indirect calls. [...]
which function(s) were you looking at when coming to this conclusion?
clockevents_init_next_event() perhaps? [we could certainly put
'sources->nextevent' into a local variable there]
> [...] Also if set_next_event is separated from struct clock_event, the
> whole struct looks like a suitable candidate for __read_mostly.
You mean ->event_handler()? We can make all clockevent instantiations
__read_mostly right now - all of the fields of clock_event are static,
even ->event_handler() will change at most once per bootup [when we
switch from low-res into high-res mode].
Ingo
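The drift mechanism Ingo refers to, irq processing latency leaking into the next programmed expiry, can be illustrated with a toy model. This is a sketch under stated assumptions rather than anything from the patch or from Con's dynticks code: the function names are hypothetical, every event is assumed to be handled a constant latency_ns late, and the two functions contrast re-arming relative to the (late) handling time with deriving each expiry from the previous expiry, which is what a periodic event source effectively does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One-shot timer re-armed relative to "now": the next expiry is
 * computed from the late handling time, so the lateness adds up.
 * Returns the handling time of the n-th event minus the ideal
 * n * period_ns.
 */
int64_t drift_relative_rearm(int64_t period_ns, int64_t latency_ns, int n)
{
    int64_t now = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int64_t expiry = now + period_ns; /* programmed from handling time */
        now = expiry + latency_ns;        /* handler runs late */
    }
    return now - (int64_t)n * period_ns;
}

/*
 * Expiries derived from the previous expiry (as with a periodic
 * source): each event is still handled late, but the error does not
 * accumulate.
 */
int64_t drift_absolute_rearm(int64_t period_ns, int64_t latency_ns, int n)
{
    int64_t expiry = 0;
    int64_t now = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        expiry += period_ns;       /* independent of handling time */
        now = expiry + latency_ns; /* handler runs late */
    }
    return now - (int64_t)n * period_ns;
}
```

With a 1 msec period and 2 usecs of latency per event, the relative scheme is 2 msecs behind after 1000 events while the absolute scheme is still only 2 usecs behind. Whether the drift Con observed had this cause is exactly the open question in the thread.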
From: Con Kolivas [email blocked]
Subject: Re: [PATCHSET] Announce: High-res timers, tickless/dyntick and dynamic HZ
Date: Tue, 20 Jun 2006 00:03:25 +1000
On Monday 19 June 2006 22:26, Ingo Molnar wrote:
> * Con Kolivas [email blocked] wrote:
> > One of the problems we enountered with dynticks was that using the
> > higher resolution timers such as TSC and HPET to adjust for timer
> > ticks over longer periods when skipping ticks made the overall clock
> > drift when run for many days and only the PM Timer was not prone to
> > this happening. ie the timers were very accurate for short periods but
> > over days it would drift. It could well have been a design flaw in the
> > dynticks I was maintaining rather than the timers themselves but have
> > you checked that this isn't a problem?
>
> not yet. If it's a real problem we could introduce a 'make clock events
> more reliable' framework by doing something like always programming
> clock event sources into periodic mode and reading their current time
> offset [if possible] when the event is processesed (thus compensating
> for most of the drift caused by irq processing latency). But if it's not
> needed it would be nice to avoid that complexity. I'm also wondering why
> the PM timer was the most accurate in that regard - it's almost as slow
> to program as the PIT, so i'd have expected it to to show the biggest
> drift.
>
> (another technique to reduce drift: we could increase the APIC-priority
> of the lapic timer, making it less suspect to drift when there are lots
> of other IRQs going on.)
Better to wait and see if it was an artefact of my dodgy code for recover
walltime and if this code doesn't have that issue.
> can you think of any other similar 'weird cases' that you saw happen
> with dynticks? For example there's the 'APIC stops timer irqs when
> entering C3 mode' bug - any similar weirdness we should be careful
> about? [right now the patch doesnt handle the C3 mode bug, but it should
> be relatively straightforward to blacklist lapic events in that case]
The hardware that also did C4 was more troublesome but for the same reasons
since it's a subset of C3. See Dominik's patches mentioned below which
address these high state transitions. There isn't anything else offhand I can
think of that I actually managed to track down :|
> i'm looking at dynticks-060227.patch right now, and there seem to be a
> fair amount of dyntick specific changes to ACPI's processor_idle.c code.
> Do you remember what those changes were about and should we pick them up
> in one way or another?
Dominik donated a lot of code to use the dynticks infrastructure to actually
implement the power savings. Just skipping ticks seemed to make very little
power difference unless we also used the knowledge from next timer interrupt
to know how long we are going to be idle and choose C state transitions
accordingly. Each patch is documented at length in the split out
C-States-1_bm_activity_improvements.patch
C-States-2_bm_activity_handling_improvement.patch
C-States-3_accounting_of_sleep_times.patch
C-States-4_dyn-ticks_tweaks.patch
http://ck.kolivas.org/patches/dyn-ticks/split-out/
> > The other thing I note is that there is a reasonable amount of
> > indirection in fairly hot paths. It looks like there is scope for more
> > local variable storage of these indirect calls. [...]
>
> which function(s) were you looking at when coming to this conclusion?
> clockevents_init_next_event() perhaps? [we could certainly put
> 'sources->nextevent' into a local variable there]
From what I could see
hrtimer_restart_sched_tick() could use
struct hrtimer *sched_timer = &cpu_base->sched_timer;
clockevents_init_next_event() and clockevents_set_next_event() could use
struct clock_event *nextevt = sources->nextevt;
> > [...] Also if set_next_event is separated from struct clock_event, the
> > whole struct looks like a suitable candidate for __read_mostly.
>
> You mean ->event_handler()? We can make all clockevent instantiations
> __read_mostly right now - all of the fields of clock_event are static,
> even ->event_handler() will change at most once per bootup [when we
> switch from low-res into high-res mode].
Great, thanks!
--
-ck
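Con's local-variable suggestion follows a common pattern, sketched below with stand-in types. The struct and field names (clock_event_sketch, event_sources_sketch, min_delta, max_delta) are hypothetical and do not reflect the layout of the patch's struct clock_event; only the idea of loading sources->nextevt once into a local comes from the thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for the patch's data structures. */
struct clock_event_sketch {
    unsigned long min_delta;
    unsigned long max_delta;
};

struct event_sources_sketch {
    struct clock_event_sketch *nextevt;
};

/* Follows sources->nextevt again on every access. */
unsigned long clamp_delta_indirect(const struct event_sources_sketch *sources,
                                   unsigned long delta)
{
    if (delta < sources->nextevt->min_delta)
        return sources->nextevt->min_delta;
    if (delta > sources->nextevt->max_delta)
        return sources->nextevt->max_delta;
    return delta;
}

/* Loads the pointer once into a local, as suggested in the thread. */
unsigned long clamp_delta_cached(const struct event_sources_sketch *sources,
                                 unsigned long delta)
{
    const struct clock_event_sketch *nextevt = sources->nextevt;

    assert(nextevt != 0);
    if (delta < nextevt->min_delta)
        return nextevt->min_delta;
    if (delta > nextevt->max_delta)
        return nextevt->max_delta;
    return delta;
}
```

Both functions return the same values; the cached form just makes the single load explicit. A compiler may already perform this transformation when it can prove the pointer is not modified in between, so any gain would need to be measured.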
"tickless" ... irq latency FX?
Does this address the issue whereby catching up to all the omitted ticks at once -- by calling timer_tick() in a loop -- adds lots of IRQ latency? Seems that catching them all up at once would be a lot more efficient than the current ARM approach.
On one system I've seen that cause lots of trouble with the serial console driver, which happens to use PIO not DMA. The kernel runs nicely with an actual tick rate of about 3 timer IRQs per second, but that means it usually needs to catch up to HZ/3 ticks before it calls the PIO handler ... losing badly. Can't use the uparrow key to scroll back through BASH history etc.