Hi folks. The forthcoming kernel will be introducing a rather visible (and potentially important) change. Firstly, it will provide an option to select the default timer frequency. Secondly, the default frequency will now be 250HZ. For the record, the value used to be 100HZ and was rather drastically changed to 1000HZ for 2.6. It will now be possible to choose a HZ value of either 100, 250 or 1000.
The most interesting aspect of the timer is that it affects scheduling granularity, that is, the minimum possible time for which a given process is allowed to run before interruption. The interval between timer ticks is commonly known as a "jiffy", and its length in milliseconds can be calculated by dividing 1000 by the HZ value. So, the length of a jiffy in 2.6 at present is 1ms. In general, a longer jiffy results in more throughput but with more latency, whereas a shorter jiffy results in less throughput (due to the overhead from frequent interruption) but with a reduction in latency. In my view, this is rather interesting in terms of the impact it can have on server performance.

The help text for the new kernel configuration option explains further:

"Allows the configuration of the timer frequency. It is customary to have the timer interrupt run at 1000 HZ but 100 HZ may be more beneficial for servers and NUMA systems that do not need to have a fast response for user interaction and that may experience bus contention and cacheline bounces as a result of timer interrupts. Note that the timer interrupt occurs on each processor in an SMP environment leading to NR_CPUS * HZ number of timer interrupts per second."

The specific help text for the 100HZ option is as follows:

"100 HZ is a typical choice for servers, SMP and NUMA systems with lots of processors that may show reduced performance if too many timer interrupts are occurring."

For 250HZ:

"250 HZ is a good compromise choice allowing server performance while also showing good interactive responsiveness even on SMP and NUMA systems."

For 1000HZ:

"1000 HZ is the preferred choice for desktop systems and other systems requiring fast interactive responses to events."

Another interesting point that is not mentioned above is that reducing the timer frequency can significantly help to reduce time drift (particularly on "real" SMP systems where this effect may be even more pronounced).
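To put rough numbers on the jiffy arithmetic and the NR_CPUS * HZ figure from the help text, here is a small Python sketch. The function names and the 4-CPU machine are assumptions made purely for illustration; nothing here is kernel code.

```python
# Illustrative only: helper names and the CPU count are assumptions,
# the arithmetic (1000 / HZ and NR_CPUS * HZ) comes from the text above.
HZ_CHOICES = (100, 250, 1000)


def jiffy_ms(hz):
    """Length of one jiffy in milliseconds: 1000 divided by HZ."""
    return 1000 / hz


def timer_interrupts_per_second(hz, nr_cpus):
    """Timer interrupts per second across all CPUs: NR_CPUS * HZ."""
    return nr_cpus * hz


if __name__ == "__main__":
    nr_cpus = 4  # hypothetical 4-way SMP machine
    for hz in HZ_CHOICES:
        print(
            f"HZ={hz}: jiffy={jiffy_ms(hz):g} ms, "
            f"timer interrupts/s={timer_interrupts_per_second(hz, nr_cpus)}"
        )
```

On the assumed 4-CPU machine, dropping from 1000HZ to 100HZ takes the total from 4000 timer interrupts per second to 400, while the jiffy grows from 1ms to 10ms.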
The reason that I am posting is twofold:

* To notify everyone that this change is coming
* To provide a means for anyone interested to test the ramifications of this newly acquired flexibility using a 2.6.12 kernel prior to the eventual release of 2.6.13.

To that end, I have grabbed the specific patch that adds the configuration option, which can be found here:

http://www.recruit2recruit.net/kerframil/2.6.12-i386-selectable-hz.patch

This patch is taken directly from upstream (the option itself can be found at the bottom of the "Processor type and features" menu).

Interestingly, the change has drawn attention to a few areas in the kernel where the methods used for timing (such as delay loops) are not ideal. So I have gone through all of the patches committed to the mainline tree since 2.6.12.3, collected the ones that are relevant and aggregated them into one patch (the second variant applies cleanly against a gentoo-sources tree):

http://www.recruit2recruit.net/kerframil/2.6.12-timing-fixes-rollup.patch
http://www.recruit2recruit.net/kerframil/2.6.12-gentoo-r7-timing-fixes-rollup.patch

The individual patches are all very small and non-intrusive by nature. They are certainly not required in order to change the timer frequency, but I recommend that they be used. For the curious, some notes on the contents of the patch can be found here:

http://www.recruit2recruit.net/kerframil/NOTES-2.6.12-timing-fixes-rollup

What I would like is for anyone who is interested, and in a position to do so, to put an alternate value (100 or 250) to the test and to determine whether there is any clear performance improvement with their workload. I switched my system (Compaq Proliant ML370) to 100HZ 2 days ago and can at least confirm that it is stable, although I have not yet had the opportunity to perform any comparative benchmarks.
If in doubt, then 250 might be a good value to try as, after all, that is going to be the default - at least for x86 ;)

Finally, this topic has generated a nice little flame war over on the LKML (actually, it's quite an interesting thread, if a little confusing at points):

http://kerneltrap.org/node/5430
http://lkml.org/lkml/2005/7/8/259

Cheers,

--Kerin Millar
