http://www.lesswatts.org/tips/cpu.php#smpsched
Processor
Multi-core, multi-threaded power-saving tunables for the process scheduler:
- On platforms with multi-core and/or multi-threaded capable
processors, the process scheduler in the Linux kernel (starting from
version 2.6.18) provides two tunables for power savings. Under light
load (where the number of running tasks is less than the number of
available logical CPUs in the system), these tunables minimize the
number of processor packages and CPU cores carrying the process load.
This allows the remaining idle processor packages and idle cores in the
system to enter the deepest idle state, saving power.
- The 'sched_mc_power_savings' tunable under /sys/devices/system/cpu/
controls the multi-core power-saving policy. By default, this is set to
'0' (for optimal performance). When it is set to '1', under light load
the process load is distributed such that all the cores in a processor
package are busy before any load is distributed to other processor
packages. The sched_mc_power_savings tunable can save a significant
amount of power (multiple watts) under workloads where there is idle
time in the system. To enable the tunable, use this command:
# echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
- The 'sched_smt_power_savings' tunable under /sys/devices/system/cpu/
controls the multi-threading power-saving policy. By default, this is
set to '0' (for optimal performance). When it is set to '1', under light
load the process load is distributed such that all the threads in a
core and all the cores in a processor package are busy before any load
is distributed to threads and cores in other processor packages.
These power-saving options may reduce the performance of some
applications; for others, the impact is negligible.
The tunables are provided so that administrators can enable them
during off-hours (when some performance impact is tolerable), or
enable them by default if the corresponding power-saving policy has no
impact on performance for the specific workload the administrator
cares about.
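The off-hours switching described above could be scripted along the following lines. This is only a minimal sketch: the function name set_sched_power_savings and its optional directory argument are hypothetical, introduced here for illustration, and the function assumes a kernel may not provide either file, in which case it only reports that.

```shell
#!/bin/sh
# Sketch: write the same value (0 or 1) to both scheduler power-saving
# tunables. set_sched_power_savings is a hypothetical helper name, and the
# optional second argument (directory holding the tunables) exists only
# so the function can be tried against a scratch directory.
set_sched_power_savings() {
    value=$1
    cpu_dir=${2:-/sys/devices/system/cpu}
    case $value in
        0|1) ;;
        *) echo "usage: set_sched_power_savings 0|1 [cpu_dir]" >&2
           return 2 ;;
    esac
    for name in sched_mc_power_savings sched_smt_power_savings; do
        file=$cpu_dir/$name
        if [ -w "$file" ]; then
            # Report each tunable that was actually changed.
            echo "$value" > "$file" && echo "$name=$value"
        else
            # Assumption: some kernels do not provide this file.
            echo "$name: not available" >&2
        fi
    done
}
```

Run as root, "set_sched_power_savings 1" would enable both policies at the start of an off-hours window and "set_sched_power_savings 0" would restore the default afterwards.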
For more information: http://oss.intel.com/pdf/mclinux.pdf
P-states: Frequency control and ondemand-governor
Processor performance states (P-states) are a predefined set of
frequency and voltage combinations at which the processor operates.
With higher frequencies, you get higher performance, but higher
frequencies require a higher voltage as well, which makes the processor
consume more power. With P-states, the operating system can dynamically
adjust the tradeoff between power and performance at any time.
Although changing from one voltage/frequency combination to another
takes a bit of time, on current chips this transition is very short.
The transition time, along with certain other characteristics,
determines how the operating system should control the
frequency/voltage combinations.
Some older x86 processors, as well as embedded processors,
behave differently, and for that reason the Linux kernel
implements several different algorithms for controlling (governing)
the performance state, so that each kind of processor can use the one
that works best for it.
The infrastructure for controlling the CPU's current P-state in Linux
is known as CPUFREQ. This needs to be enabled in the running kernel,
along with the different controlling algorithms, known as governors.
The governor that has the best behavior on current PC processors is the
ondemand governor. While current distributions tend to enable this
governor by default, if you're compiling your own kernel or just want
to make sure your system is running optimally, you can find the
commands and settings in this tip.
For current kernels, you can find information on what is running on
your system by looking at the files in this directory:
/sys/devices/system/cpu/cpu0/cpufreq
If this directory is not present, there is a good chance that your
kernel does not have the CPUFREQ feature enabled.
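As a rough illustration of inspecting that directory, the sketch below checks that it exists and prints a few of its files. The function name show_cpufreq_info is hypothetical. The sketch also assumes the cpufreq driver in use exposes scaling_available_frequencies (values in kHz); not every driver does, so missing files are skipped.

```shell
#!/bin/sh
# Sketch: report whether the cpufreq directory exists and, if so, print
# the driver name and the frequencies it exposes (in kHz).
# show_cpufreq_info is a hypothetical helper name; the optional argument
# lets the directory be overridden for experimentation.
show_cpufreq_info() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    if [ ! -d "$dir" ]; then
        echo "cpufreq directory not found: $dir"
        return 1
    fi
    for name in scaling_driver scaling_available_frequencies scaling_cur_freq; do
        # Assumption: not every driver provides every file, so skip
        # anything that is missing or unreadable.
        [ -r "$dir/$name" ] && echo "$name: $(cat "$dir/$name")"
    done
    return 0
}

show_cpufreq_info || true
```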
You can list the available governors by using this command:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
ondemand userspace performance
In the example above, there are three governors available. In addition
to the ondemand governor, there are the userspace and performance
governors.
You can see what governor is currently active with this command:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
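The command above reads the governor for cpu0 only. To see every logical CPU at once, a loop over the per-CPU directories can be used. The following is a small sketch, with show_governors as a hypothetical function name:

```shell
#!/bin/sh
# Sketch: print the active governor for every CPU that has a cpufreq
# directory. show_governors is a hypothetical helper name; the optional
# argument overrides the sysfs CPU directory.
show_governors() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    for gov_file in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -r "$gov_file" ] || continue
        cpu=${gov_file#"$cpu_dir"/}   # strip the leading directory
        cpu=${cpu%%/*}                # keep only the "cpuN" component
        echo "$cpu: $(cat "$gov_file")"
    done
}

show_governors
```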
You can also change the currently running governor by echoing one of
the available governors to the scaling_governor file:
# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
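The echo above changes cpu0 only. One possible way to apply a governor to every CPU, checking first that it is listed in scaling_available_governors, is sketched below. The function name set_governor is hypothetical, and writing to the real sysfs files requires root.

```shell
#!/bin/sh
# Sketch: set the same governor on every CPU, but only where the governor
# appears in that CPU's scaling_available_governors list.
# set_governor is a hypothetical helper name; the optional second
# argument overrides the sysfs CPU directory.
set_governor() {
    governor=$1
    cpu_dir=${2:-/sys/devices/system/cpu}
    [ -n "$governor" ] || { echo "usage: set_governor NAME [cpu_dir]" >&2; return 2; }
    status=0
    for policy in "$cpu_dir"/cpu[0-9]*/cpufreq; do
        [ -d "$policy" ] || continue
        # -w matches whole words, -F treats the name as a fixed string.
        if grep -qwF -- "$governor" "$policy/scaling_available_governors" 2>/dev/null; then
            echo "$governor" > "$policy/scaling_governor" || status=1
        else
            echo "$policy: governor '$governor' not available" >&2
            status=1
        fi
    done
    return $status
}
```

For example, "set_governor ondemand" run as root would select the ondemand governor on all CPUs that offer it, and return a non-zero status if any CPU did not.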
For most distributions, the proper P-state governor is loaded
automatically by the /etc/init.d/powernow or /etc/init.d/cpuspeed
initialization script. It's worth checking to be sure just the same.
If you are compiling your own kernel, you want to make sure you
enable at least the following options, to get the optimal CPU
frequency/power control:
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_X86_ACPI_CPUFREQ=m
There are related governor and processor options that are OK to have
enabled as well, but for current systems, the options above are the
important ones to set.
You can tell that CONFIG_CPU_FREQ is enabled if
/sys/devices/system/cpu/cpu0/cpufreq exists on your system.
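If the kernel's configuration file is available, the three options can also be checked directly. The sketch below assumes the distribution installs the config as /boot/config-$(uname -r), which is a common convention but not universal. The function name check_cpufreq_config is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the three options named above are set to y or m
# in a kernel config file. check_cpufreq_config is a hypothetical helper
# name. Assumption: the config lives at /boot/config-<release> unless a
# path is passed as the first argument.
check_cpufreq_config() {
    config=${1:-/boot/config-$(uname -r)}
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    missing=0
    for opt in CONFIG_CPU_FREQ CONFIG_CPU_FREQ_GOV_ONDEMAND CONFIG_X86_ACPI_CPUFREQ; do
        # Match the exact option name followed by =y or =m.
        if line=$(grep -E "^$opt=[ym]\$" "$config"); then
            echo "$line"
        else
            echo "$opt is not set"
            missing=1
        fi
    done
    return $missing
}
```

Calling "check_cpufreq_config" with no argument checks the running kernel's config; a path to a .config file in a kernel source tree can be passed instead.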
Power aware interrupt balancing
Use the interrupt (IRQ) balancing technique to distribute the
interrupts from the devices on the system over the cores/processors in
the system.
IRQ balancing is not new in Linux. It is a mechanism for spreading
interrupt-handling work across the CPU cores in a system. For a number
of years, there has been a configuration option to compile automatic
IRQ balancing into the kernel. It is recommended that you do not use
that option: the implementation is rather simple and not optimal for
current systems.
Doing a good job of IRQ balancing requires the load balancing policy
manager to have a good understanding of the processor or multi-core
configuration, including shared caches and front side bus (FSB)
configuration, or NUMA topology. There is a new IRQ balancing solution
available for Linux, irqbalance (http://www.irqbalance.org), which is
cache-topology aware and somewhat power aware. It achieves good
behavior when balancing IRQs on modern multi-core/multi-socket systems.
irqbalance takes into account the new topologies of modern systems and
does a better job of balancing the interrupt-processing load. It also
implements a heuristic power-save mode that is triggered by a threshold
on IRQ load. When the IRQ load is low, it consolidates the IRQs onto
one core or CPU package. This enables larger power savings on the other
package by allowing it to take advantage of lower C and P states.
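To see where interrupts are currently allowed to be delivered, the per-IRQ CPU masks under /proc/irq can be listed. This is a minimal sketch, with show_irq_affinity as a hypothetical function name. Each smp_affinity file holds a hexadecimal bitmask of the CPUs permitted to handle that IRQ.

```shell
#!/bin/sh
# Sketch: print the CPU affinity bitmask (hexadecimal) of every IRQ.
# show_irq_affinity is a hypothetical helper name; the optional argument
# overrides the /proc/irq directory.
show_irq_affinity() {
    irq_dir=${1:-/proc/irq}
    for f in "$irq_dir"/[0-9]*/smp_affinity; do
        [ -r "$f" ] || continue
        irq=${f%/smp_affinity}   # drop the file name
        irq=${irq##*/}           # keep only the IRQ number
        echo "IRQ $irq: $(cat "$f")"
    done
}

show_irq_affinity
```

If the masks all name the same few CPUs while the load is low, that would be consistent with the consolidation described above, although the masks alone do not show which balancer set them.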
If you have a large multi-socket system running workloads with loads
that vary, it is recommended that you use the new irqbalance daemon to
do your balancing. IRQ load balancing isn't worthwhile until you have
more than one socket, or more than two CPU cores.
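The rule of thumb above can be checked against a given machine roughly as follows. The sketch counts sockets and cores from the sysfs topology files (physical_package_id and core_id). The function names count_topology and irqbalance_worthwhile are hypothetical, and the thresholds are simply the ones stated in the text.

```shell
#!/bin/sh
# Sketch: count sockets and physical cores from sysfs, then apply the
# "more than one socket, or more than two cores" rule of thumb.
# count_topology and irqbalance_worthwhile are hypothetical helper names.
count_topology() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    # One "package core" pair per logical CPU, de-duplicated so that
    # hardware threads sharing a core are counted once.
    pairs=$(for t in "$cpu_dir"/cpu[0-9]*/topology; do
        [ -r "$t/physical_package_id" ] && [ -r "$t/core_id" ] || continue
        echo "$(cat "$t/physical_package_id") $(cat "$t/core_id")"
    done | sort -u)
    sockets=$(printf '%s\n' "$pairs" | awk 'NF {print $1}' | sort -u | wc -l)
    cores=$(printf '%s\n' "$pairs" | awk 'NF' | wc -l)
    # Arithmetic expansion strips any padding that wc may add.
    echo "$((sockets + 0)) $((cores + 0))"
}

irqbalance_worthwhile() {
    # $1 = number of sockets, $2 = number of cores
    [ "$1" -gt 1 ] || [ "$2" -gt 2 ]
}

topology=$(count_topology)
if irqbalance_worthwhile "${topology% *}" "${topology#* }"; then
    echo "sockets/cores: $topology (IRQ balancing may be worthwhile)"
else
    echo "sockets/cores: $topology (IRQ balancing is unlikely to help)"
fi
```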