http://www.lesswatts.org/tips/cpu.php#smpsched

Processor

Scheduler tunables for multi-socket systems

Multi-core, multi-threaded power-saving tunables for the process scheduler:
  • On platforms with multi-core and/or multi-threaded processors, the process scheduler in the Linux kernel (starting from version 2.6.18) provides two tunables for power savings. Under light load (where the number of running tasks is less than the number of available logical CPUs in the system), these tunables minimize the number of processor packages and CPU cores carrying the process load. This allows the remaining idle processor packages and idle cores in the system to go to the deepest idle state, saving power.


  • The 'sched_mc_power_savings' tunable under /sys/devices/system/cpu/ controls the multi-core power-savings policy. By default, it is set to '0' (for optimal performance). When it is set to '1', under light load the process load is distributed such that all the cores in a processor package are busy before any load is distributed to other processor packages.

    The sched_mc_power_savings tunable can save a significant amount of power (multiple Watts) under workloads where there is idle time in the system. To enable the tunable, use this command:
echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
  • The 'sched_smt_power_savings' tunable under /sys/devices/system/cpu/ controls the multi-threading power-savings policy. By default, it is set to '0' (for optimal performance). When it is set to '1', under light load the process load is distributed such that all the threads in a core and all the cores in a processor package are busy before any load is distributed to threads and cores in other processor packages.

These power-savings options may affect the performance of some applications; for others, the impact is negligible. Administrators can enable the tunables during off-hours, when some impact on performance is tolerable, or enable them by default if the corresponding power-savings policy has no impact on performance for the specific target workload.
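
As a rough illustration of switching both tunables together (for example from an off-hours job), the following sketch writes a value to each tunable that is present. The function name set_sched_power_savings and the CPU_SYSFS override variable are hypothetical helpers introduced here, not part of the kernel interface; the sketch also assumes that a kernel which does not provide a tunable simply does not expose the file.

```shell
#!/bin/sh
# Directory holding the scheduler tunables. The CPU_SYSFS override is an
# assumption of this sketch so the function can be tried on a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# set_sched_power_savings VALUE
# Writes VALUE (0 or 1) to each scheduler power-savings tunable that exists.
# Returns non-zero if no tunable could be written.
set_sched_power_savings() {
    value="$1"
    status=1
    for name in sched_mc_power_savings sched_smt_power_savings; do
        file="$CPU_SYSFS/$name"
        if [ -w "$file" ]; then
            echo "$value" > "$file" || return 1
            status=0
        else
            echo "skipping $name: $file is not present or not writable" >&2
        fi
    done
    return "$status"
}
```

Run as root, "set_sched_power_savings 1" would enable the policies and "set_sched_power_savings 0" would restore the default.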


For more information: http://oss.intel.com/pdf/mclinux.pdf

P-states: Frequency control and ondemand-governor

Processor performance states (P-states) are a predefined set of frequency and voltage combinations at which the processor operates. With higher frequencies, you get higher performance, but to achieve that the voltage needs to be higher as well, which makes the processor consume more power. With P-states, the operating system can dynamically change the tradeoff between power and performance all the time.


Although changing from one voltage/frequency combination to another takes a bit of time, on current chips this transition time is very short. The transition time, together with certain other characteristics, determines how the operating system should control the frequency/voltage combinations.


Some older x86 processors, as well as embedded processors, behave differently. For that reason, the Linux kernel implements several different algorithms for controlling (governing) the performance state, so that each kind of processor can use the one that works best for it.


The infrastructure for controlling the CPU's current P-state in Linux is known as CPUFREQ. This needs to be enabled in the running kernel, along with the different controlling algorithms, that is, governors. The governor that has the best behavior on current PC processors is the ondemand governor. While current distributions tend to enable this governor by default, if you're compiling your own kernel or just want to make sure your system is running optimally, you can find the commands and settings in this tip.


For current kernels, you can find information on what is running on your system by looking at the files in this directory: /sys/devices/system/cpu/cpu0/cpufreq. If this directory is not present, there is a good chance that your kernel does not have the CPUFREQ feature enabled.


You can list the available governors by using this command:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
ondemand userspace performance
In the example above, there are three governors available. In addition to the ondemand governor, there are the userspace and performance governors.
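
A script may want to confirm that a governor is offered before trying to select it. The following is a minimal sketch of such a check; the function name governor_available and the CPUFREQ_DIR override variable are hypothetical names chosen for this example.

```shell
#!/bin/sh
# cpufreq directory of the first CPU. The CPUFREQ_DIR override is an
# assumption of this sketch, used to point the check at another directory.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# governor_available NAME
# Returns 0 if NAME is listed in scaling_available_governors, 1 if it is
# not listed, and 2 if the list cannot be read (for example, no CPUFREQ).
governor_available() {
    list="$CPUFREQ_DIR/scaling_available_governors"
    [ -r "$list" ] || return 2
    for g in $(cat "$list"); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

With the listing shown above, "governor_available ondemand" would succeed.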


You can see what governor is currently active with this command:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
You can also change the currently running governor by echoing one of the available governors to the scaling_governor file node:
# echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
For most distributions, the proper P-state governor is loaded automatically by an initialization script such as /etc/init.d/powernow or /etc/init.d/cpuspeed. It is still worth checking to be sure.
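
The echo command above changes the governor for cpu0 only. On a multi-CPU system, the same write can be repeated for every CPU that exposes a scaling_governor file. This is a sketch under that assumption; the function name set_governor_all and the CPU_SYSFS override variable are hypothetical.

```shell
#!/bin/sh
# Base directory of the per-CPU sysfs entries. The CPU_SYSFS override is an
# assumption of this sketch so it can be exercised on a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# set_governor_all GOVERNOR
# Writes GOVERNOR to the scaling_governor file of every cpuN directory that
# has one, then prints how many files were updated.
set_governor_all() {
    governor="$1"
    count=0
    for node in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$node" ] || continue   # the pattern matched nothing
        echo "$governor" > "$node" || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

Run as root, "set_governor_all ondemand" would select the ondemand governor on each CPU and print the number of CPUs changed.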


If you are compiling your own kernel, you want to make sure you enable at least the following options, to get the optimal CPU frequency/power control:

CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_X86_ACPI_CPUFREQ=m
There are related governor and processor options that are ok to have enabled as well, but for current systems, the options above are the important ones to set.


You can tell that CONFIG_CPU_FREQ is enabled if /sys/devices/system/cpu/cpu0/cpufreq exists on your system.
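
If a plain-text copy of the kernel configuration is available, the three options can also be checked directly. The sketch below assumes such a file exists; many distributions install one as /boot/config-<kernel version>, but that location is an assumption and not something every system provides. The function name report_cpufreq_config is hypothetical.

```shell
#!/bin/sh
# report_cpufreq_config FILE
# For each option listed in this tip, prints its "=y" or "=m" line from the
# plain-text kernel config FILE, or "<option> is not set" if it is not enabled.
report_cpufreq_config() {
    config="$1"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 2
    fi
    for opt in CONFIG_CPU_FREQ CONFIG_CPU_FREQ_GOV_ONDEMAND CONFIG_X86_ACPI_CPUFREQ; do
        line=$(grep -E "^${opt}=[ym]\$" "$config") || line="$opt is not set"
        echo "$line"
    done
}
```

For example, report_cpufreq_config "/boot/config-$(uname -r)" would report on the running kernel, where that file exists.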

Power aware interrupt balancing

Use the interrupt (IRQ) balancing technique to distribute the various interrupts, from the devices on the system, over the cores/processors in the system.


IRQ balancing is not new in Linux. It is a mechanism for load balancing the work done by different CPU cores running on a system. For a number of years, there has been a configuration option to compile automatic IRQ balancing into the kernel. It is recommended that you do not use that option. The implementation is rather simple and not optimal for current systems.


Doing a good job of IRQ balancing requires the load balancing policy manager to have a good understanding of the processor or multi-core configuration, regarding shared caches and front side bus (FSB) configuration, or NUMA topology. There is a new IRQ balance solution available for Linux, http://www.irqbalance.org, which is cache-topology aware and somewhat power aware. It achieves good behavior for balancing IRQs on modern multi-core/multi-socket systems.


irqbalance takes into account the new topologies of modern systems and does a better job at balancing the load of processing interrupts, as well as implementing a heuristic power save mode that thresholds on IRQ load. When the IRQ load is low, it will consolidate the IRQs to one core or CPU package. This helps to enable larger power savings on the other package by enabling it to take advantage of lower C and P states.


If you have a large multi-socket system running workloads with loads that vary, it is recommended that you use this new IRQBALANCE daemon to do your balancing. IRQ load balancing isn't worthwhile until you have more than one socket, or more than two CPU cores.
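
The rule of thumb above (more than one socket, or more than two CPU cores) can be checked from the CPU topology files in sysfs. The following is a sketch only: it reads the physical_package_id and core_id attributes under each cpuN/topology directory, and the function names and the CPU_SYSFS override variable are hypothetical.

```shell
#!/bin/sh
# Base directory of the per-CPU sysfs entries. The CPU_SYSFS override is an
# assumption of this sketch so it can be exercised on a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# count_packages_and_cores
# Prints "<packages> <cores>", counting distinct package IDs and distinct
# (package ID, core ID) pairs, so hardware threads are not counted as cores.
count_packages_and_cores() {
    for topo in "$CPU_SYSFS"/cpu[0-9]*/topology; do
        [ -r "$topo/physical_package_id" ] || continue
        echo "$(cat "$topo/physical_package_id") $(cat "$topo/core_id")"
    done | sort -u | awk '
        { cores++; if (!($1 in seen)) { seen[$1] = 1; packages++ } }
        END { print packages + 0, cores + 0 }'
}

# irqbalance_worthwhile
# Succeeds when there is more than one package or more than two cores.
irqbalance_worthwhile() {
    set -- $(count_packages_and_cores)
    [ "$1" -gt 1 ] || [ "$2" -gt 2 ]
}
```

A startup script could use "irqbalance_worthwhile" as a condition before starting the daemon.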

