Changes in v2:
 - Reworked the wording and logic of the descriptions. (Song Liu)
 - Added a new header file declaring the kfuncs, so that future extensions
   in other files can include it. (Christian Loehle)
 - Fixed several logic issues in the code. (Christian Loehle)

Reference:
[1] https://lore.kernel.org/bpf/20250829101137.9507-1-yikai....@vivo.com/

Summary
----------
Hi everyone,
This patch set introduces an extensible cpuidle governor framework
based on BPF struct_ops, allowing idle-state selection policies to be
implemented dynamically in BPF programs.

Motivation
----------
As is well known, CPUs support multiple idle states (e.g., C0, C1, C2, ...),
where deeper states reduce power consumption but result in longer wakeup
latency, potentially affecting performance.
Existing generic cpuidle governors work well in common scenarios
but behave suboptimally in some Android phone use cases.

Our testing shows that in low-utilization scenarios
(e.g., screen-off background tasks such as music playback with CPU utilization
below 10%), the C0 state accounts for ~50% of idle time, wasting significant
energy. Reducing C0 residency to ≤20% could yield ≥5% power savings on mobile
phones.

To address this, we want:
  1. Dynamic switching to a power-saving governor in low-CPU-utilization
     scenarios (e.g., screen-off mode)
  2. Dynamic switching to an alternative governor in high-performance
     scenarios (e.g., gaming)

Overview
----------
The BPF cpuidle ext governor registers at postcore_initcall()
but remains inactive by default because of its low "rating" of 1.
To activate it, the BPF program must raise its rating above those of the
other registered governors.
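The rating rule can be modelled in a few lines of userspace C. This is a
hypothetical sketch, not code from this series: struct gov_model,
model_register() and model_update_rating() are illustrative names. The
switch-if-higher-rating rule mirrors how the cpuidle core treats newly
registered governors (ignoring the cpuidle.governor= boot parameter), while
re-selecting the highest-rated governor after a rating change is an assumption
standing in for bpf_cpuidle_ext_gov_update_rating().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a registered governor. */
struct gov_model {
	const char *name;
	unsigned int rating;
};

#define MAX_GOVS 8

static struct gov_model govs[MAX_GOVS];
static size_t nr_govs;
static struct gov_model *curr_gov;

/* Register a governor; switch to it only if it outrates the current one. */
static void model_register(const char *name, unsigned int rating)
{
	assert(nr_govs < MAX_GOVS);
	govs[nr_govs].name = name;
	govs[nr_govs].rating = rating;
	if (!curr_gov || curr_gov->rating < rating)
		curr_gov = &govs[nr_govs];
	nr_govs++;
}

/*
 * Assumed stand-in for raising the ext governor's rating: update the
 * rating, then re-pick the highest-rated governor as the current one.
 */
static void model_update_rating(const char *name, unsigned int rating)
{
	for (size_t i = 0; i < nr_govs; i++) {
		if (strcmp(govs[i].name, name) == 0)
			govs[i].rating = rating;
	}
	curr_gov = NULL;
	for (size_t i = 0; i < nr_govs; i++) {
		if (!curr_gov || curr_gov->rating < govs[i].rating)
			curr_gov = &govs[i];
	}
}
```

With a stock governor rated 20 registered first, the ext governor (rating 1)
stays inactive until its rating is raised above 20.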

Core Components:
1. **struct cpuidle_gov_ext_ops** – BPF-overridable operations:
- ops.enable()/ops.disable(): callbacks invoked when the governor is enabled
  or disabled
- ops.select(): idle-state selection logic for the CPU
- ops.set_stop_tick(): decides whether to stop the scheduler tick after
  state selection
- ops.reflect(): feedback about the previously entered idle state
- ops.init()/ops.deinit(): initialization and cleanup

2. **Critical kfuncs for kernel state access**:
- bpf_cpuidle_ext_gov_update_rating():
  activates the ext governor by raising its rating; must be called from
  "ops.init()"
- bpf_cpuidle_ext_gov_latency_req(): returns the idle-state latency constraint
- bpf_tick_nohz_get_sleep_length(): returns the expected CPU sleep length in
  tickless mode
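To illustrate how the values returned by the kfuncs above might feed
ops.select() and ops.set_stop_tick(), here is a hedged userspace model of one
simple policy: pick the deepest state whose target residency fits in the
expected sleep length and whose exit latency meets the latency constraint,
and stop the tick only when the sleep is expected to outlast it. This is not
the selftest policy from this series; struct idle_state_model, model_select(),
the nanosecond units and the 4 ms tick period are all assumptions made for
illustration. A real BPF governor would obtain these inputs through the kfuncs
and kernel structures rather than as function arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified idle state; index 0 is the shallowest. */
struct idle_state_model {
	uint64_t exit_latency_ns;
	uint64_t target_residency_ns;
};

/* Assumed tick period of 4 ms (HZ=250); illustrative only. */
#define MODEL_TICK_NS 4000000ULL

/*
 * Return the deepest state that satisfies both constraints. In a BPF
 * governor, latency_req_ns and sleep_length_ns would come from
 * bpf_cpuidle_ext_gov_latency_req() and bpf_tick_nohz_get_sleep_length().
 * States are assumed ordered from shallowest to deepest, so the scan
 * stops at the first state that fails a check.
 */
static int model_select(const struct idle_state_model *states, int nr,
			uint64_t latency_req_ns, uint64_t sleep_length_ns,
			bool *stop_tick)
{
	int idx = 0;

	assert(nr > 0);
	for (int i = 1; i < nr; i++) {
		if (states[i].exit_latency_ns > latency_req_ns)
			break;
		if (states[i].target_residency_ns > sleep_length_ns)
			break;
		idx = i;
	}
	/* Only stop the tick when the CPU is expected to sleep past it. */
	*stop_tick = sleep_length_ns >= MODEL_TICK_NS;
	return idx;
}
```

A power-saving variant for the screen-off case would change only the checks
inside the loop; the overall shape (constraints in, state index and tick
decision out) stays the same.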

Future work
----------
1. Scenario detection: identifying low-utilization states (e.g., screen-off
   with background music playback)
2. Policy optimization: tuning state-selection algorithms for specific
   scenarios

Is it related to sched_ext?
---------------------------
The cpuidle framework is structured as follows:
  ----------------------------------------------------------
                 Scheduler Core
  ----------------------------------------------------------
                     |
                     v
  ----------------------------------------------------------
| FAIR Class | EXT Class |           IDLE Class           |
  ----------------------------------------------------------
|            |           |              |
|            |           |              v
|            |           |      ------------------------
|            |           |          enter_cpu_idle()
|            |           |      ------------------------
|            |           |              |
|            |           |              v
|            |           |   ------------------------------
|            |           |       | CPUIDLE Governor |
|            |           |   ------------------------------
|            |           |     |            |           |
|            |           |     v            v           v
|            |           |-----------------------------------
|            |           | default   | |   other  | | BPF ext  |
|            |           | Governor  | | Governor | | Governor |  <<=== the feature added by this series
|            |           |-----------------------------------
|            |           |     |            |           |
|            |           |     v            v           v
|            |           |-------------------------------------
|            |           |           select idle state
|            |           |-------------------------------------

sched_ext is a scheduling class that decides which task runs next, whereas
cpuidle is only invoked after the CPU switches to the idle class because no
tasks remain on its runqueue.
Since the two are not directly related, implementing these kfuncs or other
extensions through sched_ext is not feasible.


Lin Yikai (2):
  cpuidle: Implement BPF extensible cpuidle governor class
  selftests/bpf: Add selftests for cpuidle_gov_ext

 drivers/cpuidle/Kconfig                       |  12 +
 drivers/cpuidle/governors/Makefile            |   1 +
 drivers/cpuidle/governors/ext.c               | 537 ++++++++++++++++++
 .../bpf/prog_tests/test_cpuidle_gov_ext.c     |  28 +
 .../selftests/bpf/progs/cpuidle_common.h      |  13 +
 .../selftests/bpf/progs/cpuidle_gov_ext.c     | 200 +++++++
 6 files changed, 791 insertions(+)
 create mode 100644 drivers/cpuidle/governors/ext.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cpuidle_gov_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_gov_ext.c

-- 
2.43.0

