Hi Paul E. McKenney,
We do not know enough to explain the performance impact of this commit. Since
the data is stable, we are still reporting what we see in our tests as an FYI.
Please let us know if this report is not meaningful. Thanks.
Hello,
kernel test robot noticed a 2.0% improvement of unixbench.throughput on:
commit: b41f5a411fb5f8c76c1d945ab391873414d01647 ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")
https://github.com/paulmckrcu/linux dev.2025.12.16a
testcase: unixbench
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz
(Ice Lake) with 256G memory
parameters:
runtime: 300s
nr_task: 100%
test: double
cpufreq_governor: performance
In addition, the commit also has a significant impact on the following
tests:
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 2.1% improvement                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=long                                                                                 |
+------------------+-------------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/[email protected]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/double/unixbench
commit:
14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")
14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39
---------------- ---------------------------
%stddev %change %stddev
\ | \
13742762 ± 15% +31.8% 18115584 ± 15% meminfo.DirectMap2M
43251 -2.0% 42383 proc-vmstat.nr_slab_unreclaimable
2.114e+09 +2.0% 2.156e+09 unixbench.throughput
2.748e+11 +2.0% 2.803e+11 unixbench.workload
13.18 ± 51% -7.8 5.39 ±142% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
13.12 ± 51% -7.7 5.39 ±142% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail
13.12 ± 51% -7.7 5.39 ±142% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail
12.91 ± 52% -7.6 5.29 ±141% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail
1.80 ± 26% -1.3 0.48 ±110% perf-profile.calltrace.cycles-pp.setlocale
13.18 ± 51% -7.8 5.39 ±142% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
1.80 ± 26% -1.1 0.69 ± 52% perf-profile.children.cycles-pp.setlocale
0.45 ± 86% +0.8 1.29 ± 37% perf-profile.children.cycles-pp.folio_remove_rmap_ptes
0.34 ±121% +0.9 1.22 ± 32% perf-profile.self.cycles-pp.folio_remove_rmap_ptes
8.198e+10 +2.0% 8.361e+10 perf-stat.i.branch-instructions
0.89 -0.1 0.75 ± 3% perf-stat.i.branch-miss-rate%
3.715e+08 -45.1% 2.04e+08 ± 2% perf-stat.i.branch-misses
1.09 -1.8% 1.07 perf-stat.i.cpi
1.713e+11 +2.0% 1.746e+11 perf-stat.i.instructions
0.97 +1.6% 0.98 perf-stat.i.ipc
0.45 -0.2 0.24 ± 2% perf-stat.overall.branch-miss-rate%
1.02 -1.9% 1.00 perf-stat.overall.cpi
0.98 +2.0% 1.00 perf-stat.overall.ipc
8.136e+10 +2.0% 8.299e+10 perf-stat.ps.branch-instructions
3.687e+08 -45.1% 2.024e+08 ± 2% perf-stat.ps.branch-misses
1.7e+11 +2.0% 1.733e+11 perf-stat.ps.instructions
2.241e+13 +1.9% 2.283e+13 perf-stat.total.instructions
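For readers who want to sanity-check the %change column, the sketch below recomputes a few entries from the values printed above. It is only an illustration, not the robot's own tooling: the helper names `pct_change` and `miss_rate_pct` are made up for this example, and the inputs are the rounded figures from the table, so the results agree only to the printed precision. Judging from the rows themselves, metrics that are already percentages (branch-miss-rate%, perf-profile entries) appear to report an absolute difference in points rather than a relative change.

```python
# Illustrative helpers (hypothetical names, not part of lkp-tests).

def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def miss_rate_pct(branch_misses: float, branch_instructions: float) -> float:
    """Branch misses as a percentage of branch instructions."""
    return branch_misses / branch_instructions * 100.0


if __name__ == "__main__":
    # unixbench.throughput, parent commit vs. b41f5a411f
    print(f"throughput change: {pct_change(2.114e9, 2.156e9):+.1f}%")

    # perf-stat.ps.branch-misses
    print(f"branch-misses change: {pct_change(3.687e8, 2.024e8):+.1f}%")

    # perf-stat.overall.branch-miss-rate%, rebuilt from the ps counters
    before = miss_rate_pct(3.687e8, 8.136e10)
    after = miss_rate_pct(2.024e8, 8.299e10)
    print(f"miss rate: {before:.2f}% -> {after:.2f}% "
          f"({after - before:+.1f} points)")
```

Rebuilding the miss rate from the per-second counters reproduces the reported overall rate, which suggests the roughly 45% drop in branch misses is consistent across the perf-stat rows rather than an artifact of a single counter.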
***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz
(Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/long/unixbench
commit:
14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")
14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39
---------------- ---------------------------
%stddev %change %stddev
\ | \
43238 -1.9% 42435 proc-vmstat.nr_slab_unreclaimable
2.113e+09 +2.1% 2.156e+09 unixbench.throughput
2.746e+11 +2.1% 2.803e+11 unixbench.workload
8.191e+10 +2.1% 8.36e+10 perf-stat.i.branch-instructions
0.91 ± 2% -0.2 0.72 perf-stat.i.branch-miss-rate%
3.799e+08 -46.3% 2.041e+08 ± 2% perf-stat.i.branch-misses
644407 ± 2% -3.0% 624964 perf-stat.i.cycles-between-cache-misses
1.711e+11 +2.1% 1.746e+11 perf-stat.i.instructions
0.97 +1.7% 0.98 perf-stat.i.ipc
0.46 -0.2 0.24 ± 2% perf-stat.overall.branch-miss-rate%
1.03 -2.0% 1.00 perf-stat.overall.cpi
0.98 +2.1% 1.00 perf-stat.overall.ipc
8.13e+10 +2.1% 8.297e+10 perf-stat.ps.branch-instructions
3.771e+08 -46.3% 2.025e+08 ± 2% perf-stat.ps.branch-misses
1.698e+11 +2.1% 1.733e+11 perf-stat.ps.instructions
2.24e+13 +2.0% 2.286e+13 perf-stat.total.instructions
16.25 ± 81% -8.4 7.84 ±141% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.25 ± 81% -8.4 7.84 ±141% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.18 ± 80% -8.3 7.84 ±141% perf-profile.calltrace.cycles-pp.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold
16.18 ± 80% -8.3 7.84 ±141% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write
16.18 ± 80% -8.3 7.84 ±141% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64
16.18 ± 80% -8.3 7.84 ±141% perf-profile.calltrace.cycles-pp.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.18 ± 80% -8.3 7.84 ±141% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write
15.38 ± 80% -7.7 7.63 ±141% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit
13.33 ± 82% -6.4 6.90 ±141% perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit
10.50 ± 77% -4.9 5.65 ±141% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock
1.58 ± 22% +0.9 2.46 ± 20% perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command
1.58 ± 22% +0.9 2.46 ± 20% perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command.main
1.52 ± 25% +0.9 2.46 ± 20% perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin
16.18 ± 80% -8.3 7.84 ±141% perf-profile.children.cycles-pp.console_flush_one_record
16.18 ± 80% -8.3 7.84 ±141% perf-profile.children.cycles-pp.console_unlock
16.18 ± 80% -8.3 7.84 ±141% perf-profile.children.cycles-pp.devkmsg_emit
16.18 ± 80% -8.3 7.84 ±141% perf-profile.children.cycles-pp.devkmsg_write.cold
16.18 ± 80% -8.3 7.84 ±141% perf-profile.children.cycles-pp.vprintk_emit
15.38 ± 80% -7.9 7.52 ±141% perf-profile.children.cycles-pp.serial8250_console_write
13.92 ± 81% -6.8 7.11 ±141% perf-profile.children.cycles-pp.wait_for_lsr
11.03 ± 77% -5.3 5.74 ±141% perf-profile.children.cycles-pp.io_serial_in
1.58 ± 22% +0.9 2.46 ± 20% perf-profile.children.cycles-pp.perf_mmap__push
1.58 ± 22% +0.9 2.46 ± 20% perf-profile.children.cycles-pp.record__mmap_read_evlist
1.52 ± 25% +0.9 2.46 ± 20% perf-profile.children.cycles-pp.record__pushfn
11.03 ± 77% -5.3 5.74 ±141% perf-profile.self.cycles-pp.io_serial_in
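Many of the perf-profile rows above carry a very large %stddev (up to ±141%), so their deltas may be mostly run-to-run noise. The sketch below shows one rough way to separate such rows from the stable ones. The rule used here (the delta must exceed the sum of both absolute standard deviations) is an assumption chosen for illustration, not the criterion the test robot applies, and the function name `exceeds_noise` is hypothetical. Rows printed without a %stddev are treated as having negligible spread, which is also an assumption.

```python
# Rough noise check for one comparison row (hypothetical helper,
# not the robot's own significance test).

def exceeds_noise(base: float, base_sd_pct: float,
                  new: float, new_sd_pct: float) -> bool:
    """Return True if |new - base| is larger than the combined spread.

    The spread is approximated as the sum of the two absolute standard
    deviations, each derived from the relative %stddev in the table.
    """
    spread = base * base_sd_pct / 100.0 + new * new_sd_pct / 100.0
    return abs(new - base) > spread


if __name__ == "__main__":
    rows = [
        # (metric, base, base %stddev, new, new %stddev)
        ("perf-profile.children.cycles-pp.console_unlock", 16.18, 80, 7.84, 141),
        ("perf-stat.i.branch-misses", 3.799e8, 0, 2.041e8, 2),
        ("unixbench.throughput", 2.113e9, 0, 2.156e9, 0),
    ]
    for name, base, base_sd, new, new_sd in rows:
        verdict = "above noise" if exceeds_noise(base, base_sd, new, new_sd) else "within noise"
        print(f"{name}: {verdict}")
```

Under this heuristic the console and serial write-path rows fall within their own spread, while the branch-miss and throughput rows do not, which matches the note at the top of the report that the throughput data is stable.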
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki