Hi Paul E. McKenney,

We don't have enough knowledge to understand the performance impact of this
commit. Since the data is stable, we are still reporting what we see in our
tests, FYI. Please educate us if this report is not meaningful. Thanks.


Hello,

kernel test robot noticed a 2.0% improvement of unixbench.throughput on:


commit: b41f5a411fb5f8c76c1d945ab391873414d01647 ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")
https://github.com/paulmckrcu/linux dev.2025.12.16a

testcase: unixbench
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

        runtime: 300s
        nr_task: 100%
        test: double
        cpufreq_governor: performance


In addition, the commit has a significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 2.1% improvement                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=long                                                                                 |
+------------------+-------------------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/double/unixbench

commit: 
  14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
  b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")

14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  13742762 ± 15%     +31.8%   18115584 ± 15%  meminfo.DirectMap2M
     43251            -2.0%      42383        proc-vmstat.nr_slab_unreclaimable
 2.114e+09            +2.0%  2.156e+09        unixbench.throughput
 2.748e+11            +2.0%  2.803e+11        unixbench.workload
     13.18 ± 51%      -7.8        5.39 ±142%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
     13.12 ± 51%      -7.7        5.39 ±142%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail
     13.12 ± 51%      -7.7        5.39 ±142%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail
     12.91 ± 52%      -7.6        5.29 ±141%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail
      1.80 ± 26%      -1.3        0.48 ±110%  perf-profile.calltrace.cycles-pp.setlocale
     13.18 ± 51%      -7.8        5.39 ±142%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      1.80 ± 26%      -1.1        0.69 ± 52%  perf-profile.children.cycles-pp.setlocale
      0.45 ± 86%      +0.8        1.29 ± 37%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      0.34 ±121%      +0.9        1.22 ± 32%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
 8.198e+10            +2.0%  8.361e+10        perf-stat.i.branch-instructions
      0.89            -0.1        0.75 ±  3%  perf-stat.i.branch-miss-rate%
 3.715e+08           -45.1%   2.04e+08 ±  2%  perf-stat.i.branch-misses
      1.09            -1.8%       1.07        perf-stat.i.cpi
 1.713e+11            +2.0%  1.746e+11        perf-stat.i.instructions
      0.97            +1.6%       0.98        perf-stat.i.ipc
      0.45            -0.2        0.24 ±  2%  perf-stat.overall.branch-miss-rate%
      1.02            -1.9%       1.00        perf-stat.overall.cpi
      0.98            +2.0%       1.00        perf-stat.overall.ipc
 8.136e+10            +2.0%  8.299e+10        perf-stat.ps.branch-instructions
 3.687e+08           -45.1%  2.024e+08 ±  2%  perf-stat.ps.branch-misses
   1.7e+11            +2.0%  1.733e+11        perf-stat.ps.instructions
 2.241e+13            +1.9%  2.283e+13        perf-stat.total.instructions
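
As a cross-check of the "double" numbers above, the derived perf-stat rows can be recomputed from the per-second counts. The Python sketch below assumes that branch-miss-rate% is branch-misses divided by branch-instructions and that IPC is the reciprocal of CPI; those definitions are inferred from the figures, not taken from the lkp-tests source. It also reflects a convention visible in the rows themselves: for percentage metrics the %change column holds a difference in percentage points (0.45 to 0.24 is shown as -0.2), while count metrics show a relative change.

```python
# Sketch: recompute derived perf-stat figures for the "double" test from the
# perf-stat.ps.* values in the table (base 14c7fd5dbf vs b41f5a411f).
# Assumption: miss rate = misses / branch instructions, IPC = 1 / CPI.


def pct_change(old: float, new: float) -> float:
    """Relative change in percent, as in the %change column for count metrics."""
    return (new - old) / old * 100.0


def miss_rate_pct(misses: float, branches: float) -> float:
    """Branch-miss rate as a percentage of branch instructions."""
    return misses / branches * 100.0


# Rounded values copied from the table above.
base = {"branch_misses": 3.687e8, "branch_instructions": 8.136e10, "cpi": 1.02}
patched = {"branch_misses": 2.024e8, "branch_instructions": 8.299e10, "cpi": 1.00}

misses_change = pct_change(base["branch_misses"], patched["branch_misses"])
rate_before = miss_rate_pct(base["branch_misses"], base["branch_instructions"])
rate_after = miss_rate_pct(patched["branch_misses"], patched["branch_instructions"])

print(f"branch-misses %change: {misses_change:.1f}%")
# Percentage metrics: report the difference in percentage points.
print(f"branch-miss-rate%: {rate_before:.2f} -> {rate_after:.2f} "
      f"({rate_after - rate_before:+.1f} points)")
print(f"ipc: {1 / base['cpi']:.2f} -> {1 / patched['cpi']:.2f}")
```

With the rounded inputs from the table, the sketch prints a -45.1% change in branch misses, a miss rate falling from 0.45% to 0.24%, and IPC moving from 0.98 to 1.00, which agrees with the perf-stat.overall rows.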


***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/long/unixbench

commit: 
  14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
  b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")

14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     43238            -1.9%      42435        proc-vmstat.nr_slab_unreclaimable
 2.113e+09            +2.1%  2.156e+09        unixbench.throughput
 2.746e+11            +2.1%  2.803e+11        unixbench.workload
 8.191e+10            +2.1%   8.36e+10        perf-stat.i.branch-instructions
      0.91 ±  2%      -0.2        0.72        perf-stat.i.branch-miss-rate%
 3.799e+08           -46.3%  2.041e+08 ±  2%  perf-stat.i.branch-misses
    644407 ±  2%      -3.0%     624964        perf-stat.i.cycles-between-cache-misses
 1.711e+11            +2.1%  1.746e+11        perf-stat.i.instructions
      0.97            +1.7%       0.98        perf-stat.i.ipc
      0.46            -0.2        0.24 ±  2%  perf-stat.overall.branch-miss-rate%
      1.03            -2.0%       1.00        perf-stat.overall.cpi
      0.98            +2.1%       1.00        perf-stat.overall.ipc
  8.13e+10            +2.1%  8.297e+10        perf-stat.ps.branch-instructions
 3.771e+08           -46.3%  2.025e+08 ±  2%  perf-stat.ps.branch-misses
 1.698e+11            +2.1%  1.733e+11        perf-stat.ps.instructions
  2.24e+13            +2.0%  2.286e+13        perf-stat.total.instructions
     16.25 ± 81%      -8.4        7.84 ±141%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.25 ± 81%      -8.4        7.84 ±141%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write
     15.38 ± 80%      -7.7        7.63 ±141%  perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit
     13.33 ± 82%      -6.4        6.90 ±141%  perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit
     10.50 ± 77%      -4.9        5.65 ±141%  perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command.main
      1.52 ± 25%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.console_flush_one_record
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.console_unlock
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.devkmsg_emit
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.devkmsg_write.cold
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.vprintk_emit
     15.38 ± 80%      -7.9        7.52 ±141%  perf-profile.children.cycles-pp.serial8250_console_write
     13.92 ± 81%      -6.8        7.11 ±141%  perf-profile.children.cycles-pp.wait_for_lsr
     11.03 ± 77%      -5.3        5.74 ±141%  perf-profile.children.cycles-pp.io_serial_in
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.perf_mmap__push
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      1.52 ± 25%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.record__pushfn
     11.03 ± 77%      -5.3        5.74 ±141%  perf-profile.self.cycles-pp.io_serial_in
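
The serial-console rows in the "long" table carry relative stddevs between 77% and 141%, so those deltas deserve care. The Python sketch below applies a rough heuristic, which is an assumption and not the robot's own significance test: a change counts as noise when it is smaller than the sum of the two absolute standard deviations (mean times %stddev / 100). The sample rows are copied from the table above.

```python
# Sketch: rough noise check for perf-profile deltas in the "long" table.
# Heuristic (an assumption, not lkp-tests logic): a delta is treated as noise
# when it is smaller than the sum of the two absolute standard deviations.


def abs_stddev(mean: float, stddev_pct: float) -> float:
    """Convert a relative stddev (percent of the mean) to absolute units."""
    return mean * stddev_pct / 100.0


def exceeds_noise(old: float, old_sd_pct: float, new: float, new_sd_pct: float) -> bool:
    """Return True if |new - old| is larger than the combined absolute stddev."""
    delta = abs(new - old)
    noise = abs_stddev(old, old_sd_pct) + abs_stddev(new, new_sd_pct)
    return delta > noise


rows = {
    # name: (old, old %stddev, new, new %stddev), copied from the table
    "console_unlock": (16.18, 80, 7.84, 141),
    "io_serial_in (self)": (11.03, 77, 5.74, 141),
    "perf_mmap__push": (1.58, 22, 2.46, 20),
}

for name, (old, old_sd, new, new_sd) in rows.items():
    verdict = "stands out" if exceeds_noise(old, old_sd, new, new_sd) else "within noise"
    print(f"{name}: {old} -> {new} ({verdict})")
```

Under this heuristic the console_unlock and io_serial_in deltas fall within noise, while perf_mmap__push (1.58 to 2.46) only barely clears it (a delta of 0.88 against roughly 0.84 of combined stddev).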





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

