Hi Paul E. McKenney,

Similar to the b41f5a411f report we just sent out, we are making this report
based on stable data.
Please let us know if you find this report less meaningful. Thanks.


Hello,

kernel test robot noticed a 3.4% improvement of stress-ng.memfd.ops_per_sec on:


commit: 1ac50ec62874025381a864f784583dbdc30dcc7c ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
https://github.com/paulmckrcu/linux dev.2025.12.16a

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E  CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:

        nr_threads: 100%
        testtime: 60s
        test: memfd
        cpufreq_governor: performance
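For readers less familiar with the memfd stressor, the C sketch below approximates one iteration of the kind of work it drives and maps each step to a kernel path that appears in the profile further down. It is a simplified illustration, not stress-ng's actual stress_memfd_child() code: the function name memfd_cycle() and its page-count parameter are hypothetical, and the real stressor appears to create many descriptors per iteration and (judging from the close_range entries in the profile) to close them with close_range() rather than close().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes memfd_create() in glibc >= 2.27 */
#endif
#include <stddef.h>
#include <sys/mman.h>          /* memfd_create, mmap, munmap */
#include <sys/types.h>         /* off_t */
#include <unistd.h>            /* ftruncate, close, sysconf */

/*
 * Hypothetical approximation of one memfd stressor iteration.
 * Returns the number of pages touched, or -1 on error.
 */
long memfd_cycle(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;
    size_t len = npages * (size_t)page;

    /* memfd_create -> __shmem_file_setup -> new_inode -> inode_sb_list_add */
    int fd = memfd_create("memfd-sketch", 0);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* mmap -> do_mmap -> shmem_get_unmapped_area / mmap_region */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* First write to each page -> do_shared_fault -> shmem_fault */
    long touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        p[off] = 1;
        touched++;
    }

    /* munmap -> __vm_munmap; dropping the last file reference -> __fput -> evict */
    munmap(p, len);
    close(fd);
    return touched;
}
```

Calling memfd_cycle(4) creates one memfd, faults in four pages and returns 4. With nr_threads: 100%, many such workers presumably run concurrently (one per hardware thread), so the inode creation and eviction steps all serialize on a shared spinlock, which would be consistent with the roughly 92% of cycles in native_queued_spin_lock_slowpath shown in the profile.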



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/memfd/stress-ng/60s

commit: 
  43c23963b3 ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
  1ac50ec628 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")

43c23963b3c549da 1ac50ec62874025381a864f7845 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    203232 ± 22%     -21.8%     158902 ± 18%  numa-meminfo.node1.Mapped
     50977 ± 22%     -21.8%      39853 ± 18%  numa-vmstat.node1.nr_mapped
      7039            +1.6%       7152        vmstat.system.cs
    107613            -3.9%     103453        stress-ng.memfd.nanosecs_per_memfd_create_call
    193537            +3.4%     200175        stress-ng.memfd.ops
      3226            +3.4%       3337        stress-ng.memfd.ops_per_sec
    187908            +1.6%     190921        stress-ng.time.involuntary_context_switches
  99134672            +3.4%  1.025e+08        stress-ng.time.minor_page_faults
     61965 ±  3%      -6.2%      58116        proc-vmstat.nr_mapped
 1.526e+08            +3.4%  1.578e+08        proc-vmstat.numa_hit
 1.524e+08            +3.4%  1.576e+08        proc-vmstat.numa_local
 1.631e+08            +3.4%  1.687e+08        proc-vmstat.pgalloc_normal
  99574028            +3.4%   1.03e+08        proc-vmstat.pgfault
 1.624e+08            +3.4%  1.679e+08        proc-vmstat.pgfree
      2.26            +1.1%       2.28        perf-stat.i.MPKI
 1.646e+10            +1.8%  1.675e+10        perf-stat.i.branch-instructions
      0.24            +0.0        0.25        perf-stat.i.branch-miss-rate%
  38660390            +6.8%   41301381        perf-stat.i.branch-misses
 1.714e+08            +3.1%  1.768e+08        perf-stat.i.cache-misses
 2.884e+08            +3.3%  2.979e+08        perf-stat.i.cache-references
      6758            +1.3%       6846        perf-stat.i.context-switches
      7.88            -2.0%       7.73        perf-stat.i.cpi
      3504            -3.1%       3397        perf-stat.i.cycles-between-cache-misses
 7.628e+10            +2.0%  7.783e+10        perf-stat.i.instructions
      0.13            +2.0%       0.13        perf-stat.i.ipc
     17.01            +3.5%      17.59        perf-stat.i.metric.K/sec
   1632582            +3.5%    1689088        perf-stat.i.minor-faults
   1632582            +3.5%    1689088        perf-stat.i.page-faults
      2.25            +1.1%       2.27        perf-stat.overall.MPKI
      0.23            +0.0        0.25        perf-stat.overall.branch-miss-rate%
      7.91            -2.0%       7.76        perf-stat.overall.cpi
      3519            -3.1%       3411        perf-stat.overall.cycles-between-cache-misses
      0.13            +2.0%       0.13        perf-stat.overall.ipc
 1.619e+10            +1.8%  1.648e+10        perf-stat.ps.branch-instructions
  37927942            +6.8%   40525278        perf-stat.ps.branch-misses
 1.687e+08            +3.1%   1.74e+08        perf-stat.ps.cache-misses
 2.841e+08            +3.3%  2.933e+08        perf-stat.ps.cache-references
      6638            +1.4%       6729        perf-stat.ps.context-switches
 7.503e+10            +2.0%  7.654e+10        perf-stat.ps.instructions
   1606108            +3.5%    1661536        perf-stat.ps.minor-faults
   1606108            +3.5%    1661536        perf-stat.ps.page-faults
 4.564e+12            +1.8%  4.647e+12        perf-stat.total.instructions
     46.05            -0.3       45.79        perf-profile.calltrace.cycles-pp._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup
     45.93            -0.3       45.68        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode
     46.36            -0.3       46.10        perf-profile.calltrace.cycles-pp.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.12            -0.3       45.87        perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create
     46.26            -0.3       46.01        perf-profile.calltrace.cycles-pp.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64
     46.57            -0.2       46.32        perf-profile.calltrace.cycles-pp.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     46.62            -0.2       46.38        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.memfd_create
     46.61            -0.2       46.37        perf-profile.calltrace.cycles-pp.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     46.64            -0.2       46.40        perf-profile.calltrace.cycles-pp.memfd_create
     46.62            -0.2       46.37        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
     45.57            -0.2       45.38        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.finish_dput.__fput
     45.40            -0.2       45.22        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.finish_dput
     46.69            -0.2       46.53        perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.61            -0.2       46.45        perf-profile.calltrace.cycles-pp.finish_dput.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.close_range
     46.71            -0.2       46.55        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.71            -0.2       46.55        perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
     46.73            -0.2       46.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close_range
     46.60            -0.2       46.45        perf-profile.calltrace.cycles-pp.__dentry_kill.finish_dput.__fput.task_work_run.exit_to_user_mode_loop
     46.40            -0.2       46.24        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.finish_dput.__fput.task_work_run
      0.59            +0.0        0.60        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.56            +0.0        0.57        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.62            +0.0        0.64        perf-profile.calltrace.cycles-pp.__munmap
      0.59            +0.0        0.60        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.57            +0.0        0.59        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.59            +0.0        0.61        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      0.59            +0.0        0.61        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.98            +0.0        1.01        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.94            +0.0        0.97        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.94            +0.0        0.97        perf-profile.calltrace.cycles-pp.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.01            +0.0        1.04        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      0.83            +0.0        0.86        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_shared_fault.do_fault
      0.84            +0.0        0.87        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_shared_fault.do_fault.__handle_mm_fault
      0.84            +0.0        0.88        perf-profile.calltrace.cycles-pp.__do_fault.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.08            +0.0        1.12        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      1.09            +0.0        1.14        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_memfd_child
      1.25            +0.1        1.30        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_memfd_child
      1.26            +0.1        1.32        perf-profile.calltrace.cycles-pp.stress_memfd_child
      0.81 ±  2%      +0.1        0.91 ±  2%  perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.80 ±  3%      +0.1        0.90 ±  2%  perf-profile.calltrace.cycles-pp.__mmap_region.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      0.99 ±  2%      +0.1        1.11        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.00 ±  2%      +0.1        1.13        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.03 ±  2%      +0.1        1.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
      1.02 ±  2%      +0.1        1.15        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.03 ±  2%      +0.1        1.16        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      1.21            +0.1        1.35        perf-profile.calltrace.cycles-pp.__mmap
     92.36            -0.4       91.92        perf-profile.children.cycles-pp._raw_spin_lock
     92.52            -0.3       92.17        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     46.12            -0.3       45.87        perf-profile.children.cycles-pp.inode_sb_list_add
     46.26            -0.3       46.01        perf-profile.children.cycles-pp.new_inode
     46.36            -0.3       46.10        perf-profile.children.cycles-pp.__shmem_get_inode
     46.57            -0.2       46.32        perf-profile.children.cycles-pp.__shmem_file_setup
     46.61            -0.2       46.37        perf-profile.children.cycles-pp.__x64_sys_memfd_create
     46.65            -0.2       46.41        perf-profile.children.cycles-pp.memfd_create
     47.18            -0.2       47.01        perf-profile.children.cycles-pp.__fput
     47.10            -0.2       46.94        perf-profile.children.cycles-pp.finish_dput
     46.73            -0.2       46.57        perf-profile.children.cycles-pp.close_range
     47.09            -0.2       46.93        perf-profile.children.cycles-pp.__dentry_kill
     46.88            -0.2       46.72        perf-profile.children.cycles-pp.evict
     46.71            -0.2       46.55        perf-profile.children.cycles-pp.exit_to_user_mode_loop
     46.71            -0.2       46.55        perf-profile.children.cycles-pp.task_work_run
     97.60            -0.1       97.51        perf-profile.children.cycles-pp.do_syscall_64
     97.61            -0.1       97.53        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.14            +0.0        0.15        perf-profile.children.cycles-pp.xas_create
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      0.11            +0.0        0.12        perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
      0.07            +0.0        0.08        perf-profile.children.cycles-pp.mas_rev_awalk
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.12            +0.0        0.13        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.29            +0.0        0.30        perf-profile.children.cycles-pp.kmem_cache_free
      0.09 ±  5%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.xas_expand
      0.17 ±  2%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru_noprof
      0.22            +0.0        0.24 ±  2%  perf-profile.children.cycles-pp.xas_store
      0.13 ±  3%      +0.0        0.15 ±  3%  perf-profile.children.cycles-pp.flush_tlb_func
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.kthread
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.ret_from_fork
      0.47            +0.0        0.49        perf-profile.children.cycles-pp.ret_from_fork_asm
      0.23            +0.0        0.25        perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.12 ±  4%      +0.0        0.14        perf-profile.children.cycles-pp.shmem_alloc_folio
      0.46            +0.0        0.48        perf-profile.children.cycles-pp.run_ksoftirqd
      0.14 ±  3%      +0.0        0.16        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.13 ±  3%      +0.0        0.15        perf-profile.children.cycles-pp.vm_unmapped_area
      0.15 ±  3%      +0.0        0.17        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.08 ±  5%      +0.0        0.10        perf-profile.children.cycles-pp.mas_empty_area_rev
      0.59            +0.0        0.60        perf-profile.children.cycles-pp.__vm_munmap
      0.53            +0.0        0.55        perf-profile.children.cycles-pp.handle_softirqs
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.rcu_do_batch
      0.56            +0.0        0.57        perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.59            +0.0        0.61        perf-profile.children.cycles-pp.__x64_sys_munmap
      0.52            +0.0        0.54        perf-profile.children.cycles-pp.rcu_core
      0.31            +0.0        0.33        perf-profile.children.cycles-pp.unmap_page_range
      0.13 ±  2%      +0.0        0.15        perf-profile.children.cycles-pp.unmapped_area_topdown
      0.15 ±  2%      +0.0        0.17        perf-profile.children.cycles-pp.__get_unmapped_area
      0.28            +0.0        0.30        perf-profile.children.cycles-pp.zap_pte_range
      0.63            +0.0        0.65        perf-profile.children.cycles-pp.__munmap
      0.57            +0.0        0.59        perf-profile.children.cycles-pp.do_vmi_munmap
      0.15 ±  2%      +0.0        0.17        perf-profile.children.cycles-pp.shmem_get_unmapped_area
      0.21            +0.0        0.23        perf-profile.children.cycles-pp.zap_page_range_single
      0.36            +0.0        0.38        perf-profile.children.cycles-pp.__mmap_new_vma
      0.29            +0.0        0.31        perf-profile.children.cycles-pp.zap_pmd_range
      0.19            +0.0        0.22 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single_batched
      0.23            +0.0        0.26        perf-profile.children.cycles-pp.unmap_mapping_range
      0.05            +0.0        0.08        perf-profile.children.cycles-pp.perf_iterate_sb
      0.51            +0.0        0.54        perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.98            +0.0        1.02        perf-profile.children.cycles-pp.__handle_mm_fault
      0.94            +0.0        0.97        perf-profile.children.cycles-pp.do_fault
      0.94            +0.0        0.97        perf-profile.children.cycles-pp.do_shared_fault
      1.01            +0.0        1.04        perf-profile.children.cycles-pp.handle_mm_fault
      0.84            +0.0        0.87        perf-profile.children.cycles-pp.shmem_fault
      1.09            +0.0        1.12        perf-profile.children.cycles-pp.do_user_addr_fault
      0.84            +0.0        0.88        perf-profile.children.cycles-pp.__do_fault
      1.09            +0.0        1.14        perf-profile.children.cycles-pp.exc_page_fault
      0.14 ±  3%      +0.0        0.18        perf-profile.children.cycles-pp.perf_event_mmap
      0.13 ±  3%      +0.0        0.17        perf-profile.children.cycles-pp.perf_event_mmap_event
      1.02            +0.0        1.06        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.fault_dirty_shared_page
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.perf_event_mmap_output
      1.40            +0.1        1.46        perf-profile.children.cycles-pp.asm_exc_page_fault
      1.53            +0.1        1.60        perf-profile.children.cycles-pp.stress_memfd_child
      0.81 ±  2%      +0.1        0.91 ±  2%  perf-profile.children.cycles-pp.mmap_region
      0.80 ±  2%      +0.1        0.90 ±  2%  perf-profile.children.cycles-pp.__mmap_region
      0.99 ±  2%      +0.1        1.11        perf-profile.children.cycles-pp.do_mmap
      1.00 ±  2%      +0.1        1.13        perf-profile.children.cycles-pp.vm_mmap_pgoff
      1.02 ±  2%      +0.1        1.15        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      1.22            +0.1        1.36        perf-profile.children.cycles-pp.__mmap
     92.16            -0.4       91.80        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.05            +0.0        0.06        perf-profile.self.cycles-pp.mas_rev_awalk
      0.12            +0.0        0.13        perf-profile.self.cycles-pp.native_flush_tlb_one_user
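As a reading aid, the C sketch below reproduces the arithmetic behind the %change column and the derived perf-stat rows in the table, using values copied from it. The helper names are hypothetical. For plain counters the column matches a relative change (ops_per_sec 3226 to 3337 is about +3.44%, printed +3.4%). For rows that are already percentages (perf-profile.*, branch-miss-rate%) it appears to be an absolute difference in percentage points (46.05 to 45.79 is -0.26, printed -0.3); that second rule is inferred from the numbers, not taken from the lkp-tests source.

```c
/*
 * Reading aid for the comparison table. Helper names are hypothetical;
 * the percentage-point rule is inferred from the table, not from lkp-tests.
 */

/* Relative change in percent, as shown for plain counters and rates. */
double pct_change(double old_val, double new_val)
{
    return (new_val - old_val) / old_val * 100.0;
}

/* Absolute change in percentage points, for metrics that are already percentages. */
double point_change(double old_val, double new_val)
{
    return new_val - old_val;
}

/* Cache misses per thousand instructions (the MPKI rows). */
double mpki(double cache_misses, double instructions)
{
    return cache_misses / instructions * 1000.0;
}

/* Branch misses as a percentage of branch instructions. */
double branch_miss_rate_pct(double branch_misses, double branch_instructions)
{
    return branch_misses / branch_instructions * 100.0;
}

/* IPC is the reciprocal of CPI. */
double ipc_from_cpi(double cpi)
{
    return 1.0 / cpi;
}

/* Total cycles (CPI * instructions) divided by the number of cache misses. */
double cycles_between_cache_misses(double cpi, double instructions, double cache_misses)
{
    return cpi * instructions / cache_misses;
}
```

Fed the baseline perf-stat.ps counters, these helpers land on the perf-stat.overall rows: 1.687e8 cache misses over 7.503e10 instructions gives about 2.25 MPKI, 37927942 / 1.619e10 gives a 0.23% branch-miss rate, 1 / 7.91 gives an IPC of about 0.13, and 7.91 * 7.503e10 / 1.687e8 gives roughly 3518 cycles between cache misses (the table shows 3519, presumably from unrounded inputs).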




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

