Hi Paul E. McKenney,
Similar to the b41f5a411f report we sent out recently, this report is based on stable data.
Please let us know if this kind of report is less useful. Thanks!
Hello,
kernel test robot noticed a 3.4% improvement of stress-ng.memfd.ops_per_sec on:
commit: 1ac50ec62874025381a864f784583dbdc30dcc7c ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
https://github.com/paulmckrcu/linux dev.2025.12.16a
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: memfd
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/[email protected]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp2/memfd/stress-ng/60s
commit:
43c23963b3 ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
1ac50ec628 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
43c23963b3c549da 1ac50ec62874025381a864f7845
---------------- ---------------------------
%stddev %change %stddev
\ | \
203232 ± 22% -21.8% 158902 ± 18% numa-meminfo.node1.Mapped
50977 ± 22% -21.8% 39853 ± 18% numa-vmstat.node1.nr_mapped
7039 +1.6% 7152 vmstat.system.cs
107613 -3.9% 103453 stress-ng.memfd.nanosecs_per_memfd_create_call
193537 +3.4% 200175 stress-ng.memfd.ops
3226 +3.4% 3337 stress-ng.memfd.ops_per_sec
187908 +1.6% 190921 stress-ng.time.involuntary_context_switches
99134672 +3.4% 1.025e+08 stress-ng.time.minor_page_faults
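For readers comparing the columns: the left value is the parent-commit mean, the right value is the tested-commit mean, and %change is the relative difference between them. A quick sanity check of the headline numbers above (a minimal sketch; values copied from this report):

```python
# Recompute the %change column from the two per-commit means reported
# above (parent commit 43c23963b3 vs. tested commit 1ac50ec628).
def pct_change(parent, tested):
    return (tested - parent) / parent * 100

# stress-ng.memfd.ops_per_sec: 3226 -> 3337, reported as +3.4%
print(round(pct_change(3226, 3337), 1))      # 3.4

# stress-ng.memfd.nanosecs_per_memfd_create_call: 107613 -> 103453,
# reported as -3.9% (fewer nanoseconds per call, i.e. faster)
print(round(pct_change(107613, 103453), 1))  # -3.9
```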
61965 ± 3% -6.2% 58116 proc-vmstat.nr_mapped
1.526e+08 +3.4% 1.578e+08 proc-vmstat.numa_hit
1.524e+08 +3.4% 1.576e+08 proc-vmstat.numa_local
1.631e+08 +3.4% 1.687e+08 proc-vmstat.pgalloc_normal
99574028 +3.4% 1.03e+08 proc-vmstat.pgfault
1.624e+08 +3.4% 1.679e+08 proc-vmstat.pgfree
2.26 +1.1% 2.28 perf-stat.i.MPKI
1.646e+10 +1.8% 1.675e+10 perf-stat.i.branch-instructions
0.24 +0.0 0.25 perf-stat.i.branch-miss-rate%
38660390 +6.8% 41301381 perf-stat.i.branch-misses
1.714e+08 +3.1% 1.768e+08 perf-stat.i.cache-misses
2.884e+08 +3.3% 2.979e+08 perf-stat.i.cache-references
6758 +1.3% 6846 perf-stat.i.context-switches
7.88 -2.0% 7.73 perf-stat.i.cpi
3504 -3.1% 3397 perf-stat.i.cycles-between-cache-misses
7.628e+10 +2.0% 7.783e+10 perf-stat.i.instructions
0.13 +2.0% 0.13 perf-stat.i.ipc
17.01 +3.5% 17.59 perf-stat.i.metric.K/sec
1632582 +3.5% 1689088 perf-stat.i.minor-faults
1632582 +3.5% 1689088 perf-stat.i.page-faults
2.25 +1.1% 2.27 perf-stat.overall.MPKI
0.23 +0.0 0.25 perf-stat.overall.branch-miss-rate%
7.91 -2.0% 7.76 perf-stat.overall.cpi
3519 -3.1% 3411 perf-stat.overall.cycles-between-cache-misses
0.13 +2.0% 0.13 perf-stat.overall.ipc
1.619e+10 +1.8% 1.648e+10 perf-stat.ps.branch-instructions
37927942 +6.8% 40525278 perf-stat.ps.branch-misses
1.687e+08 +3.1% 1.74e+08 perf-stat.ps.cache-misses
2.841e+08 +3.3% 2.933e+08 perf-stat.ps.cache-references
6638 +1.4% 6729 perf-stat.ps.context-switches
7.503e+10 +2.0% 7.654e+10 perf-stat.ps.instructions
1606108 +3.5% 1661536 perf-stat.ps.minor-faults
1606108 +3.5% 1661536 perf-stat.ps.page-faults
4.564e+12 +1.8% 4.647e+12 perf-stat.total.instructions
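The derived perf-stat metrics follow the usual definitions (MPKI = cache misses per thousand instructions, IPC = 1/CPI). Checking them against the counters above (a sketch using parent-commit values from this report; small rounding differences are expected since the counters are averaged independently):

```python
# Recompute the derived perf-stat metrics from the raw counters above
# (parent-commit column).
cache_misses = 1.687e8   # perf-stat.ps.cache-misses
instructions = 7.503e10  # perf-stat.ps.instructions
cpi = 7.91               # perf-stat.overall.cpi

mpki = cache_misses / instructions * 1000  # misses per kilo-instruction
ipc = 1 / cpi                              # instructions per cycle

print(round(mpki, 2))  # 2.25, matching perf-stat.overall.MPKI
print(round(ipc, 2))   # 0.13, matching perf-stat.overall.ipc
```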
46.05 -0.3 45.79 perf-profile.calltrace.cycles-pp._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup
45.93 -0.3 45.68 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.inode_sb_list_add.new_inode.__shmem_get_inode
46.36 -0.3 46.10 perf-profile.calltrace.cycles-pp.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.12 -0.3 45.87 perf-profile.calltrace.cycles-pp.inode_sb_list_add.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create
46.26 -0.3 46.01 perf-profile.calltrace.cycles-pp.new_inode.__shmem_get_inode.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64
46.57 -0.2 46.32 perf-profile.calltrace.cycles-pp.__shmem_file_setup.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
46.62 -0.2 46.38 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.memfd_create
46.61 -0.2 46.37 perf-profile.calltrace.cycles-pp.__x64_sys_memfd_create.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
46.64 -0.2 46.40 perf-profile.calltrace.cycles-pp.memfd_create
46.62 -0.2 46.37 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.memfd_create
45.57 -0.2 45.38 perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.finish_dput.__fput
45.40 -0.2 45.22 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.finish_dput
46.69 -0.2 46.53 perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.61 -0.2 46.45 perf-profile.calltrace.cycles-pp.finish_dput.__fput.task_work_run.exit_to_user_mode_loop.do_syscall_64
46.73 -0.2 46.57 perf-profile.calltrace.cycles-pp.close_range
46.71 -0.2 46.55 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
46.71 -0.2 46.55 perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
46.73 -0.2 46.57 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
46.73 -0.2 46.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close_range
46.60 -0.2 46.45 perf-profile.calltrace.cycles-pp.__dentry_kill.finish_dput.__fput.task_work_run.exit_to_user_mode_loop
46.40 -0.2 46.24 perf-profile.calltrace.cycles-pp.evict.__dentry_kill.finish_dput.__fput.task_work_run
0.59 +0.0 0.60 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.56 +0.0 0.57 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.62 +0.0 0.64 perf-profile.calltrace.cycles-pp.__munmap
0.59 +0.0 0.60 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.57 +0.0 0.59 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.59 +0.0 0.61 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
0.59 +0.0 0.61 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.98 +0.0 1.01 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.94 +0.0 0.97 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.94 +0.0 0.97 perf-profile.calltrace.cycles-pp.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.01 +0.0 1.04 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
0.83 +0.0 0.86 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_shared_fault.do_fault
0.84 +0.0 0.87 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_shared_fault.do_fault.__handle_mm_fault
0.84 +0.0 0.88 perf-profile.calltrace.cycles-pp.__do_fault.do_shared_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.08 +0.0 1.12 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.stress_memfd_child
1.09 +0.0 1.14 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.stress_memfd_child
1.25 +0.1 1.30 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.stress_memfd_child
1.26 +0.1 1.32 perf-profile.calltrace.cycles-pp.stress_memfd_child
0.81 ± 2% +0.1 0.91 ± 2% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.80 ± 3% +0.1 0.90 ± 2% perf-profile.calltrace.cycles-pp.__mmap_region.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.99 ± 2% +0.1 1.11 perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.00 ± 2% +0.1 1.13 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.03 ± 2% +0.1 1.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
1.02 ± 2% +0.1 1.15 perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.03 ± 2% +0.1 1.16 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
1.21 +0.1 1.35 perf-profile.calltrace.cycles-pp.__mmap
92.36 -0.4 91.92 perf-profile.children.cycles-pp._raw_spin_lock
92.52 -0.3 92.17 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
46.12 -0.3 45.87 perf-profile.children.cycles-pp.inode_sb_list_add
46.26 -0.3 46.01 perf-profile.children.cycles-pp.new_inode
46.36 -0.3 46.10 perf-profile.children.cycles-pp.__shmem_get_inode
46.57 -0.2 46.32 perf-profile.children.cycles-pp.__shmem_file_setup
46.61 -0.2 46.37 perf-profile.children.cycles-pp.__x64_sys_memfd_create
46.65 -0.2 46.41 perf-profile.children.cycles-pp.memfd_create
47.18 -0.2 47.01 perf-profile.children.cycles-pp.__fput
47.10 -0.2 46.94 perf-profile.children.cycles-pp.finish_dput
46.73 -0.2 46.57 perf-profile.children.cycles-pp.close_range
47.09 -0.2 46.93 perf-profile.children.cycles-pp.__dentry_kill
46.88 -0.2 46.72 perf-profile.children.cycles-pp.evict
46.71 -0.2 46.55 perf-profile.children.cycles-pp.exit_to_user_mode_loop
46.71 -0.2 46.55 perf-profile.children.cycles-pp.task_work_run
97.60 -0.1 97.51 perf-profile.children.cycles-pp.do_syscall_64
97.61 -0.1 97.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.14 +0.0 0.15 perf-profile.children.cycles-pp.xas_create
0.11 +0.0 0.12 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
0.11 +0.0 0.12 perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
0.07 +0.0 0.08 perf-profile.children.cycles-pp.mas_rev_awalk
0.12 +0.0 0.13 perf-profile.children.cycles-pp.alloc_pages_mpol
0.12 +0.0 0.13 perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.29 +0.0 0.30 perf-profile.children.cycles-pp.kmem_cache_free
0.09 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.xas_expand
0.17 ± 2% +0.0 0.18 ± 2% perf-profile.children.cycles-pp.kmem_cache_alloc_lru_noprof
0.22 +0.0 0.24 ± 2% perf-profile.children.cycles-pp.xas_store
0.13 ± 3% +0.0 0.15 ± 3% perf-profile.children.cycles-pp.flush_tlb_func
0.47 +0.0 0.49 perf-profile.children.cycles-pp.kthread
0.47 +0.0 0.49 perf-profile.children.cycles-pp.ret_from_fork
0.47 +0.0 0.49 perf-profile.children.cycles-pp.ret_from_fork_asm
0.23 +0.0 0.25 perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.12 ± 4% +0.0 0.14 perf-profile.children.cycles-pp.shmem_alloc_folio
0.46 +0.0 0.48 perf-profile.children.cycles-pp.run_ksoftirqd
0.14 ± 3% +0.0 0.16 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.13 ± 3% +0.0 0.15 perf-profile.children.cycles-pp.vm_unmapped_area
0.15 ± 3% +0.0 0.17 perf-profile.children.cycles-pp.flush_tlb_mm_range
0.08 ± 5% +0.0 0.10 perf-profile.children.cycles-pp.mas_empty_area_rev
0.59 +0.0 0.60 perf-profile.children.cycles-pp.__vm_munmap
0.53 +0.0 0.55 perf-profile.children.cycles-pp.handle_softirqs
0.52 +0.0 0.54 perf-profile.children.cycles-pp.rcu_do_batch
0.56 +0.0 0.57 perf-profile.children.cycles-pp.do_vmi_align_munmap
0.59 +0.0 0.61 perf-profile.children.cycles-pp.__x64_sys_munmap
0.52 +0.0 0.54 perf-profile.children.cycles-pp.rcu_core
0.31 +0.0 0.33 perf-profile.children.cycles-pp.unmap_page_range
0.13 ± 2% +0.0 0.15 perf-profile.children.cycles-pp.unmapped_area_topdown
0.15 ± 2% +0.0 0.17 perf-profile.children.cycles-pp.__get_unmapped_area
0.28 +0.0 0.30 perf-profile.children.cycles-pp.zap_pte_range
0.63 +0.0 0.65 perf-profile.children.cycles-pp.__munmap
0.57 +0.0 0.59 perf-profile.children.cycles-pp.do_vmi_munmap
0.15 ± 2% +0.0 0.17 perf-profile.children.cycles-pp.shmem_get_unmapped_area
0.21 +0.0 0.23 perf-profile.children.cycles-pp.zap_page_range_single
0.36 +0.0 0.38 perf-profile.children.cycles-pp.__mmap_new_vma
0.29 +0.0 0.31 perf-profile.children.cycles-pp.zap_pmd_range
0.19 +0.0 0.22 ± 2% perf-profile.children.cycles-pp.zap_page_range_single_batched
0.23 +0.0 0.26 perf-profile.children.cycles-pp.unmap_mapping_range
0.05 +0.0 0.08 perf-profile.children.cycles-pp.perf_iterate_sb
0.51 +0.0 0.54 perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
0.98 +0.0 1.02 perf-profile.children.cycles-pp.__handle_mm_fault
0.94 +0.0 0.97 perf-profile.children.cycles-pp.do_fault
0.94 +0.0 0.97 perf-profile.children.cycles-pp.do_shared_fault
1.01 +0.0 1.04 perf-profile.children.cycles-pp.handle_mm_fault
0.84 +0.0 0.87 perf-profile.children.cycles-pp.shmem_fault
1.09 +0.0 1.12 perf-profile.children.cycles-pp.do_user_addr_fault
0.84 +0.0 0.88 perf-profile.children.cycles-pp.__do_fault
1.09 +0.0 1.14 perf-profile.children.cycles-pp.exc_page_fault
0.14 ± 3% +0.0 0.18 perf-profile.children.cycles-pp.perf_event_mmap
0.13 ± 3% +0.0 0.17 perf-profile.children.cycles-pp.perf_event_mmap_event
1.02 +0.0 1.06 perf-profile.children.cycles-pp.shmem_get_folio_gfp
0.00 +0.1 0.05 perf-profile.children.cycles-pp.fault_dirty_shared_page
0.00 +0.1 0.05 perf-profile.children.cycles-pp.perf_event_mmap_output
1.40 +0.1 1.46 perf-profile.children.cycles-pp.asm_exc_page_fault
1.53 +0.1 1.60 perf-profile.children.cycles-pp.stress_memfd_child
0.81 ± 2% +0.1 0.91 ± 2% perf-profile.children.cycles-pp.mmap_region
0.80 ± 2% +0.1 0.90 ± 2% perf-profile.children.cycles-pp.__mmap_region
0.99 ± 2% +0.1 1.11 perf-profile.children.cycles-pp.do_mmap
1.00 ± 2% +0.1 1.13 perf-profile.children.cycles-pp.vm_mmap_pgoff
1.02 ± 2% +0.1 1.15 perf-profile.children.cycles-pp.ksys_mmap_pgoff
1.22 +0.1 1.36 perf-profile.children.cycles-pp.__mmap
92.16 -0.4 91.80 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.05 +0.0 0.06 perf-profile.self.cycles-pp.mas_rev_awalk
0.12 +0.0 0.13 perf-profile.self.cycles-pp.native_flush_tlb_one_user
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki