On 20/08/2025 18:18, Edgecombe, Rick P wrote:
> On Wed, 2025-08-20 at 18:01 +0200, Kevin Brodsky wrote:
>> Apologies, Thunderbird helpfully decided to wrap around that table...
>> Here's the unmangled table:
>>
>> +-------------------+----------------------------------+------------------+---------------+
>> | Benchmark         | Result Class                     | Without batching | With batching |
>> +===================+==================================+==================+===============+
>> | mmtests/kernbench | real time                        |            0.32% |         0.35% |
>> |                   | system time                      |        (R) 4.18% |     (R) 3.18% |
>> |                   | user time                        |            0.08% |         0.20% |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/fork      | fork: h:0                        |      (R) 221.39% |     (R) 3.35% |
>> |                   | fork: h:1                        |      (R) 282.89% |     (R) 6.99% |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/munmap    | munmap: h:0                      |       (R) 17.37% |        -0.28% |
>> |                   | munmap: h:1                      |      (R) 172.61% |     (R) 8.08% |
>> +-------------------+----------------------------------+------------------+---------------+
>> | micromm/vmalloc   | fix_size_alloc_test: p:1, h:0    |       (R) 15.54% |    (R) 12.57% |
>
> Both this and the previous one have the 95% confidence interval. So it
> saw a 16% speed up with direct map modification. Possible?

Positive numbers mean performance degradation ("(R)" actually stands for
regression), so in this case the protection is adding a 16%/13% overhead.
Here this is mainly due to the added pkey register switching (+ barrier)
happening on every call to vmalloc() and vfree(), which has a large
relative impact since only one page is being allocated/freed.
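To make that cost model concrete, here is a rough sketch of where the
cycles go, assuming the arm64 POE backend. switch_pkey_reg() and
POR_PGTABLES_WRITABLE are made-up names for illustration, not the actual
identifiers from the series; the point is the barrier that has to follow
every POR_EL1 update:

/*
 * Illustrative sketch only, not the actual implementation. Assumes
 * kernel context on arm64 with POE (<asm/sysreg.h>, <asm/barrier.h>).
 */
static inline u64 switch_pkey_reg(u64 por)
{
	u64 old = read_sysreg_s(SYS_POR_EL1);

	write_sysreg_s(por, SYS_POR_EL1);
	isb();		/* serialising barrier: the expensive part */
	return old;
}

/*
 * Without batching: permissions are raised and dropped around every
 * single page-table update, so the barrier cost scales with the number
 * of pages touched -- hence the ~220% fork() regression above.
 */
static void update_pte_unbatched(pte_t *ptep, pte_t pte)
{
	u64 old = switch_pkey_reg(POR_PGTABLES_WRITABLE);

	*ptep = pte;		/* stands in for the real PTE write */
	switch_pkey_reg(old);
}

/*
 * With batching: one switch pair brackets the whole operation, so the
 * two barriers are amortised over npages updates. That is why the
 * overhead mostly vanishes at larger p: values, but cannot help the
 * p:1 cases -- vmalloc()/vfree() still pay a full switch pair per call.
 */
static void update_ptes_batched(pte_t *ptep, const pte_t *ptes, int npages)
{
	u64 old = switch_pkey_reg(POR_PGTABLES_WRITABLE);

	for (int i = 0; i < npages; i++)
		ptep[i] = ptes[i];	/* stands in for the real PTE writes */
	switch_pkey_reg(old);
}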
>> |                   | fix_size_alloc_test: p:4, h:0    |       (R) 39.18% |     (R) 9.13% |
>> |                   | fix_size_alloc_test: p:16, h:0   |       (R) 65.81% |         2.97% |
>> |                   | fix_size_alloc_test: p:64, h:0   |       (R) 83.39% |        -0.49% |
>> |                   | fix_size_alloc_test: p:256, h:0  |       (R) 87.85% |    (I) -2.04% |
>> |                   | fix_size_alloc_test: p:16, h:1   |       (R) 51.21% |         3.77% |
>> |                   | fix_size_alloc_test: p:64, h:1   |       (R) 60.02% |         0.99% |
>> |                   | fix_size_alloc_test: p:256, h:1  |       (R) 63.82% |         1.16% |
>> |                   | random_size_alloc_test: p:1, h:0 |       (R) 77.79% |        -0.51% |
>> |                   | vm_map_ram_test: p:1, h:0        |       (R) 30.67% |    (R) 27.09% |
>> +-------------------+----------------------------------+------------------+---------------+
>
> Hmm, still surprisingly low to me, but ok. It would be good to have x86
> and arm work the same, but I don't think we have line of sight to x86
> currently. And I actually never did real benchmarks.

It would certainly be good to get numbers on x86 as well - I'm hoping
that someone with a better understanding of x86 than mine could implement
kpkeys on x86 at some point, so that we can run the same benchmarks
there.

- Kevin