On 2026-01-09 14:19, Steven Rostedt wrote:
On Fri, 9 Jan 2026 11:10:16 -0800
Alexei Starovoitov <[email protected]> wrote:
We also have to consider that migrate disable is *not* cheap at all
compared to preempt disable.
Looks like your complaint comes from lack of engagement in kernel
development.
No need to make comments like that. The Linux kernel is an ocean of
code. It's very hard to keep up with everything that is happening. I
knew of the work being done on migrate_disable, but I didn't know what
the impact of that work was. Mathieu is still very much involved and
engaged in kernel development.
Thanks, Steven. I guess Alexei missed my recent involvement in other
areas of the kernel.

As Steven pointed out, the kernel is vast, so I cannot keep up with
the progress on every single topic. That being said, I very recently
(about a month ago) tried using migrate disable for the RSS tracking
improvements (hierarchical percpu counters), and found that the
overhead of migrate disable was large compared to preempt disable.
The generated assembly is also several times larger (on x86-64).
To compare the generated code, I created small placeholder functions
which just call preempt/migrate disable and enable, in a PREEMPT_RT
build.
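The placeholders were along the lines of the following sketch (shown
for reference only; the function names are assumed to match the
symbols in the disassembly, which is the actual output I obtained):

#include <linux/preempt.h>

/* Each placeholder wraps exactly one primitive so that its cost shows
 * up as a standalone symbol in the object file. */
void test_preempt_disable(void)
{
        preempt_disable();
}

void test_preempt_enable(void)
{
        preempt_enable();
}

void test_migrate_disable(void)
{
        migrate_disable();
}

void test_migrate_enable(void)
{
        migrate_enable();
}

The resulting disassembly: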
0000000000002a20 <test_preempt_disable>:
    2a20:  f3 0f 1e fa             endbr64
    2a24:  65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)    # 2a2b <test_preempt_disable+0xb>
    2a2b:  e9 00 00 00 00          jmp    2a30 <test_preempt_disable+0x10>

0000000000002a40 <test_preempt_enable>:
    2a40:  f3 0f 1e fa             endbr64
    2a44:  65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)    # 2a4b <test_preempt_enable+0xb>
    2a4b:  74 05                   je     2a52 <test_preempt_enable+0x12>
    2a4d:  e9 00 00 00 00          jmp    2a52 <test_preempt_enable+0x12>
    2a52:  e8 00 00 00 00          call   2a57 <test_preempt_enable+0x17>
    2a57:  e9 00 00 00 00          jmp    2a5c <test_preempt_enable+0x1c>

0000000000002920 <test_migrate_disable>:
    2920:  f3 0f 1e fa             endbr64
    2924:  65 48 8b 15 00 00 00    mov    %gs:0x0(%rip),%rdx    # 292c <test_migrate_disable+0xc>
    292b:  00
    292c:  0f b7 82 38 07 00 00    movzwl 0x738(%rdx),%eax
    2933:  66 85 c0                test   %ax,%ax
    2936:  74 0f                   je     2947 <test_migrate_disable+0x27>
    2938:  83 c0 01                add    $0x1,%eax
    293b:  66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
    2942:  e9 00 00 00 00          jmp    2947 <test_migrate_disable+0x27>
    2947:  65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)    # 294e <test_migrate_disable+0x2e>
    294e:  65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax    # 2956 <test_migrate_disable+0x36>
    2955:  00
    2956:  83 80 00 00 00 00 01    addl   $0x1,0x0(%rax)
    295d:  b8 01 00 00 00          mov    $0x1,%eax
    2962:  66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
    2969:  65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)    # 2970 <test_migrate_disable+0x50>
    2970:  74 05                   je     2977 <test_migrate_disable+0x57>
    2972:  e9 00 00 00 00          jmp    2977 <test_migrate_disable+0x57>
    2977:  e8 00 00 00 00          call   297c <test_migrate_disable+0x5c>
    297c:  e9 00 00 00 00          jmp    2981 <test_migrate_disable+0x61>

00000000000029a0 <test_migrate_enable>:
    29a0:  f3 0f 1e fa             endbr64
    29a4:  65 48 8b 15 00 00 00    mov    %gs:0x0(%rip),%rdx    # 29ac <test_migrate_enable+0xc>
    29ab:  00
    29ac:  0f b7 82 38 07 00 00    movzwl 0x738(%rdx),%eax
    29b3:  66 85 c0                test   %ax,%ax
    29b6:  74 0f                   je     29c7 <test_migrate_enable+0x27>
    29b8:  83 c0 01                add    $0x1,%eax
    29bb:  66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
    29c2:  e9 00 00 00 00          jmp    29c7 <test_migrate_enable+0x27>
    29c7:  65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)    # 29ce <test_migrate_enable+0x2e>
    29ce:  65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax    # 29d6 <test_migrate_enable+0x36>
    29d5:  00
    29d6:  83 80 00 00 00 00 01    addl   $0x1,0x0(%rax)
    29dd:  b8 01 00 00 00          mov    $0x1,%eax
    29e2:  66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
    29e9:  65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)    # 29f0 <test_migrate_enable+0x50>
    29f0:  74 05                   je     29f7 <test_migrate_enable+0x57>
    29f2:  e9 00 00 00 00          jmp    29f7 <test_migrate_enable+0x57>
    29f7:  e8 00 00 00 00          call   29fc <test_migrate_enable+0x5c>
    29fc:  e9 00 00 00 00          jmp    2a01 <test_migrate_enable+0x61>
migrate_disable _was_ not cheap.
Try to benchmark it now.
It's inlined. It's a fraction of extra overhead on top of preempt_disable.
It would be good to have a benchmark of the two. What about fast_srcu? Is
that fast enough to replace the preempt_disable()? If so, then could we
just make this the same for both RT and !RT?
I've modified kernel/rcu/refscale.c to compare those:
AMD EPYC 9654 96-Core Processor, kernel baseline: v6.18.1

CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set

* preempt disable/enable pair: 1.1 ns
* srcu-fast lock/unlock: 1.5 ns

CONFIG_RCU_REF_SCALE_TEST=y
* migrate disable/enable pair: 3.0 ns
* calls to migrate disable/enable pair within noinline functions: 17.0 ns

CONFIG_RCU_REF_SCALE_TEST=m
* migrate disable/enable pair: 22.0 ns
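For reference, the refscale modification was roughly of the following
shape (a sketch modeled on the existing reader ops in
kernel/rcu/refscale.c; the names below are illustrative, not the exact
patch that produced the numbers above):

/* Reader op measuring a migrate disable/enable pair, modeled on the
 * existing ops in kernel/rcu/refscale.c (e.g. the rcu_ops entry). */
static void ref_migrate_section(const int nloops)
{
        int i;

        for (i = nloops; i >= 0; i--) {
                migrate_disable();
                migrate_enable();
        }
}

static void ref_migrate_delay_section(const int nloops, const int udl,
                                      const int ndl)
{
        int i;

        for (i = nloops; i >= 0; i--) {
                migrate_disable();
                un_delay(udl, ndl);     /* delay helper used by the other ops */
                migrate_enable();
        }
}

static const struct ref_scale_ops migrate_ops = {
        .readsection    = ref_migrate_section,
        .delaysection   = ref_migrate_delay_section,
        .name           = "migrate"
};

plus an entry for &migrate_ops in refscale's array of available ops so
it can be selected through the module's scale_type parameter.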
When I made that earlier attempt at using migrate disable, I had
configured refscale as a module, which is what gave me the appalling
22 ns overhead. It looks like the implementation of migrate
disable/enable now differs depending on whether it is called from the
core kernel or from a module. That's rather unexpected.

This appears to be intentional, though (see
INSTANTIATE_EXPORTED_MIGRATE_DISABLE): it works around the fact that
the runqueues per-CPU variable cannot be exported to modules.
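I have not dug into the details, but it looks like the usual pattern
of keeping an inline fast path for code that can see an unexported
symbol while giving modules an exported out-of-line wrapper. Purely as
a generic illustration of that pattern (hypothetical names, not the
actual scheduler code):

/* foo.h -- hypothetical example of the inline-vs-exported split. */
#include <linux/percpu.h>

DECLARE_PER_CPU(unsigned long, foo_count);      /* not exported to modules */

void foo_inc_exported(void);                    /* exported out-of-line wrapper */

static __always_inline void foo_inc(void)
{
#ifdef MODULE
        foo_inc_exported();             /* modules cannot touch foo_count */
#else
        this_cpu_inc(foo_count);        /* core kernel: inline fast path */
#endif
}

/* foo.c -- core kernel */
DEFINE_PER_CPU(unsigned long, foo_count);

void foo_inc_exported(void)
{
        this_cpu_inc(foo_count);
}
EXPORT_SYMBOL_GPL(foo_inc_exported);

The out-of-line call is what a module pays on top of the inline
version.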
That's the kind of compilation-context-dependent overhead variability
I'd rather avoid in the implementation of the tracepoint
instrumentation API.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com