On Tue, 12 Mar 2024 13:42:28 +0000 Mark Rutland <mark.rutl...@arm.com> wrote:
> There are ways around that, but they're complicated and/or expensive, e.g.
>
> * Use a sequence of multiple patches, starting with replacing the JALR with an
>   exception-generating instruction with a fixup handler, which is sort-of what
>   x86 does with UD2. This may require multiple passes with
>   synchronize_rcu_tasks() to make sure all threads have seen the latest
>   instructions, and that cannot be done under stop_machine(), so if you need
>   stop_machine() for CMODx reasons, you may need to use that several times with
>   intervening calls to synchronize_rcu_tasks().

Just for clarification: x86 doesn't use UD2 for updating the call sites. It
uses the breakpoint (0xcc) operation. This is because x86 instructions are not
a fixed size and can cross cache line boundaries, making updates to text
sections dangerous, as another CPU may get half the old instruction and half
the new one!

Thus, when we have:

	0f 1f 44 00 00		nop

and want to convert it to:

	e8 e7 57 07 00		call ftrace_caller

We have to first add a breakpoint:

	cc 1f 44 00 00

Then send an IPI to all CPUs so that they see the breakpoint (an IPI is
equivalent to a memory barrier). We have an ftrace breakpoint handler that
looks at the function that the breakpoint happened on. If it was a nop, it
will just skip over the rest of the instruction, and return from the
breakpoint handler just after the "cc 1f 44 00 00". If it's supposed to be a
call, it will push the return address (the address just after the
instruction) onto the stack, and return from the breakpoint handler to
ftrace_caller. (Although things have changed a little, since this update is
now handled by text_poke_bp_batch().)

Then it changes the rest of the instruction:

	cc e7 57 07 00

It sends out another IPI to all CPUs and replaces the breakpoint with the
first byte of the new instruction:

	e8 e7 57 07 00

and now all the callers of this function will call ftrace_caller.

-- Steve
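
The byte sequence above can be sketched as a small user-space simulation in C.
This is only an illustration of the ordering described in this mail, not the
kernel's text_poke_bp_batch(): patch_insn(), sync_all_cpus(), bp_resume_addr()
and struct patch_trace are hypothetical names, the "IPI" is a counter, and
nothing here touches real executable text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSN_LEN 5      /* both the 5-byte nop and the call are 5 bytes */
#define INT3     0xcc   /* x86 breakpoint opcode */

/* Hypothetical stand-in for "send an IPI to all CPUs": only counts calls. */
static int sync_count;
static void sync_all_cpus(void) { sync_count++; }

/* Snapshots of the call site after each of the three writes. */
struct patch_trace {
	uint8_t step[3][INSN_LEN];
};

/* Rewrite site[] into new_insn[] in the order described in the mail. */
static void patch_insn(uint8_t *site, const uint8_t *new_insn,
		       struct patch_trace *trace)
{
	assert(site && new_insn);

	/* 1) Only the first byte becomes a breakpoint, then sync. */
	site[0] = INT3;
	if (trace)
		memcpy(trace->step[0], site, INSN_LEN);
	sync_all_cpus();

	/* 2) Rewrite the tail while the breakpoint guards the site, then sync. */
	memcpy(site + 1, new_insn + 1, INSN_LEN - 1);
	if (trace)
		memcpy(trace->step[1], site, INSN_LEN);
	sync_all_cpus();

	/* 3) Replace the breakpoint with the first byte of the new instruction. */
	site[0] = new_insn[0];
	if (trace)
		memcpy(trace->step[2], site, INSN_LEN);
}

/*
 * What a breakpoint handler would decide for a CPU that hit the int3
 * mid-update: act like the nop (resume after the instruction) or act like
 * the call (record the return address, resume at the call target).
 */
static uint64_t bp_resume_addr(uint64_t site_addr, int emulate_call,
			       uint64_t call_target, uint64_t *pushed_ret)
{
	uint64_t next = site_addr + INSN_LEN;

	if (!emulate_call)
		return next;
	*pushed_ret = next;	/* return address a real call would push */
	return call_target;
}
```

Calling patch_insn() on a buffer holding the nop bytes, with the call bytes as
new_insn, records the three intermediate states listed above in trace->step[].
At no point does the buffer hold a mix of old and new bytes without the 0xcc
in front of it, which is the whole point of the sequence.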