On Wed, Feb 04, 2026 at 08:06:50AM -0800, Alexei Starovoitov wrote: > On Wed, Feb 4, 2026 at 4:36 AM Jiri Olsa <[email protected]> wrote: > > > > On Tue, Feb 03, 2026 at 03:17:05PM -0800, Alexei Starovoitov wrote: > > > On Tue, Feb 3, 2026 at 1:38 AM Jiri Olsa <[email protected]> wrote: > > > > > > > > hi, > > > > as an option to Meglong's change [1] I'm sending proposal for > > > > tracing_multi > > > > link that does not add static trampoline but attaches program to all > > > > needed > > > > trampolines. > > > > > > > > This approach keeps the same performance but has some drawbacks: > > > > > > > > - when attaching 20k functions we allocate and attach 20k trampolines > > > > - during attachment we hold each trampoline mutex, so for above > > > > 20k functions we will hold 20k mutexes during the attachment, > > > > should be very prone to deadlock, but haven't hit it yet > > > > > > If you check that it's sorted and always take them in the same order > > > then there will be no deadlock. > > > Or just grab one global mutex first and then grab trampolines mutexes > > > next in any order. The global one will serialize this attach operation. > > > > > > > It looks the trampoline allocations/generation might not be big a > > > > problem > > > > and I'll try to find a solution for holding that many mutexes. If > > > > there's > > > > no better solution I think having one read/write mutex for tracing multi > > > > link attach/detach should work. > > > > > > If you mean to have one global mutex as I proposed above then I don't see > > > a downside. It only serializes multiple libbpf calls. > > > > we also need to serialize it with standard single trampoline attach, > > because the direct ftrace update is now done under trampoline->mutex: > > > > bpf_trampoline_link_prog(tr) > > { > > mutex_lock(&tr->mutex); > > ... > > update_ftrace_direct_* > > ... > > mutex_unlock(&tr->mutex); > > } > > > > for tracing_multi we would link the program first (with tr->mutex) > > and do the bulk ftrace update later (without tr->mutex) > > > > { > > for each involved trampoline: > > bpf_trampoline_link_prog > > > > --> and here we could race with some other thread doing single > > trampoline attach > > > > update_ftrace_direct_* > > } > > > > note the current version locks all tr->mutex instances all the way > > through the update_ftrace_direct_* update > > > > I think we could use global rwsem and take read lock on single > > trampoline attach path and write lock on tracing_multi attach, > > > > I thought we could take direct_mutex early, but that would mean > > different order with trampoline mutex than we already have in > > single attach path > > I feel we're talking past each other. > I meant: > > For multi: > 1. take some global mutex > 2. take N tramp mutexes in any order > > For single: > 1. take that 1 specific tramp mutex.
ah ok, I understand, it's to prevent the lockup but keep holding all the trampolines locks.. the rwsem I mentioned was for the 'fix', where we do not take all the trampolines locks thanks, jirka
