On Thu, Feb 05, 2026 at 07:55:19AM -0800, Alexei Starovoitov wrote:
> On Thu, Feb 5, 2026 at 12:55 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Wed, Feb 04, 2026 at 08:06:50AM -0800, Alexei Starovoitov wrote:
> > > On Wed, Feb 4, 2026 at 4:36 AM Jiri Olsa <[email protected]> wrote:
> > > >
> > > > On Tue, Feb 03, 2026 at 03:17:05PM -0800, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 3, 2026 at 1:38 AM Jiri Olsa <[email protected]> wrote:
> > > > > >
> > > > > > hi,
> > > > > > as an option to Menglong's change [1] I'm sending a proposal for a
> > > > > > tracing_multi link that does not add a static trampoline but
> > > > > > attaches the program to all the needed trampolines.
> > > > > >
> > > > > > This approach keeps the same performance but has some drawbacks:
> > > > > >
> > > > > > - when attaching 20k functions we allocate and attach 20k trampolines
> > > > > > - during attachment we hold each trampoline mutex, so for the above
> > > > > >   20k functions we will hold 20k mutexes during the attachment, which
> > > > > >   should be very prone to deadlock, but I haven't hit it yet
> > > > >
> > > > > If you check that it's sorted and always take them in the same order
> > > > > then there will be no deadlock.
> > > > > Or just grab one global mutex first and then grab the trampoline
> > > > > mutexes next in any order. The global one will serialize this attach
> > > > > operation.
> > > > >
> > > > > > It looks like the trampoline allocation/generation might not be a big
> > > > > > problem and I'll try to find a solution for holding that many mutexes.
> > > > > > If there's no better solution I think having one read/write mutex for
> > > > > > tracing_multi link attach/detach should work.
> > > > >
> > > > > If you mean to have one global mutex as I proposed above then I don't
> > > > > see a downside. It only serializes multiple libbpf calls.
> > > >
> > > > we also need to serialize it with standard single trampoline attach,
> > > > because the direct ftrace update is now done under trampoline->mutex:
> > > >
> > > > bpf_trampoline_link_prog(tr)
> > > > {
> > > >         mutex_lock(&tr->mutex);
> > > >         ...
> > > >         update_ftrace_direct_*
> > > >         ...
> > > >         mutex_unlock(&tr->mutex);
> > > > }
> > > >
> > > > for tracing_multi we would link the program first (with tr->mutex)
> > > > and do the bulk ftrace update later (without tr->mutex)
> > > >
> > > > {
> > > >         for each involved trampoline:
> > > >                 bpf_trampoline_link_prog
> > > >
> > > >                 --> and here we could race with some other thread
> > > >                     doing single trampoline attach
> > > >
> > > >         update_ftrace_direct_*
> > > > }
> > > >
> > > > note the current version locks all tr->mutex instances all the way
> > > > through the update_ftrace_direct_* update
> > > >
> > > > I think we could use a global rwsem and take the read lock on the single
> > > > trampoline attach path and the write lock on the tracing_multi attach,
> > > >
> > > > I thought we could take direct_mutex early, but that would mean a
> > > > different locking order with the trampoline mutex than we already have
> > > > in the single attach path
> > >
> > > I feel we're talking past each other.
> > > I meant:
> > >
> > > For multi:
> > > 1. take some global mutex
> > > 2. take N tramp mutexes in any order
> > >
> > > For single:
> > > 1. take that 1 specific tramp mutex.
> >
> > ah ok, I understand, it's to prevent the deadlock but keep holding all
> > the trampoline locks.. the rwsem I mentioned was for the 'fix', where
> > we do not take all the trampoline locks
>
> I don't understand how rwsem would help.
> All the operations on trampoline are protected by mutex.
> Switching to rw makes sense only if we can designate certain
> operations as "read" and others as "write" and number of "reads"
> dominate. This won't be the case with multi-fentry.
> And we still need to take all of them as "write" to update trampoline.
this applies to the scenario where we do not hold all the trampoline locks,
in that case we could have a race between single and multi attachment,
while a single/single race stays safe (each single attach takes just its
own tr->mutex and does the ftrace update under it)

as a fix the single attach would take the read lock and the multi attach
would take the write lock, so a single/single race is allowed and
single/multi is not ... shown in the patch below

but it might be too much.. in the sense that there are already many locks
involved in trampoline attach/detach, and a simple global lock in the multi
path or just sorting the ids would be enough
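fwiw the 'sorting the ids' variant could look something like below, untested
sketch just to illustrate, tramp_cmp() and the trs[] array with the resolved
trampolines are made up names.. it keeps holding all the tr->mutex locks
across the bulk ftrace update like the current version does (so it would also
keep the lockdep no-track workaround that the patch below removes), it just
takes the mutexes in a fixed order so two multi attaches can't deadlock
against each other:

static int tramp_cmp(const void *a, const void *b)
{
        const struct bpf_trampoline *ta = *(const struct bpf_trampoline **)a;
        const struct bpf_trampoline *tb = *(const struct bpf_trampoline **)b;

        if (ta->ip != tb->ip)
                return ta->ip < tb->ip ? -1 : 1;
        return 0;
}

        /* in bpf_trampoline_multi_attach(), before taking any tr->mutex */
        sort(trs, cnt, sizeof(trs[0]), tramp_cmp, NULL);

        for (i = 0; i < cnt; i++)
                mutex_lock(&trs[i]->mutex);

        /* link the programs and do the bulk update_ftrace_direct_* here,
         * with all the mutexes still held
         */

        for (i = cnt - 1; i >= 0; i--)
                mutex_unlock(&trs[i]->mutex);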
jirka
---
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index b76bb545077b..edbc8f133dda 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -30,6 +30,8 @@ static struct hlist_head trampoline_ip_table[TRAMPOLINE_TABLE_SIZE];
 /* serializes access to trampoline tables */
 static DEFINE_MUTEX(trampoline_mutex);
 
+static DECLARE_RWSEM(multi_sem);
+
 struct bpf_trampoline_ops {
         int (*register_fentry)(struct bpf_trampoline *tr, void *new_addr, void *data);
         int (*unregister_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr, void *data);
@@ -367,11 +369,7 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key, unsigned long ip)
         head = &trampoline_ip_table[hash_64(tr->ip, TRAMPOLINE_HASH_BITS)];
         hlist_add_head(&tr->hlist_ip, head);
         refcount_set(&tr->refcnt, 1);
-#ifdef CONFIG_LOCKDEP
-        mutex_init_with_key(&tr->mutex, &__lockdep_no_track__);
-#else
         mutex_init(&tr->mutex);
-#endif
         for (i = 0; i < BPF_TRAMP_MAX; i++)
                 INIT_HLIST_HEAD(&tr->progs_hlist[i]);
 out:
@@ -871,6 +869,8 @@ int bpf_trampoline_link_prog(struct bpf_tramp_node *node,
 {
         int err;
 
+        guard(rwsem_read)(&multi_sem);
+
         mutex_lock(&tr->mutex);
         err = __bpf_trampoline_link_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
         mutex_unlock(&tr->mutex);
@@ -916,6 +916,8 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
 {
         int err;
 
+        guard(rwsem_read)(&multi_sem);
+
         mutex_lock(&tr->mutex);
         err = __bpf_trampoline_unlink_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
         mutex_unlock(&tr->mutex);
@@ -1463,6 +1465,8 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
         struct bpf_trampoline *tr;
         u64 key;
 
+        guard(rwsem_write)(&multi_sem);
+
         data.reg = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
         if (!data.reg)
                 return -ENOMEM;
@@ -1494,12 +1498,10 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
                 tr = mnode->trampoline;
                 mutex_lock(&tr->mutex);
-
                 err = __bpf_trampoline_link_prog(&mnode->node, tr, NULL,
                                                  &trampoline_multi_ops, &data);
-                if (err) {
-                        mutex_unlock(&tr->mutex);
+                mutex_unlock(&tr->mutex);
+                if (err)
                         goto rollback_unlink;
-                }
         }
 
         if (ftrace_hash_count(data.reg)) {
@@ -1516,11 +1518,6 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
                 }
         }
 
-        for (i = 0; i < cnt; i++) {
-                tr = link->nodes[i].trampoline;
-                mutex_unlock(&tr->mutex);
-        }
-
         free_fentry_multi_data(&data);
         return 0;
@@ -1528,6 +1525,7 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
         for (j = 0; j < i; j++) {
                 mnode = &link->nodes[j];
                 tr = mnode->trampoline;
+                mutex_lock(&tr->mutex);
                 WARN_ON_ONCE(__bpf_trampoline_unlink_prog(&mnode->node, tr, NULL,
                                                           &trampoline_multi_ops, &data));
                 mutex_unlock(&tr->mutex);
@@ -1550,6 +1548,8 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
         int i, cnt = link->nodes_cnt;
         struct bpf_trampoline *tr;
 
+        guard(rwsem_write)(&multi_sem);
+
         data.unreg = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
         if (!data.unreg)
                 return -ENOMEM;
@@ -1567,6 +1567,7 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
                 mutex_lock(&tr->mutex);
                 WARN_ON_ONCE(__bpf_trampoline_unlink_prog(&mnode->node, tr, NULL,
                                                           &trampoline_multi_ops, &data));
+                mutex_unlock(&tr->mutex);
         }
 
         if (ftrace_hash_count(data.unreg))
@@ -1576,7 +1577,6 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
         for (i = 0; i < cnt; i++) {
                 tr = link->nodes[i].trampoline;
-                mutex_unlock(&tr->mutex);
                 bpf_trampoline_put(tr);
         }