On 2025-11-13 10:24:45 [-0500], Steven Rostedt wrote:
> On Thu, 13 Nov 2025 16:17:29 +0100
> Sebastian Andrzej Siewior <[email protected]> wrote:
>
> > On 2025-11-13 10:05:24 [-0500], Steven Rostedt wrote:
> > > This means that the chunks are not being freed and we can't be doing
> > > synchronize_rcu() in every exit.
> >
> > You don't have to, you can do call_rcu().
>
> But the chunk isn't being freed. They may be used right away.

Not if you avoid using it until after the rcu callback.

> > > > So I *think* the RCU approach should be doable and cover this.
> > >
> > > Where would you put the synchronize_rcu()? In do_exit()?
> >
> > simply call_rcu() and let it move to the freelist.
>
> A couple of issues. One, the chunks are fully used. There's no place to put
> a "rcu_head" in them.

Well, we may be able to make use of them. This could be the first (16?)
bytes of the memory chunk.

> Second, if there's a lot of tasks exiting and forking, we can easily run
> out of chunks that are waiting to be "freed" via call_rcu().

But this is a general RCU problem and not new here. The task_struct and
everything around it (including the stack) is RCU freed.

> >
> > > Also understanding what this is used for helps in understanding the scope
> > > of protection needed.
> > >
> > > The pid_list is created when you add anything into one of the pid files in
> > > tracefs. Let's use /sys/kernel/tracing/set_ftrace_pid:
> > >
> > > # cd /sys/kernel/tracing
> > > # echo $$ > set_ftrace_pid
> > > # echo 1 > options/function-fork
> > > # cat set_ftrace_pid
> > > 2716
> > > 2936
> > > # cat set_ftrace_pid
> > > 2716
> > > 2945
> > >
> > > What the above did was to create a pid_list for the function tracer. I
> > > added the bash process pid using $$ (2716). Then when I cat the file, it
> > > showed the pid for the bash process as well as the pid for the cat
> > > process, as the cat process is a child of the bash process. The
> > > function-fork option means to add any child process to the
> > > set_ftrace_pid if the parent is already in the list. It also means to
> > > remove the pid if a process in the list exits.
> >
> > This adding/ add-on-fork, removing and remove-on-exit is the only write
> > side?
>
> That and manual writes to the set_ftrace_pid file.

This looks minimal. I misunderstood earlier and thought that a context
switch could also contribute to it.

> > > What we are protecting against is when one chunk is freed, but then
> > > allocated again for a different set of PIDs. Where the reader has the
> > > chunk, it was freed and re-allocated and the bit that is about to be
> > > checked doesn't represent the bit it is checking for.
> >
> > This I assumed.
> > And the kfree() at the end can not happen while there is still a reader?
>
> Correct. That's done by the pid_list user:
>
> In clear_ftrace_pids():
>
> 	/* Wait till all users are no longer using pid filtering */
> 	synchronize_rcu();
>
> 	if ((type & TRACE_PIDS) && pid_list)
> 		trace_pid_list_free(pid_list);
>
> 	if ((type & TRACE_NO_PIDS) && no_pid_list)
> 		trace_pid_list_free(no_pid_list);

And the callers of trace_pid_list_is_set() are always in the RCU read
section then? I assume so, since it wouldn't make sense otherwise.

> -- Steve

Sebastian
