Hi Sebastian,

Thank you for your review and the thoughtful questions.
1. Performance Data

We encountered this issue in a production environment with 288 cores,
where enabling set_ftrace_pid caused system CPU usage (sys%) to
increase from 10% to over 90%. In our 92-core VM test environment:

Before patch (spinlock):
- Without filtering: cs=2395401/s, sys%=7%
- With filtering: cs=1828261/s, sys%=40%

After patch (seqlock):
- Without filtering: cs=2397032/s, sys%=6%
- With filtering: cs=2398922/s, sys%=6%

The seqlock approach eliminates the pid_list->lock contention that
previously drove sys% from 7% to 40%.

2. Reader Retry Behavior

Yes, if the write side is continuously busy, the reader might spin and
retry. However, in practice:
- Writes are infrequent (only when setting the ftrace_pid filter, or
  during task fork/exit with function-fork enabled), so reader retries
  should be rare.
- Readers, by contrast, call trace_pid_list_is_set() on every task
  switch, which can occur at a very high frequency, so removing the
  lock from the read path is where the gain comes from.
(A sketch of the expected write-side pairing follows at the end of
this mail.)

3. Result Accuracy

You're correct that the result might change immediately after the
read. trace_ignore_this_task() does not require absolute accuracy;
slight races, where a task in a borderline case might or might not be
traced, are acceptable.

Best regards,
Yongliang

On Thu, Nov 13, 2025 at 3:34 PM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2025-11-13 08:02:52 [+0800], Yongliang Gao wrote:
> > --- a/kernel/trace/pid_list.c
> > +++ b/kernel/trace/pid_list.c
> > @@ -138,14 +139,16 @@ bool trace_pid_list_is_set(struct trace_pid_list
> > *pid_list, unsigned int pid)
> >  	if (pid_split(pid, &upper1, &upper2, &lower) < 0)
> >  		return false;
> >
> > -	raw_spin_lock_irqsave(&pid_list->lock, flags);
> > -	upper_chunk = pid_list->upper[upper1];
> > -	if (upper_chunk) {
> > -		lower_chunk = upper_chunk->data[upper2];
> > -		if (lower_chunk)
> > -			ret = test_bit(lower, lower_chunk->data);
> > -	}
> > -	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
> > +	do {
> > +		seq = read_seqcount_begin(&pid_list->seqcount);
> > +		ret = false;
> > +		upper_chunk = pid_list->upper[upper1];
> > +		if (upper_chunk) {
> > +			lower_chunk = upper_chunk->data[upper2];
> > +			if (lower_chunk)
> > +				ret = test_bit(lower, lower_chunk->data);
> > +		}
> > +	} while (read_seqcount_retry(&pid_list->seqcount, seq));
>
> How is this better? Any numbers?
> If the write side is busy and the lock is handed over from one CPU to
> another then it is possible that the reader spins here and does several
> loops, right?
> And in this case, how accurate would it be? I mean the result could
> change right after the sequence here is completed because the write side
> got active again. How bad would it be if there would be no locking and
> RCU ensures that the chunks (and data) don't disappear while looking at
> it?
>
> >  	return ret;
> >  }
>
> Sebastian
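P.S. For clarity on point 2: the hunk above only shows the read side.
Below is a minimal sketch (not the actual patch) of how I expect the
write side to pair with it: the writer keeps taking pid_list->lock for
writer-vs-writer exclusion and additionally wraps the update in the
seqcount write section, so readers only loop while an update is
actually in flight. The helper name pid_list_set_bit and the
assumption that the chunks already exist are hypothetical, for
illustration only.

/*
 * Sketch only: writers still serialize on pid_list->lock; the
 * seqcount write section is what makes trace_pid_list_is_set()
 * retry, so reads stay consistent without taking the lock.
 */
static void pid_list_set_bit(struct trace_pid_list *pid_list,
			     unsigned int upper1, unsigned int upper2,
			     unsigned int lower)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&pid_list->lock, flags);
	write_seqcount_begin(&pid_list->seqcount);
	/* upper1/upper2/lower as produced by pid_split();
	 * assumes the upper and lower chunks are already allocated.
	 */
	set_bit(lower, pid_list->upper[upper1]->data[upper2]->data);
	write_seqcount_end(&pid_list->seqcount);
	raw_spin_unlock_irqrestore(&pid_list->lock, flags);
}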
