On Thu, 2025-05-01 at 15:45 -0400, Steven Rostedt wrote:
> On Tue, 22 Apr 2025 20:33:13 +0200
> Paul Cacheux via B4 Relay <[email protected]> wrote:
> 
> > From: Paul Cacheux <[email protected]>
> 
> Sorry for the late reply, I just noticed this patch.

No problem at all, thanks for looking at my patch.

> 
> > 
> > When a trace probe is created, a global variable is modified, and this
> > data is used when an error is raised and the error message generated.
> > 
> > This global variable is modified without any lock, so multiple trace
> > operations can race, which may cause problems when the error message
> > is generated.
> > 
> > This commit moves away from the global variable and passes the
> > error context as a regular function argument.
> > 
> > Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events")
> > Signed-off-by: Paul Cacheux <[email protected]>
> > ---
> > As reported in [1], a race exists in the shared trace probe log used
> > to build error messages. This can crash the kernel while the actual
> > error message is being built, but the race also happens during
> > non-error tracefs use; it is just not visible there.
> > 
> > The first reported reproducer, which still crashes:
> > 
> >   # 'p4' is an invalid command, which makes the kernel run into
> >   # trace_probe_log_err()
> >   cd /sys/kernel/debug/tracing
> >   while true; do
> >     echo 'p4:myprobe1 do_sys_openat2 dfd=%ax filename=%dx flags=%cx mode=+4($stack)' >> kprobe_events &
> >     echo 'p4:myprobe2 do_sys_openat2' >> kprobe_events &
> >     echo 'p4:myprobe3 do_sys_openat2 dfd=%ax filename=%dx' >> kprobe_events &
> >   done;
> > 
> > The original email suggested using a mutex or allocating the
> > trace_probe_log on the stack. A mutex can cause performance issues
> > and requires high confidence in the correctness of the current
> > trace_probe_log_clear() calls. This patch implements the stack
> > solution instead and passes a pointer to the functions that use it.
> > 
> > [1] https://lore.kernel.org/all/[email protected]/T/
> 
> Honestly, I don't like either approach.
> 
> What could be done is wrap the internals of the function in a mutex so they
> are not re-entrant (using guard(mutex)). If two error codes are happening
> together, just let it get corrupted. There should never be two additions at
> the same time, and if the admin is doing that then they deserve what they
> get.

Just to double-check: are you suggesting adding a mutex to the shared
trace_probe_log and taking it in all the accessor functions
(trace_probe_log_{init,set_index,clear,err})?
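For reference, this is roughly the shape I have in mind, sketched as
userspace C so it can be compiled and tested on its own. It is only an
illustration: struct probe_log and the probe_log_*() functions are
hypothetical stand-ins for trace_probe_log and its accessors, and a
pthread_mutex_t stands in for a kernel mutex taken with guard(mutex). The
lock is kept beside the data rather than inside it so that
probe_log_clear() can still memset() the whole log. It also assumes callers
free their argv only after calling probe_log_clear().

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the shared trace_probe_log. */
struct probe_log {
	const char *subsystem;
	const char **argv;
	int argc;
	int index;
};

/* One lock serializes every accessor (the kernel would use guard(mutex)). */
static pthread_mutex_t probe_log_lock = PTHREAD_MUTEX_INITIALIZER;
static struct probe_log plog;

void probe_log_init(const char *subsystem, int argc, const char **argv)
{
	pthread_mutex_lock(&probe_log_lock);
	plog.subsystem = subsystem;
	plog.argc = argc;
	plog.argv = argv;
	plog.index = 0;
	pthread_mutex_unlock(&probe_log_lock);
}

void probe_log_set_index(int index)
{
	pthread_mutex_lock(&probe_log_lock);
	plog.index = index;
	pthread_mutex_unlock(&probe_log_lock);
}

void probe_log_clear(void)
{
	pthread_mutex_lock(&probe_log_lock);
	memset(&plog, 0, sizeof(plog));
	pthread_mutex_unlock(&probe_log_lock);
}

/*
 * Format an error for the current argument into buf. All fields are read
 * under the lock, so they are consistent with each other, although a
 * concurrent writer may have replaced them with another command's
 * arguments. Returns -1 if there is nothing valid to report.
 */
int probe_log_err(char *buf, size_t len, const char *msg)
{
	int ret = -1;

	pthread_mutex_lock(&probe_log_lock);
	if (plog.subsystem && plog.argv &&
	    plog.index >= 0 && plog.index < plog.argc)
		ret = snprintf(buf, len, "%s: %s: '%s'",
			       plog.subsystem, msg, plog.argv[plog.index]);
	pthread_mutex_unlock(&probe_log_lock);
	return ret;
}
```

With the lock held for the whole read in probe_log_err(), a concurrent
clear or re-init can at worst make the message point at another command's
arguments; it can no longer see argv cleared while index is still set, so
the result is a possibly garbled log rather than a crash.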

> 
> I don't care if the error log gets garbage if there's multiple accesses at
> the same time. The fix should only prevent it from crashing.
> 
> -- Steve

Thanks for the feedback,
Paul Cacheux
