On Thu, 17 Jul 2025 12:37:37 +0000 Fusheng Huang(黄富生) <fusheng.hu...@luxshare-ict.com> wrote:
> Dear rostedt, mhiramat: > > We encountered a small problem of ftrace module, and need your help . > > 【issue background】During the process of starting and loading ko in > the Linux kernel, we encountered a crash of the CPU 7 null pointer > caused by the synchronization operation of ftrace_events list, with a > probability of less than 1/1000 > > 【call stack】 > > CPU7 > > load_module() > prepare_coming_module(inline) > blocking_notifier_call_chain_robust() > notifier_call_chain_robust(inline) > notifier_call_chain(inline) > trace_module_notify() > trace_module_add_evals(inline) > trace_insert_eval_map(inline) > trace_event_eval_update() > | > |... > |---down_write(&trace_event_sem); > list_for_each_entry_safe(call, p, &ftrace_events, > list) { > /* events are usually grouped together with > systems */ > if (!last_system || call->class->system != > last_system) { > first = true; > last_i = 0; > last_system = call->class->system; > } > ...... > > > CPU4 > > load_module() Hmm, for some reason I thought module loading was synchronized and couldn't run in parallel. It may have been that way long ago, but apparently it isn't now. > prepare_coming_module(inline) > blocking_notifier_call_chain_robust() > notifier_call_chain_robust() > trace_module_notify() > trace_module_add_events(inline) > __register_event(inline) > |... > |--list_add(&call->list, &ftrace_events); > |... > > > > > Function1: trace_event_eval_update --->traverse the ftrace_events list > > Function2: __register_event --->list_add ftrace_events ,but no write > lock protect. > > Function1 and Function2 are running at the same time. > > > We want to know Is this a small bug or are there other considerations? This looks like a small bug. > > Should we could add write lock protection to optimize this issue ? > > __register_event(inline) > |... > |+++down_write(&trace_event_sem); > | > |---list_add(&call->list, &ftrace_events); > | > |+++down_up(&trace_event_sem); > |... > Can you see if this patch works for you? diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 120531268abf..f00d3901cd3b 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -3675,10 +3675,12 @@ static void trace_module_add_events(struct module *mod) start = mod->trace_events; end = mod->trace_events + mod->num_trace_events; + down_write(&trace_event_sem); for_each_event(call, start, end) { __register_event(*call, mod); __add_event_to_tracers(*call); } + up_write(&trace_event_sem); update_cache_events(mod); } The remove takes that semaphore, I don't know why the addition doesn't. Looks to be a long standing bug. Goes back to 110bf2b764eb6 that was added in 2009. Probably because I assumed load_module() was synchronized :-p Thanks for the report! -- Steve