On Thu, 17 Jul 2025 12:37:37 +0000
Fusheng Huang(黄富生) <fusheng.hu...@luxshare-ict.com> wrote:

> Dear rostedt, mhiramat:
> 
> We encountered a small problem of ftrace module, and need your help .
> 
> 【issue background】During the process of starting and loading ko in
> the Linux kernel, we encountered a crash of the CPU 7 null pointer
> caused by the synchronization operation of ftrace_events list, with a
> probability of less than 1/1000
> 
> 【call stack】
> 
>        CPU7
> 
>        load_module()
>        prepare_coming_module(inline)
>        blocking_notifier_call_chain_robust()
>        notifier_call_chain_robust(inline)
>        notifier_call_chain(inline)
>        trace_module_notify()
>        trace_module_add_evals(inline)
>        trace_insert_eval_map(inline)
>        trace_event_eval_update()
>                      |
>                      |...
>                      |---down_write(&trace_event_sem);
>                             list_for_each_entry_safe(call, p, &ftrace_events, 
> list) {
>                             /* events are usually grouped together with 
> systems */
>                             if (!last_system || call->class->system != 
> last_system) {
>                                    first = true;
>                                    last_i = 0;
>                                    last_system = call->class->system;
>                             }
>                             ......
> 
> 
>             CPU4
> 
>         load_module()

Hmm, for some reason I thought module loading was synchronized and
couldn't run in parallel. It may have been that way long ago, but
apparently it isn't now.

>          prepare_coming_module(inline)
>          blocking_notifier_call_chain_robust()
>          notifier_call_chain_robust()
>          trace_module_notify()
>          trace_module_add_events(inline)
>          __register_event(inline)
>                      |...
>                      |--list_add(&call->list, &ftrace_events);
>                      |...
> 
> 
> 
> 
>        Function1: trace_event_eval_update --->traverse the ftrace_events list
> 
>        Function2: __register_event --->list_add ftrace_events ,but no write 
> lock protect.
> 
>      Function1 and Function2 are running at the same time.
> 
> 
>       We want to know Is this a small bug or are there other considerations?

This looks like a small bug.

> 
>       Should we could add write lock protection to optimize this issue ?
> 
>       __register_event(inline)
>                 |...
>                 |+++down_write(&trace_event_sem);
>                 |
>                 |---list_add(&call->list, &ftrace_events);
>                 |
>                 |+++down_up(&trace_event_sem);
>               |...
> 

Can you see if this patch works for you?

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 120531268abf..f00d3901cd3b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3675,10 +3675,12 @@ static void trace_module_add_events(struct module *mod)
        start = mod->trace_events;
        end = mod->trace_events + mod->num_trace_events;
 
+       down_write(&trace_event_sem);
        for_each_event(call, start, end) {
                __register_event(*call, mod);
                __add_event_to_tracers(*call);
        }
+       up_write(&trace_event_sem);
 
        update_cache_events(mod);
 }

The remove takes that semaphore, I don't know why the addition doesn't.
Looks to be a long standing bug. Goes back to 110bf2b764eb6 that was
added in 2009. Probably because I assumed load_module() was
synchronized :-p

Thanks for the report!

-- Steve

Reply via email to