On Mon, Jul 21, 2025 at 10:23:20AM +0200, Gabriele Monaco wrote: > DA monitor can be accessed from multiple cores simultaneously, this is > likely, for instance when dealing with per-task monitors reacting on > events that do not always occur on the CPU where the task is running. > This can cause race conditions where two events change the next state > and we see inconsistent values. E.g.: > > [62] event_srs: 27: sleepable x sched_wakeup -> running (final) > [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable > [63] error_srs: 27: event sched_switch_suspend not expected in the state > running > > In this case the monitor fails because the event on CPU 62 wins against > the one on CPU 63, although the correct state should have been > sleepable, since the task get suspended. > > Detect if the current state was modified by using try_cmpxchg while > storing the next value. If it was, try again reading the current state. > After a maximum number of failed retries, react by calling a special > tracepoint, print on the console and reset the monitor. > > Remove the functions da_monitor_curr_state() and da_monitor_set_state() > as they only hide the underlying implementation in this case. > > Monitors where this type of condition can occur must be able to account > for racing events in any possible order, as we cannot know the winner. > > Cc: Ingo Molnar <[email protected]> > Cc: Peter Zijlstra <[email protected]> > Signed-off-by: Gabriele Monaco <[email protected]> > --- > > static inline bool > \ > da_event_##name(struct da_monitor *da_mon, enum events_##name event) > \ > { > \ > - type curr_state = da_monitor_curr_state_##name(da_mon); > \ > - type next_state = model_get_next_state_##name(curr_state, event); > \ > - > \ > - if (next_state != INVALID_STATE) { > \ > - da_monitor_set_state_##name(da_mon, next_state); > \ > - > \ > - trace_event_##name(model_get_state_name_##name(curr_state), > \ > - model_get_event_name_##name(event), > \ > - model_get_state_name_##name(next_state), > \ > - model_is_final_state_##name(next_state)); > \ > - > \ > - return true; > \ > + enum states_##name curr_state, next_state; > \ > + > \ > + curr_state = READ_ONCE(da_mon->curr_state); > \ > + for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) { > \ > + next_state = model_get_next_state_##name(curr_state, event); > \ > + if (next_state == INVALID_STATE) { > \ > + cond_react_##name(curr_state, event); > \ > + > trace_error_##name(model_get_state_name_##name(curr_state), \ > + model_get_event_name_##name(event)); > \ > + return false; > \ > + } > \ > + if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, > next_state))) { \ > + > trace_event_##name(model_get_state_name_##name(curr_state), \ > + model_get_event_name_##name(event), > \ > + > model_get_state_name_##name(next_state), \ > + > model_is_final_state_##name(next_state)); \ > + return true; > \ > + } > \ > } > \ > > \ > - cond_react_##name(curr_state, event); > \ > - > \ > - trace_error_##name(model_get_state_name_##name(curr_state), > \ > - model_get_event_name_##name(event)); > \ > - > \ > + trace_rv_retries_error(#name, smp_processor_id()); > \ > + pr_warn("rv: " __stringify(MAX_DA_RETRY_RACING_EVENTS) > \ > + " retries reached, resetting monitor %s", #name); > \
smp_processor_id() requires preemption to be disabled. At the moment, trace point handler is called with preemption disabled, so we are fine. But there is plan to change that: https://lore.kernel.org/lkml/[email protected]/T/#u Perhaps use get_cpu() and put_cpu() instead? Nam
