On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
> On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <[email protected]>
> wrote:
> >On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
> >> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]>
> >> wrote:
> >> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]>
> >> >wrote:
> >> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
> >> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> >> >>> But it all becomes very hairy. We have several levels:
> >> >>> 
> >> >>>    + watchdog-all_bt-specific option, e.g.
> >> >>>      sysctl_hardlockup_all_cpu_backtrace
> >> >>> 
> >> >>>    + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
> >> >>> 
> >> >>>    + panic-specific si_info: panic_print
> >> >>> 
> >> >>>    + universal fallback for any layer: kernel_si_info
> >> >>> 
> >> >>> Now, we try to check all these variables back and forth to
> >> >>> trigger all backtraces or to avoid triggering them.
> >> >>> And it clearly does not work well and the code is more and more
> >> >>> hairy.
> >> >>> 
> >> >>> I think about another approach. The word "waterfall" comes to my
> >> >>> mind.
> >> >>> Instead of checking all the settings back and forth, let's process
> >> >>> each setting one by one and just remember what has been done and
> >> >>> skip this in the next level.
> >> >>> 
> >> >>> All the si_info actions seem to dump a global system state.
> >> >>> So, it would make sense to remember the state in a global variable
> >> >>> even when it might be modified by more CPUs in parallel.
> >> >>> 
> >> Hmm.. new idea 
> >> 
> >> kernel/dump_filter.c ?
> >> 
> >> What this file could do is to handle a generic lockup state machine
> >> so any subsystem can log what it already dumped?
> >> 
> >> I know it may bloat, but it's better than cramming fixes in.
> >
> >I am not sure what exactly you would like to achieve but it sounds
> >a bit scary ;-)
> >
> >Anyway, we should not synchronize the watchdog reports against
> >each other, definitely. They are running in non-compatible contexts
> >(task vs interrupt vs NMI). Also we should not add any locking
> >because they usually print something when the system already has
> >enough trouble.
> >
> >Also I think that it is not worth preventing duplicated backtraces
> >or reports from a single CPU. IMHO, it is not a big problem
> >in practice.
> >
> >So, we are down to large reports, like backtraces from all CPUs,
> >timers, locks, ... which are handled by sys_info(). So, I think
> >that it should be enough to handle this inside the sys_info() API.
> >
> >I do not want to say that my proposal was the best solution.
> >I am sure that there are better ones. But we need to consider
> >the gain vs. complexity.
> >
> >Honestly, I am already a bit scared by the complexity which
> >the sys_info() API added. And it is hard to imagine that
> >adding another API would make it easier. But I might be wrong.
> >
> >Instead, it might make sense to integrate the conflicting
> >subsystem-specific calls under the sys_info() API.
> >I mean that, for example, watchdog_hardlockup_check() won't
> >call trigger_allbutcpu_cpu_backtrace() directly but
> >it would call it via sys_info() API so that sys_info()
> >could keep track of it. Something like:
> >
> >void sys_info_allbutcpu_bt(int cpu)
> >{
> >     trigger_allbutcpu_cpu_backtrace(cpu);
> >     /*
> >      * The caller likely printed backtrace of the given @cpu
> >      * on its own. Prevent duplicate backtraces from all
> >      * CPUs with potential next sys_info() call.
> >      */
> >     sys_info_done(SYS_INFO_ALL_BT);
> >}
> >
> >But I am not sure if it is really easier to follow
> >than calling sys_info_done() from the watchdog code.
> >
> >Some watchdogs try to optimize the output and print backtraces
> >only from CPUs which are relevant for the given lockup.
> >We should keep the logic for selecting the set of CPUs
> >in the watchdog code. We just need to solve how to elegantly
> >make sys_info() aware of it, or at least of the more massive
> >reports.
> >
> >Anyway, I would prefer to keep it simple until we see some problems
> >in practice.
> >
> >Best Regards,
> >Petr
> >
> 
> 
> I understand it's scary to make a new file in the first place.
> 
> But I was a bit vague about what I wanted, and I'm sorry.
> 
> So, the reason why I'd suggest a new file is that if any subsystem
> theoretically bypasses sys_info() to log a lockup, it completely misses
> the filter and duplicates the dump.
> 
> My file would act as a generic lockless state machine that any
> subsystem can update regardless of how they dump logs.
> 
> If you have any questions, feel absolutely free to ask! :)
> 
> Discussion is a way to make everyone happy!

Honestly, I am more and more wondering whether you are a real person
or an AI bot.

Best Regards,
Petr
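
For readers following the thread, the "remember what has been done and skip
it in the next level" idea, combined with the no-locking constraint, could
look roughly like the userspace sketch below. It is only an illustration
under stated assumptions, not the kernel implementation: every demo_* name
and DEMO_* flag value is hypothetical, and C11 atomics stand in for the
kernel's own atomic helpers. Only the sys_info_done() concept and the
SYS_INFO_ALL_BT name come from the discussion above.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical flag bits; the real SYS_INFO_* values are defined by the kernel. */
#define DEMO_SYS_INFO_ALL_BT  (1UL << 0)
#define DEMO_SYS_INFO_TIMERS  (1UL << 1)
#define DEMO_SYS_INFO_LOCKS   (1UL << 2)

/* Global record of which large reports were already produced. */
static atomic_ulong demo_done_mask;

/*
 * Mark the given reports as done without taking a lock.
 * Returns the subset of @mask that was not marked before, i.e. the
 * reports that this caller is the first one to claim.
 */
static unsigned long demo_sys_info_done(unsigned long mask)
{
	unsigned long old = atomic_fetch_or(&demo_done_mask, mask);

	return mask & ~old;
}

/*
 * Print the requested reports, skipping those already claimed by an
 * earlier level. Returns the mask of reports actually printed.
 */
static unsigned long demo_sys_info(unsigned long requested)
{
	unsigned long todo = demo_sys_info_done(requested);

	if (todo & DEMO_SYS_INFO_ALL_BT)
		puts("dump: backtraces from all CPUs");
	if (todo & DEMO_SYS_INFO_TIMERS)
		puts("dump: timers");
	if (todo & DEMO_SYS_INFO_LOCKS)
		puts("dump: locks");

	return todo;
}

/* Forget everything, e.g. before handling an unrelated later event. */
static void demo_sys_info_reset(void)
{
	atomic_store(&demo_done_mask, 0);
}
```

In this sketch a watchdog level calling demo_sys_info(DEMO_SYS_INFO_ALL_BT)
prints the backtraces once, and a later panic-level call that requests
DEMO_SYS_INFO_ALL_BT | DEMO_SYS_INFO_TIMERS prints only the timers. The bits
are claimed before printing, so two contexts racing on the same report
cannot both print it; the trade-off is that a report counts as done even if
the claiming context never finishes printing it.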
