On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
> On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
> >On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
> >> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]> wrote:
> >> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
> >> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
> >> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
> >> >>> But it all becomes very hairy. We have several levels:
> >> >>>
> >> >>>   + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
> >> >>>
> >> >>>   + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
> >> >>>
> >> >>>   + panic-specific si_info: panic_print
> >> >>>
> >> >>>   + universal fallback for any layer: kernel_si_info
> >> >>>
> >> >>> Now, we try to check all these variables back and forth to
> >> >>> trigger all backtraces or to avoid triggering them.
> >> >>> And it clearly does not work well and the code is more and more
> >> >>> hairy.
> >> >>>
> >> >>> I think about another approach. The word "waterfall" comes to my mind.
> >> >>> Instead of checking all the settings back and forth, let's process
> >> >>> each setting one by one and just remember what has been done and
> >> >>> skip this in the next level.
> >> >>>
> >> >>> All the si_info actions seems to dump a global system state.
> >> >>> So, it would make sense to remember the state in a global variable
> >> >>> even when it might be modified by more CPUs in parallel.
> >> >>>
> >> Hmm.. new idea
> >>
> >> kernel/dump_filter.c ?
> >>
> >> What this file could do is to handle a generic lockup state machine
> >> so any subsystem can log what it already dumped?
> >>
> >> I know it may bloat, but it's better then cramming fixes in.
> >
> >I am not sure what exactly you would like to achieve but it sounds
> >a bit scary ;-)
> >
> >Anyway, we should not synchronize the watchdog reports against
> >each other, definitely. They are running in non-compatible contexts
> >(task vs interrupt vs NMI). Also we should not add any locking
> >because they usually print something when the system has enough
> >troubles.
> >
> >Also I think that it is not worth preventing duplicated backtraces
> >or reports from a single CPU. IMHO, it is not a big problem
> >in practice.
> >
> >So, we are down to large reports, like backtraces from all CPUs,
> >timers, locks, ... which are handled by sys_info(). So, I think
> >that it should be enough to handle this inside the sys_info() API.
> >
> >I do not want to say that my proposal was the best solution.
> >I am sure that there are better ones. But we need to consider
> >the gain vs. complexity.
> >
> >Honestly, I am already a bit scared by the complexity which
> >we the sys_info() API added. And it is hard to imagine that
> >adding another API would make it easier. But I might be wrong.
> >
> >Instead, it might make sense to integrate the conflicting
> >subsystem-specific calls under the sys_info() API.
> >I mean that, for example watchdog_hardlockup_check() won't
> >call trigger_allbutcpu_cpu_backtrace() directly but
> >it would call it via sys_info() API so that sys_info()
> >could keep track of it. Something like:
> >
> >void sys_info_allbutcpu_bt(int cpu)
> >{
> >	trigger_allbutcpu_cpu_backtrace(cpu);
> >	/*
> >	 * The caller likely printed backtrace of the given @cpu
> >	 * on its own. Prevent duplicate backtraces from all
> >	 * CPUs with potential next sys_info() call.
> >	 */
> >	sys_info_done(SYS_INFO_ALL_BT);
> >}
> >
> >But I am not sure if it is really easier to follow
> >than calling sys_info_done() from the watchdog code.
> >
> >Some watchdogs try to optimize the output and print backtraces
> >only from CPUs which are relevant for the given lockup.
> >We should keep the logic for selecting the set of CPUs
> >in the watchdog code. We just need to solve how to elegantly
> >make sys_info() aware of it or at least about the more massive
> >reports.
> >
> >Anyway, I would prefer to keep it simple until we see some problems
> >in practice.
> >
> >Best Regards,
> >Petr
> >
>
>
> I understand it's scary. To make a new file in the first place.
>
> But I was a bit vague of what I wanted, and I'm sorry.
>
> So, the reason why I'd suggest a new file, is because if any subsystem
> Theoretically bypasses sys_info to log a lockup, this completely misses
> the filter and duplicates the dump
>
> My file would act as a generic lockless state machine that any
> subsystem can update regardless of how they dump logs.
>
> If you have any questions, feel absolutely free to ask! :)
>
> Discussion is a way to make everyone happy!

Honestly, I am more and more wondering whether you are a real person
or an AI bot.

Best Regards,
Petr
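
For reference, the sys_info_done() bookkeeping discussed in the quoted
thread can be modelled in plain userspace C. This is only a rough sketch
under stated assumptions, not kernel code: sys_info_done() and
SYS_INFO_ALL_BT are names taken from the thread, while the flag values,
SYS_INFO_TIMERS, SYS_INFO_LOCKS, sys_info_done_mask, sys_info_claim() and
sys_info_model() are hypothetical. It assumes a single global bitmask
updated with an atomic OR, so no locking is involved.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical flag values; only the name SYS_INFO_ALL_BT is from the thread. */
#define SYS_INFO_ALL_BT	(1UL << 0)
#define SYS_INFO_TIMERS	(1UL << 1)
#define SYS_INFO_LOCKS	(1UL << 2)

/* Hypothetical global record of reports that were already printed. */
static atomic_ulong sys_info_done_mask;

/* Mark reports in @mask as already printed by some other code path. */
static void sys_info_done(unsigned long mask)
{
	atomic_fetch_or(&sys_info_done_mask, mask);
}

/*
 * Hypothetical helper: atomically mark @mask as done and return the
 * subset that had not been marked before, i.e. what the caller should
 * still print. Two racing callers never both get the same bit.
 */
static unsigned long sys_info_claim(unsigned long mask)
{
	unsigned long old = atomic_fetch_or(&sys_info_done_mask, mask);

	return mask & ~old;
}

/*
 * Userspace stand-in for sys_info(): print only the reports that were
 * not printed earlier and return the set that was printed now.
 */
static unsigned long sys_info_model(unsigned long mask)
{
	unsigned long todo = sys_info_claim(mask);

	if (todo & SYS_INFO_ALL_BT)
		puts("backtraces from all CPUs");
	if (todo & SYS_INFO_TIMERS)
		puts("timers");
	if (todo & SYS_INFO_LOCKS)
		puts("locks");

	return todo;
}
```

In this model, calling sys_info_done(SYS_INFO_ALL_BT) and then
sys_info_model(SYS_INFO_ALL_BT | SYS_INFO_TIMERS) prints only the timer
report. The mask is never cleared here; when to reset it is left open,
as it is in the thread.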
