On June 26, 2026 3:47:12 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>On Fri 2026-06-26 15:35:19, Bradley Morgan wrote:
>> On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>> >On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
>> >> On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan <[email protected]> wrote:
>> >> >On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek <[email protected]> wrote:
>> >> >>On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
>> >> >>> On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:
>> >> >>> But it all becomes very hairy. We have several levels:
>> >> >>>
>> >> >>> + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
>> >> >>>
>> >> >>> + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
>> >> >>>
>> >> >>> + panic-specific si_info: panic_print
>> >> >>>
>> >> >>> + universal fallback for any layer: kernel_si_info
>> >> >>>
>> >> >>> Now, we try to check all these variables back and forth to
>> >> >>> trigger all backtraces or to avoid triggering them.
>> >> >>> And it clearly does not work well and the code is more and more
>> >> >>> hairy.
>> >> >>>
>> >> >>> I think about another approach. The word "waterfall" comes to my mind.
>> >> >>> Instead of checking all the settings back and forth, let's process
>> >> >>> each setting one by one and just remember what has been done and
>> >> >>> skip this in the next level.
>> >> >>>
>> >> >>> All the si_info actions seems to dump a global system state.
>> >> >>> So, it would make sense to remember the state in a global variable
>> >> >>> even when it might be modified by more CPUs in parallel.
>> >> >>>
>> >> Hmm.. new idea
>> >>
>> >> kernel/dump_filter.c ?
>> >>
>> >> What this file could do is to handle a generic lockup state machine
>> >> so any subsystem can log what it already dumped?
>> >>
>> >> I know it may bloat, but it's better then cramming fixes in.
>> >
>> >I am not sure what exactly you would like to achieve but it sounds
>> >a bit scary ;-)
>> >
>> >Anyway, we should not synchronize the watchdog reports against
>> >each other, definitely. They are running in non-compatible contexts
>> >(task vs interrupt vs NMI). Also we should not add any locking
>> >because they usually print something when the system has enough
>> >troubles.
>> >
>> >Also I think that it is not worth preventing duplicated backtraces
>> >or reports from a single CPU. IMHO, it is not a big problem
>> >in practice.
>> >
>> >So, we are down to large reports, like backtraces from all CPUs,
>> >timers, locks, ... which are handled by sys_info(). So, I think
>> >that it should be enough to handle this inside the sys_info() API.
>> >
>> >I do not want to say that my proposal was the best solution.
>> >I am sure that there are better ones. But we need to consider
>> >the gain vs. complexity.
>> >
>> >Honestly, I am already a bit scared by the complexity which
>> >we the sys_info() API added. And it is hard to imagine that
>> >adding another API would make it easier. But I might be wrong.
>> >
>> >Instead, it might make sense to integrate the conflicting
>> >subsystem-specific calls under the sys_info() API.
>> >I mean that, for example watchdog_hardlockup_check() won't
>> >call trigger_allbutcpu_cpu_backtrace() directly but
>> >it would call it via sys_info() API so that sys_info()
>> >could keep track of it. Something like:
>> >
>> >void sys_info_allbutcpu_bt(int cpu)
>> >{
>> >	trigger_allbutcpu_cpu_backtrace(cpu);
>> >	/*
>> >	 * The caller likely printed backtrace of the given @cpu
>> >	 * on its own. Prevent duplicate backtraces from all
>> >	 * CPUs with potential next sys_info() call.
>> >	 */
>> >	sys_info_done(SYS_INFO_ALL_BT);
>> >}
>> >
>> >But I am not sure if it is really easier to follow
>> >than calling sys_info_done() from the watchdog code.
>> >
>> >Some watchdogs try to optimize the output and print backtraces
>> >only from CPUs which are relevant for the given lockup.
>> >We should keep the logic for selecting the set of CPUs
>> >in the watchdog code. We just need to solve how to elegantly
>> >make sys_info() aware of it or at least about the more massive
>> >reports.
>> >
>> >Anyway, I would prefer to keep it simple until we see some problems
>> >in practice.
>> >
>> >Best Regards,
>> >Petr
>> >
>>
>>
>> I understand it's scary. To make a new file in the first place.
>>
>> But I was a bit vague of what I wanted, and I'm sorry.
>>
>> So, the reason why I'd suggest a new file, is because if any subsystem
>> Theoretically bypasses sys_info to log a lockup, this completely misses
>> the filter and duplicates the dump
>>
>> My file would act as a generic lockless state machine that any
>> subsystem can update regardless of how they dump logs.
>>
>> If you have any questions, feel absolutely free to ask! :)
>>
>> Discussion is a way to make everyone happy!
>
>Honestly, I am more and more wondering whether your are a real person
>or AI bot.

Sigh.. I can verify myself through a video call if you don't believe
I am human :)

The reason I suggested a new file is that an AI said it would be a good
idea. I asked it what I should do, and it told me to add a new file.

I knew it was slightly over-engineered, but I was a bit stressed, and
I just wanted some sort of new API which is less buggy, IMHO.

I should have told you that I used AI to come up with the whole new-file
idea. Really sorry, Petr.

>Best Regards,
>Petr
>

Thanks!
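
For reference, the "remember what has been done and skip this in the
next level" idea quoted above could be sketched in user space roughly
as follows. This is only an illustration under assumptions: the
DEMO_INFO_* bits and the demo_info_*() helpers are made-up names, C11
atomics stand in for whatever the kernel would actually use, and none
of this is the real sys_info() API.

```c
#include <stdatomic.h>

/* Hypothetical report bits; not the kernel's SYS_INFO_* values. */
#define DEMO_INFO_ALL_BT	(1UL << 0)
#define DEMO_INFO_TIMERS	(1UL << 1)
#define DEMO_INFO_LOCKS		(1UL << 2)

/* Global record of reports that have already been printed. */
static atomic_ulong demo_info_done;

/*
 * Record that the caller printed the reports in @mask on its own,
 * e.g. a watchdog that triggered all-CPU backtraces directly.
 */
static void demo_info_mark_done(unsigned long mask)
{
	atomic_fetch_or(&demo_info_done, mask);
}

/*
 * Claim the reports in @wanted that nobody has printed yet.
 * Returns the subset the caller should print now. The fetch-or
 * both reads the old state and marks the new bits in one atomic
 * step, so two callers never both claim the same bit.
 */
static unsigned long demo_info_claim(unsigned long wanted)
{
	unsigned long old = atomic_fetch_or(&demo_info_done, wanted);

	return wanted & ~old;
}
```

A caller that already printed all-CPU backtraces would call
demo_info_mark_done(DEMO_INFO_ALL_BT); a later
demo_info_claim(DEMO_INFO_ALL_BT | DEMO_INFO_TIMERS) then returns only
DEMO_INFO_TIMERS. No lock is taken, which matches the concern about
task vs interrupt vs NMI contexts, but whether this is good enough in
real NMI context is not something the sketch settles.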
