On Tue, Dec 30, 2025 at 11:11 AM Aaron Tomlin <[email protected]> wrote:
>
> On Mon, Dec 29, 2025 at 10:45:56AM +0800, Yafang Shao wrote:
> > We maintain a vmcore analysis script on each server that automatically
> > parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> > us save considerable effort by avoiding analysis of known bugs.
> >
> > For vmcores triggered by a driver bug, the system calls print_modules() to
> > list the loaded modules. However, print_modules() does not output module
> > version information. Across a large fleet of servers, there are often many
> > different module versions running simultaneously, and we need to know which
> > driver version caused a given vmcore.
> >
> > Currently, the only reliable way to obtain the module version associated
> > with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself—an
> > operation that is resource-intensive. Therefore, we propose printing the
> > driver version directly in the log, which is far more efficient.
> >
> > - Before this patch
> >
> > Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
> > Unloaded tainted modules: nvidia_peermem(PO):1
> >
> > - After this patch
> >
> > Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
> > Unloaded tainted modules: nvidia_peermem(PO):1
> >
> > Signed-off-by: Yafang Shao <[email protected]>
> > ---
> > kernel/module/main.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index 710ee30b3bea..1ad9afec8730 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -3901,7 +3901,10 @@ void print_modules(void)
> >  	list_for_each_entry_rcu(mod, &modules, list) {
> >  		if (mod->state == MODULE_STATE_UNFORMED)
> >  			continue;
> > -		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> > +		pr_cont(" %s", mod->name);
> > +		if (mod->version)
> > +			pr_cont("-%s", mod->version);
> > +		pr_cont("%s", module_flags(mod, buf, true));
> >  	}
> >
> >  	print_unloaded_tainted_modules();
> > --
> > 2.43.5
> >
>
> Hi Yafang,
>
> While I certainly appreciate the operational burden of managing a
> large-scale fleet and the desire to automate crash triage, I am somewhat
> hesitant to support this change in its current form.
>
> Perhaps the more appropriate approach would be to extend the existing
> module information infrastructure to include the version only when it is
> explicitly requested: introduce print_module_versions().
Isn't that redundant since print_modules() already outputs module names?
>
> In my view, while the requirement for better version visibility is valid,
> we must ensure that the change does not compromise the readability of the
> crash report for the rest of the community.
I understand your concern, but could you elaborate on the problems you
anticipate? Extracting the module list remains straightforward with
simple text processing:
$ awk -F': ' '/Modules linked in:/ {
      gsub(/\([^)]*\)/, "", $2); n = split($2, a, " ");
      for (i = 1; i <= n; i++) if (a[i] != "") print a[i]
  }' vmcore-dmesg.txt
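If a script needs the name and the version as separate fields, a
minimal sketch could look like the one below. It assumes the
"name-version(FLAGS)" format proposed in this patch, and it assumes
module names never contain '-', so the first '-' marks the start of the
version. The function name split_module_versions is only illustrative,
and "-" is printed as a placeholder when a module has no version.

```shell
#!/bin/sh
# Sketch only: print "name version" pairs from a vmcore-dmesg.txt-style log.
# Assumption: entries look like "name-version(FLAGS)" as proposed in this
# patch, and module names contain no '-' (the first '-' starts the version).
split_module_versions() {
    awk '/Modules linked in:/ {
        line = $0
        sub(/^.*Modules linked in:[ ]*/, "", line)   # drop timestamp and prefix
        sub(/[ ]*\[last unloaded:.*$/, "", line)     # drop trailing note, if present
        gsub(/\([^)]*\)/, "", line)                  # drop taint flags such as (PO)
        n = split(line, mods, " ")
        for (i = 1; i <= n; i++) {
            dash = index(mods[i], "-")
            if (dash)    # versioned entry: split at the first "-"
                print substr(mods[i], 1, dash - 1), substr(mods[i], dash + 1)
            else         # no version recorded for this module
                print mods[i], "-"
        }
    }' "$@"
}
```

It can be run as "split_module_versions /var/crash/XXXX/vmcore-dmesg.txt"
or fed from a pipe. Splitting at the first '-' rather than the last
keeps version strings that themselves contain a '-' intact.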
Besides, kernel logs aren't an ABI; developers are expected to adapt to
upstream changes. Otherwise, the kernel itself would become
unmaintainable.
>
> Nacked-by: Aaron Tomlin <[email protected]>
--
Regards
Yafang