On Mon, Dec 29, 2025 at 10:45:56AM +0800, Yafang Shao wrote:
> We maintain a vmcore analysis script on each server that automatically
> parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This helps
> us save considerable effort by avoiding analysis of known bugs.
> 
> For vmcores triggered by a driver bug, the system calls print_modules() to
> list the loaded modules. However, print_modules() does not output module
> version information. Across a large fleet of servers, there are often many
> different module versions running simultaneously, and we need to know which
> driver version caused a given vmcore.
> 
> Currently, the only reliable way to obtain the module version associated
> with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, an
> operation that is resource-intensive. Therefore, we propose printing the
> driver version directly in the log, which is far more efficient.
> 
> - Before this patch
> 
>   Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
> 
> - After this patch
> 
>   Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
>   Unloaded tainted modules: nvidia_peermem(PO):1
> 
> Signed-off-by: Yafang Shao <[email protected]>
> ---
>  kernel/module/main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 710ee30b3bea..1ad9afec8730 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -3901,7 +3901,10 @@ void print_modules(void)
>       list_for_each_entry_rcu(mod, &modules, list) {
>               if (mod->state == MODULE_STATE_UNFORMED)
>                       continue;
> -             pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
> +             pr_cont(" %s", mod->name);
> +             if (mod->version)
> +                     pr_cont("-%s", mod->version);
> +             pr_cont("%s", module_flags(mod, buf, true));
>       }
>  
>       print_unloaded_tainted_modules();
> -- 
> 2.43.5
> 

Hi Yafang,

While I certainly appreciate the operational burden of managing a
large-scale fleet and the desire to automate crash triage, I am somewhat
hesitant to support this change in its current form.

Perhaps a more appropriate approach would be to extend the existing
module information infrastructure so that versions are printed only when
explicitly requested, for example by introducing a separate
print_module_versions() helper.

In my view, while the requirement for better version visibility is valid,
we must ensure that the change does not compromise the readability of the
crash report for the rest of the community.

Nacked-by: Aaron Tomlin <[email protected]>


Kind regards,
-- 
Aaron Tomlin

