We maintain a vmcore analysis script on each server that automatically
parses /var/crash/XXXX/vmcore-dmesg.txt to categorize vmcores. This saves
us considerable effort by avoiding repeated analysis of known bugs.

For vmcores triggered by a driver bug, the system calls print_modules() to
list the loaded modules. However, print_modules() does not output module
version information. Across a large fleet of servers, many different
module versions are often running simultaneously, and we need to know
which driver version caused a given vmcore.

Currently, the only reliable way to obtain the module version associated
with a vmcore is to analyze the /var/crash/XXXX/vmcore file itself, which
is resource-intensive. Therefore, we propose printing the driver version
directly in the log, which is far more efficient.

- Before this patch

  Modules linked in: xfs nvidia(PO) nvme_core mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1

- After this patch

  Modules linked in: xfs nvidia-535.274.02(PO) nvme_core-1.0 mlx_compat(O)
  Unloaded tainted modules: nvidia_peermem(PO):1

Signed-off-by: Yafang Shao <[email protected]>
---
 kernel/module/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 710ee30b3bea..1ad9afec8730 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3901,7 +3901,10 @@ void print_modules(void)
 	list_for_each_entry_rcu(mod, &modules, list) {
 		if (mod->state == MODULE_STATE_UNFORMED)
 			continue;
-		pr_cont(" %s%s", mod->name, module_flags(mod, buf, true));
+		pr_cont(" %s", mod->name);
+		if (mod->version)
+			pr_cont("-%s", mod->version);
+		pr_cont("%s", module_flags(mod, buf, true));
 	}
 
 	print_unloaded_tainted_modules();
--
2.43.5
