numa: Register information into Kmemdump

David Hildenbrand Mon, 25 Aug 2025 07:07:41 -0700

On 25.08.25 15:36, Eugen Hristev wrote:



On 8/25/25 16:20, David Hildenbrand wrote:


IIRC, kernel/vmcore_info.c is never built as a module, as it also
accesses non-exported symbols.


Hello David,

I am looking again into this, and there are some things which in my
opinion would be difficult to achieve.
For example I looked into my patch #11 , which adds the `runqueues` into
kmemdump.

The runqueues is a variable of `struct rq` which is defined in
kernel/sched/sched.h , which is not supposed to be included outside of
sched.
Now moving all the struct definition outside of sched.h into another
public header would be rather painful and I don't think it's a really
good option (The struct would be needed to compute the sizeof inside
vmcoreinfo). Secondly, it would also imply moving all the nested struct
definitions outside as well. I doubt this is something that we want for
the sched subsys. How the subsys is designed, out of my understanding,
is to keep these internal structs opaque outside of it.


All the kmemdump module needs is a start and a length, correct? So the
only tricky part is getting the length.


I also have in mind the kernel user case. How would a kernel programmer
want to add some kernel structs/info/buffers into kmemdump such that the
dump would contain their data ? Having "KMEMDUMP_VAR(...)" looks simple
enough.

The other way around, why should anybody have a saying in adding theirdata to kmemdump? Why do we have that all over the kernel?


Is your mechanism really so special?

A single composer should take care of that, and it's really just start +len of physical memory areas.

Otherwise maybe the programmer has to write helpers to compute lengths
etc, and stitch them into kmemdump core.
I am not saying it's impossible, but just tiresome perhaps.

In your patch set, how many of these instances did you encounter wherethat was a problem?


One could just add a const variable that holds this information, or even
better, a simple helper function to calculate that.

Maybe someone else reading along has a better idea.


This could work, but it requires again adding some code into the
specific subsystem. E.g. struct_rq_get_size()
I am open to ideas , and thank you very much for your thoughts.


Interestingly, runqueues is a percpu variable, which makes me wonder if
what you had would work as intended (maybe it does, not sure).


I would not really need to dump the runqueues. But the crash tool which
I am using for testing, requires it. Without the runqueues it will not
progress further to load the kernel dump.
So I am not really sure what it does with the runqueues, but it works.
Perhaps using crash/gdb more, to actually do something with this data,
would give more insight about its utility.
For me, it is a prerequisite to run crash, and then to be able to
extract the log buffer from the dump.

I have the faint recollection that percpu vars might not be stored in asingle contiguous physical memory area, but maybe my memory is justwrong, that's why I was raising it.


  From my perspective it's much simpler and cleaner to just add the
kmemdump annotation macro inside the sched/core.c as it's done in my
patch. This macro translates to a noop if kmemdump is not selected.


I really don't like how we are spreading kmemdump all over the kernel,
and adding complexity with __section when really, all we need is a place
to obtain a start and a length.


I understand. The section idea was suggested by Thomas. Initially I was
skeptic, but I like how it turned out.


Yeah, I don't like it. Taste differs ;)

I am in particular unhappy about custom memblock wrappers.

[...]


To have this working outside of printk, it would be required to walk
through all the printk structs/allocations and select the required info.
Is this something that we want to do outside of printk ?


I don't follow, please elaborate.

How is e.g., log_buf_len_get() + log_buf_addr_get() not sufficient,
given that you run your initialization after setup_log_buf() ?


My initial thought was the same. However I got some feedback from Petr
Mladek here :

https://lore.kernel.org/lkml/aBm5QH2p6p9Wxe_M@localhost.localdomain/

Where he explained how to register the structs correctly.
It can be that setup_log_buf is called again at a later time perhaps.

setup_log_buf() is a __init function, so there is only a certain timeframe where it can be called.

In particular, once the buddy is up, memblock allocations are impossibleand it would be deeply flawed to call this function again.


Let's not over-engineer this.

Peter is on CC, so hopefully he can share his thoughts.

--
Cheers

David / dhildenb

Re: [RFC][PATCH v2 22/29] mm/numa: Register information into Kmemdump

Reply via email to