On Fri 03-07-20 11:24:17, Michal Hocko wrote:
> [Cc Andi]
>
> On Fri 03-07-20 11:10:01, Michal Suchanek wrote:
> > On Wed, Jul 01, 2020 at 02:21:10PM +0200, Michal Hocko wrote:
> > > On Wed 01-07-20 13:30:57, David Hildenbrand wrote:
> [...]
> > > > Yep, looks like it.
> > > >
> > > > [ 0.009726] SRAT: PXM 1 -> APIC 0x00 -> Node 0
> > > > [ 0.009727] SRAT: PXM 1 -> APIC 0x01 -> Node 0
> > > > [ 0.009727] SRAT: PXM 1 -> APIC 0x02 -> Node 0
> > > > [ 0.009728] SRAT: PXM 1 -> APIC 0x03 -> Node 0
> > > > [ 0.009731] ACPI: SRAT: Node 0 PXM 1 [mem 0x00000000-0x0009ffff]
> > > > [ 0.009732] ACPI: SRAT: Node 0 PXM 1 [mem 0x00100000-0xbfffffff]
> > > > [ 0.009733] ACPI: SRAT: Node 0 PXM 1 [mem 0x100000000-0x13fffffff]
> > >
> > > This begs a question whether ppc can do the same thing?
> > Or x86 stop doing it so that you can see on what node you are running?
> >
> > What's the point of this indirection other than another way of avoiding
> > empty node 0?
>
> Honestly, I do not have any idea. I've traced it down to
> Author: Andi Kleen <a...@suse.de>
> Date: Tue Jan 11 15:35:48 2005 -0800
>
> [PATCH] x86_64: Fix ACPI SRAT NUMA parsing
>
> Fix fallout from the recent nodemask_t changes. The node ids assigned
> in the SRAT parser were off by one.
>
> I added a new first_unset_node() function to nodemask.h to allocate
> IDs sanely.
>
> Signed-off-by: Andi Kleen <a...@suse.de>
> Signed-off-by: Linus Torvalds <torva...@osdl.org>
>
> which doesn't really tell all that much. The historical baggage and a
> long term behavior which is not really trivial to fix I suspect.
Thinking about this some more, this logic makes some sense after all, especially in a world without memory hotplug, which was very likely the case back then. It is much better to have a compact node mask than a sparse one. Node numbers shouldn't really matter anyway, as long as there is a clear mapping to the HW. I am not sure we export that information (except via the kernel ring buffer), though.

Memory hotplug changes that somewhat, because you can hot-remove NUMA nodes and thereby make the nodemask sparse, but that is not a common case. I am not sure what would happen if a completely new node was added and its corresponding node ID was already used by a renumbered node. I am afraid it would likely conflate the two. But I am not sure this is really possible on x86, and the lack of a bug report suggests that at least nobody is doing that.

-- 
Michal Hocko
SUSE Labs