On 19.06.23 18:17, Aneesh Kumar K.V wrote:
David Hildenbrand <da...@redhat.com> writes:

On 09.06.23 08:08, Aneesh Kumar K.V wrote:
Certain devices can have non-standard memory capacities that are not
multiples of 1 GiB. Provide a kernel parameter so that we can map the
device memory completely on memory hotplug.
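
(As a rough illustration of the command-line side: such a parameter is
typically wired up via early_param(). The name "memory_block_size=" and the
details below are assumptions for the sketch, not necessarily what the patch
does.)

/*
 * Sketch only: parse a memory block size override from the kernel
 * command line. Parameter name and bounds are assumptions.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/log2.h>
#include <linux/string.h>

static unsigned long memory_block_size_override;

static int __init parse_memory_block_size(char *p)
{
        unsigned long long size;

        if (!p)
                return -EINVAL;

        /* Accept suffixes such as 256M or 1G. */
        size = memparse(p, &p);

        /* Must be a power of two; arch code would also bound-check it. */
        if (!is_power_of_2(size))
                return -EINVAL;

        memory_block_size_override = size;
        return 0;
}
early_param("memory_block_size", parse_memory_block_size);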

So, the unfortunate thing is that these devices would have worked out of
the box before the memory block size was increased from 256 MiB to 1 GiB
in these setups. Now, one has to fine-tune the memory block size. The
only other arch that I know of which supports setting the memory block
size is x86, for special (large) UV systems -- and at least in the past
128 MiB vs. 2 GiB memory blocks made a performance difference during
boot (maybe no longer today, who knows).


Obviously, fewer tunables and getting stuff to simply work out of the box
is preferable.

Two questions:

1) Isn't there a way to improve auto-detection to fall back to 256 MiB in
these setups, to avoid specifying these parameters?

The patch does try to detect as much as possible by looking at the device
tree nodes and the aperture window size. But there are still cases where
we find a memory aperture of size X GiB and the device driver hotplugs
X.Y GiB of memory.
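
Conceptually, the boot-time detection boils down to something like the
sketch below: it can only look at ranges that are visible at boot, which
is why memory hotplugged later by a device driver still needs the
override. (Illustrative only; the function name and bounds are made up,
not the actual patch code.)

/*
 * Sketch: pick the largest block size (up to 1 GiB) that exactly covers
 * every memory range known at boot. for_each_mem_range() is the real
 * memblock iterator; the rest is illustrative.
 */
#include <linux/memblock.h>
#include <linux/sizes.h>

static unsigned long __init compute_block_size(void)
{
        unsigned long size = SZ_1G;
        phys_addr_t start, end;
        u64 i;

        for_each_mem_range(i, &start, &end) {
                /* Shrink until this block size divides the range cleanly. */
                while (size > SZ_256M &&
                       (!IS_ALIGNED(start, size) || !IS_ALIGNED(end, size)))
                        size >>= 1;
        }

        return size;
}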


Okay, and I assume we can't detect that case easily.

Which interface is that device driver using to hotplug memory? It's quite surprising, I have to say ...
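
For reference, drivers typically go through add_memory_driver_managed()
(or plain add_memory()), and the generic hotplug code rejects ranges that
are not aligned to memory_block_size_bytes() -- which is presumably where
an X.Y GiB region runs into trouble with 1 GiB blocks. A sketch (the nid,
start and resource name below are made up):

/*
 * Sketch: how a driver typically hotplugs its memory. Generic code
 * requires start/size to be aligned to memory_block_size_bytes(), so
 * e.g. a 4.5 GiB region fails with 1 GiB blocks but works with 256 MiB
 * blocks. All values below are made up.
 */
#include <linux/kernel.h>
#include <linux/memory.h>
#include <linux/memory_hotplug.h>
#include <linux/sizes.h>

static int example_hotplug_device_memory(int nid, u64 start)
{
        u64 size = 4ULL * SZ_1G + SZ_512M; /* "X.Y GiB" of device memory */

        if (!IS_ALIGNED(size, memory_block_size_bytes()))
                pr_warn("region is not aligned to the memory block size\n");

        return add_memory_driver_managed(nid, start, size,
                                         "System RAM (example)", MHP_NONE);
}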


2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
x86-64, experiments (with direct map fragmentation) showed that the
effective performance boost is pretty insignificant, so I wonder how big
the 1 GiB direct map performance improvement is.


Tarun is running some tests to evaluate the impact. We used to always use
a 1 GiB mapping. This was later switched to use the memory block size in
order to fix issues with memory unplug; commit af9d00e93a4f
("powerpc/mm/radix: Create separate mappings for hot-plugged memory")
explains some of the details behind that change.


IIUC, that commit (conditionally) increased the memory block size to avoid the splitting, correct? In doing so, it broke the device driver use case.




I guess the only real issue with 256 MiB memory blocks and 1 GiB direct
mapping is memory unplug of boot memory: when unplugging a 256 MiB
block, one would have to remap the 1 GiB range using 2 MiB ranges.


... I was wondering what would happen if you simply leave the direct
mapping in this corner case in place instead of doing this remapping.
IOW, remove the memory but keep the direct map pointing at the removed
memory. Nobody should be touching it, or are there any cases where that
could hurt?


Or is there any other reason why we really want 1 GiB memory blocks
instead of defaulting to 256 MiB the way it used to be?


The idea we are working towards is to keep the memory block size small

That would be preferable, yes ...

but map the boot memory using 1 GiB mappings. An unplug request can split
that 1 GiB mapping later. We could look at the possibility of leaving that
mapping in place without splitting, but I'm not sure why we would want to
do that if we can split things correctly. Right now there is no splitting
support on powerpc.
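
Roughly, splitting on unplug would have to do something like the sketch
below. clear_pud_mapping() and remap_range_2m() are stand-ins for radix
helpers that do not exist today, so this only illustrates the shape of
the work:

/*
 * Sketch: unmap the existing 1 GiB leaf entry and re-create 2 MiB
 * mappings for everything except the 256 MiB block being removed.
 * The two helpers below are hypothetical, not existing powerpc code.
 */
#include <linux/kernel.h>
#include <linux/sizes.h>

static void clear_pud_mapping(unsigned long addr);                 /* hypothetical */
static int remap_range_2m(unsigned long start, unsigned long end); /* hypothetical */

static int split_1g_mapping_for_unplug(unsigned long gb_start,
                                       unsigned long hole_start,
                                       unsigned long hole_size)
{
        unsigned long addr;
        int ret;

        /* Drop the 1 GiB leaf entry covering [gb_start, gb_start + 1G). */
        clear_pud_mapping(gb_start);

        /* Re-map the remainder with 2 MiB pages, skipping the hole. */
        for (addr = gb_start; addr < gb_start + SZ_1G; addr += SZ_2M) {
                if (addr >= hole_start && addr < hole_start + hole_size)
                        continue;
                ret = remap_range_2m(addr, addr + SZ_2M);
                if (ret)
                        return ret;
        }
        return 0;
}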

If splitting over-complicates the matter (and, well, it will even consume more memory), it might at least be worth looking into leaving the mapping in place. Yes, splitting is cleaner.

I think there is also the option to fail memory offlining (and therefore unplug) if we have a 1 GiB mapping and don't want to split. For hotplugged memory it would always work to unplug again. aarch64 blocks any boot memory from getting unplugged.
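
What I mean is roughly the sketch below, mirroring the aarch64 approach
with a memory notifier. It's untested, boot_mem_end is a made-up variable
that arch code would have to populate, and whether ppc64 would rather use
an arch hook instead is an open question:

/*
 * Sketch: veto offlining of any memory that was present at boot, so
 * only hotplugged memory can be offlined and unplugged again.
 */
#include <linux/init.h>
#include <linux/memory.h>
#include <linux/notifier.h>

static unsigned long boot_mem_end;  /* end PFN of boot memory, set by arch code */

static int prevent_bootmem_offline(struct notifier_block *nb,
                                   unsigned long action, void *data)
{
        struct memory_notify *arg = data;

        /* Only veto offlining of ranges that overlap boot memory. */
        if (action == MEM_GOING_OFFLINE && arg->start_pfn < boot_mem_end)
                return NOTIFY_BAD;

        return NOTIFY_OK;
}

static struct notifier_block bootmem_offline_nb = {
        .notifier_call = prevent_bootmem_offline,
};

static int __init register_bootmem_offline_notifier(void)
{
        return register_memory_notifier(&bootmem_offline_nb);
}
device_initcall(register_bootmem_offline_notifier);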

But I guess that might break existing use cases (unplugging boot memory) on ppc64 that rely on ZONE_MOVABLE to make that work with guarantees, right? It could be optimized, but I'm not sure that's the best approach.


--
Cheers,

David / dhildenb
