It's probably better to use 0x3fff00 to match Mesa and PAL. There is no benefit in using a different limit, not even a perf benefit, and it's better to be consistent with all UMDs.
Marek On Wed, Sep 10, 2025 at 7:54 AM Christian König <christian.koe...@amd.com> wrote: > On 10.09.25 11:34, Timur Kristóf wrote: > > On Wed, 2025-09-10 at 10:34 +0200, Christian König wrote: > >> On 09.09.25 18:56, Timur Kristóf wrote: > >>>> Even when we apply it I think we should drop that, the value the > >>>> kernel uses is correct. > >>> > >>> Hi Christian, > >>> > >>> The kernel and Mesa disagree on the limits for almost all SDMA > >>> versions, so it would be nice to actually understand what the > >>> limits of > >>> the SDMA HW are and use the same limit in the kernel and Mesa, or > >>> if > >>> that isn't viable, let's document why the different limits make > >>> sense. > >>> > >>> I'm adding Marek to CC, he wrote the comment that I referenced > >>> here. > >>> As far as I understand from my conversation with Marek, the kernel > >>> is > >>> actually wrong. > >>> > >>> If the limits depend on alignment, then we should either set a > >>> limit > >>> that is always safe, or make sure SDMA copies in the kernel are > >>> always > >>> aligned and add assertions about it. > >> > >> That's already done. See the size restrictions applied to BOs and the > >> callers of amdgpu_copy_buffer(). > > > > Based on the code comments I cited, as far as I understand, the issue > > is with copying the last 256 bytes of 2^22-1. Do I understood your > > response correctly that you are saying that the kernel isn't affected > > by this issue because it always copies things that are 256 byte > > aligned? > > Yes, see the kernel always works with at least PAGE_SIZE buffers (between > 4k and 64k depending on CPU architecture). You never get anything smaller > than that. > > The only exception is the debugger interface, but that always copies < > PAGE_SIZE, so the SDMA limits are irrelevant. > > > I checked the callers of amdgpu_copy_buffer and can't find what you are > > referring to. However, assuming that all callers use amdgpu_copy_buffer > > on 256 byte aligned addresses, there is still an issue with large BOs: > > > > When the kernel copies a BO that is larger than 0x3fffe0 bytes then it > > needs to emit multiple SDMA copy packets, and the copy done by the > > second packet (and other subsequent packets) won't be 256 byte aligned. > > I've just double checked the documentation and for SI it clearly states > that the limit is 0x3fffe0. Documentation for later HW says 2^22 - 1 - > start_addr[4:2] (which is 0x1f in the worst case so you end up with > 0x3fffe0 again). > > For backward, tiled and windowed copies a 256byte bounce buffer is used > instead of the 32byte buffer for the linear copy. So the limit is then > 0x3fff00. > > Looks like that buffer is also used for byte aligned copies and that limit > applies there as well. > > >> > >> We could add another warning to amdgpu_copy_buffer(), but that is > >> just the backend function. > >> > >>> Looking at the implementation of > >>> amdgpu_copy_buffer in the kernel, I see that it relies on > >>> copy_max_bytes and doesn't take alignment into account, so with the > >>> current limit it could issue subsequent copies that aren't 256 byte > >>> aligned. > >> > >> "Due to HW limitation, the maximum count may not be 2^n-1, can only > >> be 2^n - 1 - start_addr[4:2]" > >> > >> Well according to this comments the size restriction is 32 bytes (256 > >> bits!) and not 256 bytes. > >> > >> Were do you actually get the 256 bytes restriction from? > > > > The comments I cited above are from the following sources: > > > > PAL uses 1<<22-256 = 0x3fff00 here: > > > https://github.com/GPUOpen-Drivers/pal/blob/bcec463efe5260776d486a5e3da0c549bc0a75d2/src/core/hw/ossip/oss2/oss2DmaCmdBuffer.cpp#L308 > > > > Mesa also uses 0x3fff00 here: > > > https://gitlab.freedesktop.org/mesa/mesa/-/blob/47f5d25f93fd36224c112ee2d48e511ae078f8c2/src/amd/common/sid.h#L390 > > > > The limit in Mesa was added by this commit: > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/2c1f249f2b61be50222411bc0d41c095004232ed > > According to the commit message, Dave added this limit when hitting an > > issue trying to use SDMA with buffers that are larger than this. > > > > For SDMA v5.2 and newer, a larger limit was added by Marek later: > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/a54bcb9429666fcbe38c04660cc4b3f8abbde259 > > Which confirms the same issue copying the last 256 bytes on these > > versions, although in this case the kernel isn't technically wrong > > because it uses a smaller overall maximum. > > Yeah, I mean that totally makes sense. When you want allow byte aligned > copies as well then you have to use 0x3fff00 as maximum. > > The PAL code even has extra checks to speed up copies where src and dst > are byte aligned, but size isn't: > > if (IsPow2Aligned(srcGpuAddr, sizeof(uint32)) && > IsPow2Aligned(dstGpuAddr, sizeof(uint32)) && > (*pBytesCopied >= sizeof(uint32))) > .... > > So it totally makes sense to use the lower limit in Mesa but not for the > kernel. > > Regards, > Christian. >