On Wed, 2025-09-10 at 11:10 -0400, Marek Olšák wrote: > I added the comment into Mesa that 0x3fff00 is the limit. I did > research on that bug separately from PAL, but I don't remember the > details. > > There is no performance to gain here. It's only about consistency and > clear communication to the public what the recommended SDMA > programming is. > > Marek
Hi Christian & Marek, I'd be happy to submit a new patch series if we can agree on what the limits are for each SDMA version and how to get the best performance from it. I trust Marek's word that he did his research when he came up with 0x3fff00 so I don't understand the pushback here. (1) According to a conversation between Marek and myself, the most optimal way to program the SDMA would be to use 256-byte-aligned copy packets. Christian, do you agree with this? If yes, I can write a patch for that. If not, let me know what you would prefer. (2) I'd like us to come to an agreement on the maximum amount of bytes copied per SDMA packet, and align the kernel code with Mesa. At the moment, the maximum limit is different for every SDMA version between the kernel and Mesa, which makes the whole topic super confusing. Please keep in mind that if there is any chance that the current limit is indeed incorrect (as it sounds like it is), we risk "random" SDMA hangs on users' machines. Thanks & best regards, Timur > > On Wed, Sep 10, 2025 at 10:57 AM Christian König > <christian.koe...@amd.com> wrote: > > On 10.09.25 16:52, Marek Olšák wrote: > > > It's probably better to use 0x3fff00 to match Mesa and PAL. There > > > is no benefit in using a different limit, not even a perf > > > benefit, and it's better to be consistent with all UMDs. > > > > Unification with Mesa is certainly a valid argument, but just using > > an approach because PAL does it has proven to be troublesome > > before. > > > > If we are interested in best performance we should actually round > > down the value to the next multiple of PAGE_SIZE. > > > > Christian. > > > > > > > > Marek > > > > > > On Wed, Sep 10, 2025 at 7:54 AM Christian König > > > <christian.koe...@amd.com <mailto:christian.koe...@amd.com>> > > > wrote: > > > > > > On 10.09.25 11:34, Timur Kristóf wrote: > > > > On Wed, 2025-09-10 at 10:34 +0200, Christian König wrote: > > > >> On 09.09.25 18:56, Timur Kristóf wrote: > > > >>>> Even when we apply it I think we should drop that, the > > > value the > > > >>>> kernel uses is correct. > > > >>> > > > >>> Hi Christian, > > > >>> > > > >>> The kernel and Mesa disagree on the limits for almost > > > all SDMA > > > >>> versions, so it would be nice to actually understand > > > what the > > > >>> limits of > > > >>> the SDMA HW are and use the same limit in the kernel and > > > Mesa, or > > > >>> if > > > >>> that isn't viable, let's document why the different > > > limits make > > > >>> sense. > > > >>> > > > >>> I'm adding Marek to CC, he wrote the comment that I > > > referenced > > > >>> here. > > > >>> As far as I understand from my conversation with Marek, > > > the kernel > > > >>> is > > > >>> actually wrong. > > > >>> > > > >>> If the limits depend on alignment, then we should either > > > set a > > > >>> limit > > > >>> that is always safe, or make sure SDMA copies in the > > > kernel are > > > >>> always > > > >>> aligned and add assertions about it. > > > >> > > > >> That's already done. See the size restrictions applied to > > > BOs and the > > > >> callers of amdgpu_copy_buffer(). > > > > > > > > Based on the code comments I cited, as far as I > > > understand, the issue > > > > is with copying the last 256 bytes of 2^22-1. Do I > > > understood your > > > > response correctly that you are saying that the kernel > > > isn't affected > > > > by this issue because it always copies things that are 256 > > > byte > > > > aligned? > > > > > > Yes, see the kernel always works with at least PAGE_SIZE > > > buffers (between 4k and 64k depending on CPU architecture). You > > > never get anything smaller than that. > > > > > > The only exception is the debugger interface, but that > > > always copies < PAGE_SIZE, so the SDMA limits are irrelevant. > > > > > > > I checked the callers of amdgpu_copy_buffer and can't find > > > what you are > > > > referring to. However, assuming that all callers use > > > amdgpu_copy_buffer > > > > on 256 byte aligned addresses, there is still an issue > > > with large BOs: > > > > > > > > When the kernel copies a BO that is larger than 0x3fffe0 > > > bytes then it > > > > needs to emit multiple SDMA copy packets, and the copy > > > done by the > > > > second packet (and other subsequent packets) won't be 256 > > > byte aligned. > > > > > > I've just double checked the documentation and for SI it > > > clearly states that the limit is 0x3fffe0. Documentation for > > > later HW says 2^22 - 1 - start_addr[4:2] (which is 0x1f in the > > > worst case so you end up with 0x3fffe0 again). > > > > > > For backward, tiled and windowed copies a 256byte bounce > > > buffer is used instead of the 32byte buffer for the linear copy. > > > So the limit is then 0x3fff00. > > > > > > Looks like that buffer is also used for byte aligned copies > > > and that limit applies there as well. > > > > > > >> > > > >> We could add another warning to amdgpu_copy_buffer(), but > > > that is > > > >> just the backend function. > > > >> > > > >>> Looking at the implementation of > > > >>> amdgpu_copy_buffer in the kernel, I see that it relies > > > on > > > >>> copy_max_bytes and doesn't take alignment into account, > > > so with the > > > >>> current limit it could issue subsequent copies that > > > aren't 256 byte > > > >>> aligned. > > > >> > > > >> "Due to HW limitation, the maximum count may not be 2^n- > > > 1, can only > > > >> be 2^n - 1 - start_addr[4:2]" > > > >> > > > >> Well according to this comments the size restriction is > > > 32 bytes (256 > > > >> bits!) and not 256 bytes. > > > >> > > > >> Were do you actually get the 256 bytes restriction from? > > > > > > > > The comments I cited above are from the following sources: > > > > > > > > PAL uses 1<<22-256 = 0x3fff00 here: > > > > > > > https://github.com/GPUOpen-Drivers/pal/blob/bcec463efe5260776d486a5e3da0c549bc0a75d2/src/core/hw/ossip/oss2/oss2DmaCmdBuffer.cpp#L308 > > > <https://github.com/GPUOpen-Drivers/pal/blob/bcec463efe5260776d48 > > > 6a5e3da0c549bc0a75d2/src/core/hw/ossip/oss2/oss2DmaCmdBuffer.cpp# > > > L308> > > > > > > > > Mesa also uses 0x3fff00 here: > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/blob/47f5d25f93fd36224c112ee2d48e511ae078f8c2/src/amd/common/sid.h#L390 > > > <https://gitlab.freedesktop.org/mesa/mesa/-/blob/47f5d25f93fd3622 > > > 4c112ee2d48e511ae078f8c2/src/amd/common/sid.h#L390> > > > > > > > > The limit in Mesa was added by this commit: > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/2c1f249f2b61be50222411bc0d41c095004232ed > > > <https://gitlab.freedesktop.org/mesa/mesa/-/commit/2c1f249f2b61be > > > 50222411bc0d41c095004232ed> > > > > According to the commit message, Dave added this limit > > > when hitting an > > > > issue trying to use SDMA with buffers that are larger than > > > this. > > > > > > > > For SDMA v5.2 and newer, a larger limit was added by Marek > > > later: > > > > > > > https://gitlab.freedesktop.org/mesa/mesa/-/commit/a54bcb9429666fcbe38c04660cc4b3f8abbde259 > > > <https://gitlab.freedesktop.org/mesa/mesa/-/commit/a54bcb9429666f > > > cbe38c04660cc4b3f8abbde259> > > > > Which confirms the same issue copying the last 256 bytes > > > on these > > > > versions, although in this case the kernel isn't > > > technically wrong > > > > because it uses a smaller overall maximum. > > > > > > Yeah, I mean that totally makes sense. When you want allow > > > byte aligned copies as well then you have to use 0x3fff00 as > > > maximum. > > > > > > The PAL code even has extra checks to speed up copies where > > > src and dst are byte aligned, but size isn't: > > > > > > if (IsPow2Aligned(srcGpuAddr, sizeof(uint32)) && > > > IsPow2Aligned(dstGpuAddr, sizeof(uint32)) && > > > (*pBytesCopied >= sizeof(uint32))) > > > .... > > > > > > So it totally makes sense to use the lower limit in Mesa but > > > not for the kernel. > > > > > > Regards, > > > Christian. > > > > >