On 10.09.25 11:34, Timur Kristóf wrote:
> On Wed, 2025-09-10 at 10:34 +0200, Christian König wrote:
>> On 09.09.25 18:56, Timur Kristóf wrote:
>>>> Even when we apply it I think we should drop that, the value the
>>>> kernel uses is correct.
>>>
>>> Hi Christian,
>>>
>>> The kernel and Mesa disagree on the limits for almost all SDMA
>>> versions, so it would be nice to actually understand what the
>>> limits of
>>> the SDMA HW are and use the same limit in the kernel and Mesa, or
>>> if
>>> that isn't viable, let's document why the different limits make
>>> sense.
>>>
>>> I'm adding Marek to CC, he wrote the comment that I referenced
>>> here.
>>> As far as I understand from my conversation with Marek, the kernel
>>> is
>>> actually wrong.
>>>
>>> If the limits depend on alignment, then we should either set a
>>> limit
>>> that is always safe, or make sure SDMA copies in the kernel are
>>> always
>>> aligned and add assertions about it.
>>
>> That's already done. See the size restrictions applied to BOs and the
>> callers of amdgpu_copy_buffer().
> 
> Based on the code comments I cited, as far as I understand, the issue
> is with copying the last 256 bytes of 2^22-1. Do I understood your
> response correctly that you are saying that the kernel isn't affected
> by this issue because it always copies things that are 256 byte
> aligned?

Yes, see the kernel always works with at least PAGE_SIZE buffers (between 4k 
and 64k depending on CPU architecture). You never get anything smaller than 
that.

The only exception is the debugger interface, but that always copies < 
PAGE_SIZE, so the SDMA limits are irrelevant.

> I checked the callers of amdgpu_copy_buffer and can't find what you are
> referring to. However, assuming that all callers use amdgpu_copy_buffer
> on 256 byte aligned addresses, there is still an issue with large BOs:
> 
> When the kernel copies a BO that is larger than 0x3fffe0 bytes then it
> needs to emit multiple SDMA copy packets, and the copy done by the
> second packet (and other subsequent packets) won't be 256 byte aligned.

I've just double checked the documentation and for SI it clearly states that 
the limit is 0x3fffe0. Documentation for later HW says 2^22 - 1 - 
start_addr[4:2] (which is 0x1f in the worst case so you end up with 0x3fffe0 
again).

For backward, tiled and windowed copies a 256byte bounce buffer is used instead 
of the 32byte buffer for the linear copy. So the limit is then 0x3fff00.

Looks like that buffer is also used for byte aligned copies and that limit 
applies there as well.

>>
>> We could add another warning to amdgpu_copy_buffer(), but that is
>> just the backend function.
>>
>>> Looking at the implementation of
>>> amdgpu_copy_buffer in the kernel, I see that it relies on
>>> copy_max_bytes and doesn't take alignment into account, so with the
>>> current limit it could issue subsequent copies that aren't 256 byte
>>> aligned.
>>
>> "Due to HW limitation, the maximum count may not be 2^n-1, can only
>> be 2^n - 1 - start_addr[4:2]"
>>
>> Well according to this comments the size restriction is 32 bytes (256
>> bits!) and not 256 bytes.
>>
>> Were do you actually get the 256 bytes restriction from?
> 
> The comments I cited above are from the following sources:
> 
> PAL uses 1<<22-256 = 0x3fff00 here:
> https://github.com/GPUOpen-Drivers/pal/blob/bcec463efe5260776d486a5e3da0c549bc0a75d2/src/core/hw/ossip/oss2/oss2DmaCmdBuffer.cpp#L308
> 
> Mesa also uses 0x3fff00 here:
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/47f5d25f93fd36224c112ee2d48e511ae078f8c2/src/amd/common/sid.h#L390
> 
> The limit in Mesa was added by this commit:
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/2c1f249f2b61be50222411bc0d41c095004232ed
> According to the commit message, Dave added this limit when hitting an
> issue trying to use SDMA with buffers that are larger than this.
> 
> For SDMA v5.2 and newer, a larger limit was added by Marek later:
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/a54bcb9429666fcbe38c04660cc4b3f8abbde259
> Which confirms the same issue copying the last 256 bytes on these
> versions, although in this case the kernel isn't technically wrong
> because it uses a smaller overall maximum.

Yeah, I mean that totally makes sense. When you want allow byte aligned copies 
as well then you have to use 0x3fff00 as maximum.

The PAL code even has extra checks to speed up copies where src and dst are 
byte aligned, but size isn't:

    if (IsPow2Aligned(srcGpuAddr, sizeof(uint32)) &&
        IsPow2Aligned(dstGpuAddr, sizeof(uint32)) &&
        (*pBytesCopied >= sizeof(uint32)))
....

So it totally makes sense to use the lower limit in Mesa but not for the kernel.

Regards,
Christian.

Reply via email to