It's probably better to use 0x3fff00 to match Mesa and PAL. There is no
benefit in using a different limit, not even a perf benefit, and it's
better to be consistent with all UMDs.

Marek

On Wed, Sep 10, 2025 at 7:54 AM Christian König <christian.koe...@amd.com>
wrote:

> On 10.09.25 11:34, Timur Kristóf wrote:
> > On Wed, 2025-09-10 at 10:34 +0200, Christian König wrote:
> >> On 09.09.25 18:56, Timur Kristóf wrote:
> >>>> Even when we apply it I think we should drop that, the value the
> >>>> kernel uses is correct.
> >>>
> >>> Hi Christian,
> >>>
> >>> The kernel and Mesa disagree on the limits for almost all SDMA
> >>> versions, so it would be nice to actually understand what the
> >>> limits of
> >>> the SDMA HW are and use the same limit in the kernel and Mesa, or
> >>> if
> >>> that isn't viable, let's document why the different limits make
> >>> sense.
> >>>
> >>> I'm adding Marek to CC, he wrote the comment that I referenced
> >>> here.
> >>> As far as I understand from my conversation with Marek, the kernel
> >>> is
> >>> actually wrong.
> >>>
> >>> If the limits depend on alignment, then we should either set a
> >>> limit
> >>> that is always safe, or make sure SDMA copies in the kernel are
> >>> always
> >>> aligned and add assertions about it.
> >>
> >> That's already done. See the size restrictions applied to BOs and the
> >> callers of amdgpu_copy_buffer().
> >
> > Based on the code comments I cited, as far as I understand, the issue
> > is with copying the last 256 bytes of 2^22-1. Do I understood your
> > response correctly that you are saying that the kernel isn't affected
> > by this issue because it always copies things that are 256 byte
> > aligned?
>
> Yes, see the kernel always works with at least PAGE_SIZE buffers (between
> 4k and 64k depending on CPU architecture). You never get anything smaller
> than that.
>
> The only exception is the debugger interface, but that always copies <
> PAGE_SIZE, so the SDMA limits are irrelevant.
>
> > I checked the callers of amdgpu_copy_buffer and can't find what you are
> > referring to. However, assuming that all callers use amdgpu_copy_buffer
> > on 256 byte aligned addresses, there is still an issue with large BOs:
> >
> > When the kernel copies a BO that is larger than 0x3fffe0 bytes then it
> > needs to emit multiple SDMA copy packets, and the copy done by the
> > second packet (and other subsequent packets) won't be 256 byte aligned.
>
> I've just double checked the documentation and for SI it clearly states
> that the limit is 0x3fffe0. Documentation for later HW says 2^22 - 1 -
> start_addr[4:2] (which is 0x1f in the worst case so you end up with
> 0x3fffe0 again).
>
> For backward, tiled and windowed copies a 256byte bounce buffer is used
> instead of the 32byte buffer for the linear copy. So the limit is then
> 0x3fff00.
>
> Looks like that buffer is also used for byte aligned copies and that limit
> applies there as well.
>
> >>
> >> We could add another warning to amdgpu_copy_buffer(), but that is
> >> just the backend function.
> >>
> >>> Looking at the implementation of
> >>> amdgpu_copy_buffer in the kernel, I see that it relies on
> >>> copy_max_bytes and doesn't take alignment into account, so with the
> >>> current limit it could issue subsequent copies that aren't 256 byte
> >>> aligned.
> >>
> >> "Due to HW limitation, the maximum count may not be 2^n-1, can only
> >> be 2^n - 1 - start_addr[4:2]"
> >>
> >> Well according to this comments the size restriction is 32 bytes (256
> >> bits!) and not 256 bytes.
> >>
> >> Were do you actually get the 256 bytes restriction from?
> >
> > The comments I cited above are from the following sources:
> >
> > PAL uses 1<<22-256 = 0x3fff00 here:
> >
> https://github.com/GPUOpen-Drivers/pal/blob/bcec463efe5260776d486a5e3da0c549bc0a75d2/src/core/hw/ossip/oss2/oss2DmaCmdBuffer.cpp#L308
> >
> > Mesa also uses 0x3fff00 here:
> >
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/47f5d25f93fd36224c112ee2d48e511ae078f8c2/src/amd/common/sid.h#L390
> >
> > The limit in Mesa was added by this commit:
> >
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/2c1f249f2b61be50222411bc0d41c095004232ed
> > According to the commit message, Dave added this limit when hitting an
> > issue trying to use SDMA with buffers that are larger than this.
> >
> > For SDMA v5.2 and newer, a larger limit was added by Marek later:
> >
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/a54bcb9429666fcbe38c04660cc4b3f8abbde259
> > Which confirms the same issue copying the last 256 bytes on these
> > versions, although in this case the kernel isn't technically wrong
> > because it uses a smaller overall maximum.
>
> Yeah, I mean that totally makes sense. When you want allow byte aligned
> copies as well then you have to use 0x3fff00 as maximum.
>
> The PAL code even has extra checks to speed up copies where src and dst
> are byte aligned, but size isn't:
>
>     if (IsPow2Aligned(srcGpuAddr, sizeof(uint32)) &&
>         IsPow2Aligned(dstGpuAddr, sizeof(uint32)) &&
>         (*pBytesCopied >= sizeof(uint32)))
> ....
>
> So it totally makes sense to use the lower limit in Mesa but not for the
> kernel.
>
> Regards,
> Christian.
>

Reply via email to