On Sat, Sep 13, 2025 at 1:28 AM <timur.kris...@gmail.com> wrote:
>
> On Fri, 2025-09-12 at 15:38 -0400, Alex Deucher wrote:
> > On Thu, Sep 11, 2025 at 2:18 PM Alex Deucher <alexdeuc...@gmail.com>
> > wrote:
> > >
> > > On Thu, Sep 11, 2025 at 1:25 PM Alex Deucher
> > > <alexander.deuc...@amd.com> wrote:
> > > >
> > > > SDMA 5.2.x has increased transfer limits.
> > > >
> > > > v2: fix harder, use shifts to make it more obvious
> > > >
> > > > Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
> > > > ---
> > > >  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > index a8e39df29f343..bf227eadbe487 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> > > > @@ -2065,11 +2065,11 @@ static void
> > > > sdma_v5_2_emit_fill_buffer(struct amdgpu_ib *ib,
> > > >  }
> > > >
> > > >  static const struct amdgpu_buffer_funcs sdma_v5_2_buffer_funcs =
> > > > {
> > > > -       .copy_max_bytes = 0x400000,
> > > > +       .copy_max_bytes = 1 << 30,
> > > >         .copy_num_dw = 7,
> > > >         .emit_copy_buffer = sdma_v5_2_emit_copy_buffer,
> > > >
> > > > -       .fill_max_bytes = 0x400000,
> > > > +       .fill_max_bytes = 1 << 30,
> > >
> > > The hw docs and PAL differ here. I've asked the hw designers to
> > > clarify.
> >
> > The HW team verified that the hardware supports the extended range
> > for
> > both copies and fills.
> >
> > Alex
>
> Hi Alex,
>
> This is still pretty confusing.
> According to PAL, only SDMA v6 has the extended range for fills, and it
> can do 4 bytes fewer.
>
> Are you sure that PAL is wrong about this?

I can talk to the PAL team as well. I talked to the hardware designers
and they verified that the hardware has the higher limit. It's the
same underlying hardware so it makes sense that both copies and fills
would have the same limit.

>
> For reference:
> https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx10/gfx10DmaCmdBuffer.cpp
> https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/sdma/gfx12/gfx12DmaCmdBuffer.cpp
>
> MaxCopySize on GFX10: 1 << 22
> MaxCopySize on GFX10.3+: 1 << 30
>
> MaxFillSize on GFX10-10.3: (1 << 22 - 1) & ~3
> MaxFillSize on GFX11+: (1 << 30 - 1) & ~3
> This makes sense because they program the count field in the packet
> using the byte count minus four.

They are setting up the packet for dword fill rather than byte fill so
count becomes dword aligned:

    // Because we will set fillsize = 2, the low two bits of our "count" are ignored, but we still program
    // this in terms of bytes.

Alex
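
For reference, the arithmetic behind the limits quoted above can be sketched in C. This is only an illustration, not code from amdgpu or PAL: the macro and function names are made up, and the fill expressions are assumed to be shorthand for ((1 << N) - 1) & ~3, since C would parse a literal "1 << 22 - 1" as 1 << 21. Read that way, the dword-aligned fill limit comes out exactly 4 bytes below the power-of-two limit, which matches the "4 bytes fewer" remark.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical names for illustration; not taken from amdgpu or PAL. */
#define SDMA_LIMIT_22BIT (UINT32_C(1) << 22) /* 0x400000, the value the patch removes */
#define SDMA_LIMIT_30BIT (UINT32_C(1) << 30) /* the value the patch introduces */

/*
 * Largest dword-aligned byte count strictly below `limit`.
 * Mirrors the shorthand from the thread, read as ((1 << N) - 1) & ~3.
 */
static inline uint32_t max_dword_fill_bytes(uint32_t limit)
{
	return (limit - 1) & ~UINT32_C(3);
}

/* For a power-of-two limit, the aligned maximum is the limit minus 4 bytes. */
static_assert(((SDMA_LIMIT_22BIT - 1) & ~UINT32_C(3)) == SDMA_LIMIT_22BIT - 4,
	      "22-bit fill limit is 4 bytes below 1 << 22");
static_assert(((SDMA_LIMIT_30BIT - 1) & ~UINT32_C(3)) == SDMA_LIMIT_30BIT - 4,
	      "30-bit fill limit is 4 bytes below 1 << 30");
```

Whether fill_max_bytes should be 1 << 30 or the dword-aligned value 4 bytes below it is the open question in this thread; the sketch only shows how the two numbers relate.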