On 6/10/26 10:32, Pavel Ondračka wrote: > > r100_copy_blit() copies BOs as 1024-pixel-wide ARGB8888 blits, so one > GPU page becomes one blit row. Large copies are split into chunks of at > most 8191 rows. > > The kernel register header names the packet coordinate dwords SRC_Y_X > and DST_Y_X. In the BITBLT_MULTI description in > R5xx_Acceleration_v1.5.pdf docs, these correspond to [SRC_X1 | SRC_Y1] > and [DST_X1 | DST_Y1], which are signed 13-bit coordinates in the > -8192..8191 range. The old code kept SRC/DST_PITCH_OFFSET at the BO base > and used SRC_Y_X/DST_Y_X as the chunk address, so large BO moves could > exceed that coordinate range. > > Compute per-chunk SRC/DST_PITCH_OFFSET bases and emit zero source and > destination coordinates. r100_copy_blit() already packs > SRC/DST_PITCH_OFFSET as pitch plus base offset, so large chunk addresses > belong there rather than in the coordinate fields. > > This fixes Prison Architect corruption with 4096x4096 mipped textures > after they are evicted to GTT under memory pressure on RV530.
Wow, impressive piece of work. > Closes: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/6716 > Cc: [email protected] > Signed-off-by: Pavel Ondračka <[email protected]> Acked-by: Christian König <[email protected]> Thanks a lot for digging into this, Christian. > --- > drivers/gpu/drm/radeon/r100.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c > index 3ac1a79b6f13..533215d6e9cb 100644 > --- a/drivers/gpu/drm/radeon/r100.c > +++ b/drivers/gpu/drm/radeon/r100.c > @@ -906,6 +906,7 @@ struct radeon_fence *r100_copy_blit(struct radeon_device > *rdev, > { > struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; > struct radeon_fence *fence; > + uint64_t cur_src_offset, cur_dst_offset; > uint32_t cur_pages; > uint32_t stride_bytes = RADEON_GPU_PAGE_SIZE; > uint32_t pitch; > @@ -934,6 +935,10 @@ struct radeon_fence *r100_copy_blit(struct radeon_device > *rdev, > cur_pages = 8191; > } > num_gpu_pages -= cur_pages; > + cur_src_offset = src_offset + > + (uint64_t)num_gpu_pages * RADEON_GPU_PAGE_SIZE; > + cur_dst_offset = dst_offset + > + (uint64_t)num_gpu_pages * RADEON_GPU_PAGE_SIZE; > > /* pages are in Y direction - height > page width in X direction - width */ > @@ -950,13 +955,13 @@ struct radeon_fence *r100_copy_blit(struct > radeon_device *rdev, > RADEON_DP_SRC_SOURCE_MEMORY | > RADEON_GMC_CLR_CMP_CNTL_DIS | > RADEON_GMC_WR_MSK_DIS); > - radeon_ring_write(ring, (pitch << 22) | (src_offset >> 10)); > - radeon_ring_write(ring, (pitch << 22) | (dst_offset >> 10)); > + radeon_ring_write(ring, (pitch << 22) | (cur_src_offset >> > 10)); > + radeon_ring_write(ring, (pitch << 22) | (cur_dst_offset >> > 10)); > radeon_ring_write(ring, (0x1fff) | (0x1fff << 16)); > radeon_ring_write(ring, 0); > radeon_ring_write(ring, (0x1fff) | (0x1fff << 16)); > - radeon_ring_write(ring, num_gpu_pages); > - radeon_ring_write(ring, num_gpu_pages); > + radeon_ring_write(ring, 0); > + radeon_ring_write(ring, 0); > radeon_ring_write(ring, cur_pages | (stride_pixels << 16)); > } > radeon_ring_write(ring, PACKET0(RADEON_DSTCACHE_CTLSTAT, 0)); > -- > 2.52.0 >
