Module: Mesa
Branch: main
Commit: 4420251947443e5f29ecc702900e560e66e73f0e
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=4420251947443e5f29ecc702900e560e66e73f0e
Author: Francisco Jerez <[email protected]>
Date: Wed Oct 19 16:13:24 2022 -0700

intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment.

Power-of-two SW stack sizes are prone to causing collisions in the
hashing function used by the L3 to map memory addresses to banks,
which can cause stack accesses from most DSSes to bottleneck on a
single L3 bank. Fix it by padding the SW stack stride by a single
cacheline if it was a power of two.

This has been reported by Felix DeGrood to improve Quake2 RTX
performance by ~30% on DG2-512 in combination with other RT patches
Lionel Landwerlin has been working on.

Many thanks to Felix DeGrood for doing much of the legwork and
providing several iterations of Q2RTX performance counter dumps which
eventually prompted me to consider the hash collision theory and
motivated this patch, and for providing additional performance counter
dumps confirming that there is no longer an appreciable imbalance in
traffic across L3 banks after this change.

Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21461>

---
 src/intel/compiler/brw_rt.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/intel/compiler/brw_rt.h b/src/intel/compiler/brw_rt.h
index d03187636f6..15c024072f1 100644
--- a/src/intel/compiler/brw_rt.h
+++ b/src/intel/compiler/brw_rt.h
@@ -230,6 +230,18 @@ brw_rt_compute_scratch_layout(struct brw_rt_scratch_layout *layout,
    assert(size % 64 == 0);
    layout->sw_stack_start = size;
    layout->sw_stack_size = ALIGN(sw_stack_size, 64);
+
+   /* Currently it's always the case that sw_stack_size is a power of
+    * two, but power-of-two SW stack sizes are prone to causing
+    * collisions in the hashing function used by the L3 to map memory
+    * addresses to banks, which can cause stack accesses from most
+    * DSSes to bottleneck on a single L3 bank. Fix it by padding the
+    * SW stack by a single cacheline if it was a power of two.
+    */
+   if (layout->sw_stack_size > 64 &&
+       util_is_power_of_two_nonzero(layout->sw_stack_size))
+      layout->sw_stack_size += 64;
+
    size += num_stack_ids * layout->sw_stack_size;
    layout->total_size = size;
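To make the padding rule and the collision argument concrete, here is a small standalone C sketch. It re-implements the stride logic from the hunk above using local stand-ins for ALIGN() and util_is_power_of_two_nonzero(); the helper names (is_pow2_nonzero, align_u32, padded_sw_stack_stride, toy_bank, banks_touched) are hypothetical and not part of Mesa. The bank mapping is a deliberately simple assumption (16 banks chosen by cacheline index modulo 16). The commit does not describe the real L3 hash, so the sketch only illustrates the aliasing pattern, not actual DG2 behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for Mesa's util_is_power_of_two_nonzero(). */
static bool is_pow2_nonzero(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Local stand-in for Mesa's ALIGN(): rounds v up to a power-of-two
 * alignment. */
static uint32_t align_u32(uint32_t v, uint32_t alignment)
{
   assert(is_pow2_nonzero(alignment));
   return (v + alignment - 1) & ~(alignment - 1);
}

/* Same rule as the patch: align to a 64-byte cacheline, then add one
 * cacheline if the result is a power of two larger than 64. */
static uint32_t padded_sw_stack_stride(uint32_t sw_stack_size)
{
   uint32_t stride = align_u32(sw_stack_size, 64);
   if (stride > 64 && is_pow2_nonzero(stride))
      stride += 64;
   return stride;
}

/* Toy L3 model (assumption, NOT Intel's hash): 16 banks selected by
 * the 64-byte cacheline index modulo 16. */
#define TOY_NUM_BANKS 16u
#define TOY_CACHELINE 64u

static unsigned toy_bank(uint64_t addr)
{
   return (unsigned)((addr / TOY_CACHELINE) % TOY_NUM_BANKS);
}

/* Number of distinct toy banks hit when stacks 0..num_stacks-1, laid
 * out back to back with the given stride, are all accessed at the same
 * offset within their stack. */
static unsigned banks_touched(uint32_t stride, unsigned num_stacks,
                              uint32_t offset)
{
   bool used[TOY_NUM_BANKS] = { false };
   unsigned count = 0;

   for (unsigned id = 0; id < num_stacks; id++) {
      unsigned bank = toy_bank((uint64_t)id * stride + offset);
      if (!used[bank]) {
         used[bank] = true;
         count++;
      }
   }
   return count;
}
```

For example, with a 4096-byte stack and 64 stack IDs all accessing offset 0, banks_touched(4096, 64, 0) reports a single bank, while the padded stride of 4160 bytes spreads the same accesses over all 16 toy banks: 4160 / 64 = 65 cachelines, and 65 is 1 modulo 16, so consecutive stacks land in consecutive banks. As in the patch, a stride of exactly 64 bytes is left unpadded.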
