Module: Mesa
Branch: staging/21.3
Commit: fa191f93db3a5afd56c3b1dd87488c29456f3122
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=fa191f93db3a5afd56c3b1dd87488c29456f3122
Author: Lionel Landwerlin <[email protected]>
Date:   Wed Feb 16 23:14:15 2022 +0200

nir: fix lower_memcpy

memcpy is divided into chunks that are vec4 sized max.

The problem here happens with a structure of 24 bytes:

   struct {
      float3 a;
      float3 b;
   }

If you memcpy that struct, the lowering will emit 2 load/store pairs,
one of size 8 and the next of size 16. But both end up located at
offset 0, so we effectively drop 2 floats.

Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: a3177cca996145 ("nir: Add a lowering pass to lower memcpy")
Reviewed-by: Jason Ekstrand <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15049>
(cherry picked from commit 768930a73a43e48172df00b6c934de582bd9422b)

---

 .pick_status.json                   |  2 +-
 src/compiler/nir/nir_lower_memcpy.c | 11 +++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/.pick_status.json b/.pick_status.json
index df6040d400b..b7ce30669d9 100644
--- a/.pick_status.json
+++ b/.pick_status.json
@@ -787,7 +787,7 @@
         "description": "nir: fix lower_memcpy",
         "nominated": true,
         "nomination_type": 1,
-        "resolution": 0,
+        "resolution": 1,
         "main_sha": null,
         "because_sha": "a3177cca9961452b436b12fd0790c6ffaa8f0eee"
     },
diff --git a/src/compiler/nir/nir_lower_memcpy.c b/src/compiler/nir/nir_lower_memcpy.c
index b7a3f1752cb..768537a3478 100644
--- a/src/compiler/nir/nir_lower_memcpy.c
+++ b/src/compiler/nir/nir_lower_memcpy.c
@@ -111,11 +111,14 @@ lower_memcpy_impl(nir_function_impl *impl)
          uint64_t size = nir_src_as_uint(cpy->src[2]);
          uint64_t offset = 0;
          while (offset < size) {
-            uint64_t remaining = offset - size;
-            /* For our chunk size, we choose the largest power-of-two that
-             * divides size with a maximum of 16B (a vec4).
+            uint64_t remaining = size - offset;
+            /* Find the largest chunk size power-of-two (MSB in remaining)
+             * and limit our chunk to 16B (a vec4). It's important to do as
+             * many 16B chunks as possible first so that the index
+             * computation is correct for
+             * memcpy_(load|store)_deref_elem_imm.
              */
-            unsigned copy_size = 1u << MIN2(ffsll(remaining) - 1, 4);
+            unsigned copy_size = 1u << MIN2(util_last_bit64(remaining) - 1, 4);
             const struct glsl_type *copy_type =
                copy_type_for_byte_size(copy_size);
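
The chunking described in the commit message can be reproduced outside Mesa with a small standalone sketch. This is not Mesa code: `plan_chunks`, `struct chunk`, and the two bit helpers are hypothetical names (the helpers stand in for `ffsll` and `util_last_bit64`). The sketch also assumes that the element index handed to memcpy_(load|store)_deref_elem_imm is `offset / copy_size`, which the hunk above does not show.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based position of the lowest set bit, 0 if v == 0.
 * Hypothetical stand-in with the same contract as ffsll(). */
static int lowest_bit_1based(uint64_t v)
{
   int i = 1;
   if (v == 0)
      return 0;
   while (!(v & 1)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* 1-based position of the highest set bit, 0 if v == 0.
 * Hypothetical stand-in for util_last_bit64(). */
static int highest_bit_1based(uint64_t v)
{
   int i = 0;
   while (v) {
      v >>= 1;
      i++;
   }
   return i;
}

struct chunk {
   uint64_t byte_offset; /* where the chunk actually lands */
   unsigned size;        /* chunk size in bytes */
};

/* Plan the chunks for a copy of `size` bytes.
 * use_fix == 0 mimics the removed lines, use_fix != 0 the added ones.
 * Returns the number of chunks written to `out` (at most `max`). */
static unsigned plan_chunks(uint64_t size, int use_fix,
                            struct chunk *out, unsigned max)
{
   unsigned n = 0;
   uint64_t offset = 0;

   while (offset < size && n < max) {
      /* The old expression wraps around because the operands are unsigned. */
      uint64_t remaining = use_fix ? size - offset : offset - size;
      int bit = use_fix ? highest_bit_1based(remaining)
                        : lowest_bit_1based(remaining);
      int shift = bit - 1 < 4 ? bit - 1 : 4;   /* MIN2(bit - 1, 4) */
      unsigned copy_size = 1u << shift;
      assert(copy_size <= 16);                 /* never larger than a vec4 */

      /* ASSUMPTION: the element index is offset / copy_size, so the chunk
       * lands at index * copy_size bytes from the start. */
      uint64_t index = offset / copy_size;
      out[n].byte_offset = index * copy_size;
      out[n].size = copy_size;
      n++;

      offset += copy_size;
   }
   return n;
}
```

Under those assumptions, calling `plan_chunks(24, 0, ...)` gives an 8-byte chunk and a 16-byte chunk that both land at byte 0, which matches the dropped floats described above. `plan_chunks(24, 1, ...)` gives a 16-byte chunk at byte 0 followed by an 8-byte chunk at byte 16, so all 24 bytes are covered.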
