The UBO push analysis pass incorrectly assumed that all values would fit within a 32B chunk, and only recorded a bit for the 32B chunk containing the starting offset.
For example, if a UBO contained the following, tightly packed: vec4 a; // [0, 16) float b; // [16, 20) vec4 c; // [20, 36) then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1, which means that we ought to record two 32B chunks in the bitfield. Similarly, dvec4s would suffer from the same problem. v2: Rewrite the accounting, my calculations were wrong. Reviewed-by: Rafael Antognolli <rafael.antogno...@intel.com> [v1] --- src/intel/compiler/brw_nir_analyze_ubo_ranges.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c index d58fe3dd2e3..31026ca65ba 100644 --- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c +++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c @@ -141,10 +141,17 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block) if (offset >= 64) continue; + /* The value might span multiple 32-byte chunks. */ + const int bytes = nir_intrinsic_dest_components(intrin) * + (nir_dest_bit_size(intrin->dest) / 8); + const int start = ROUND_DOWN_TO(offset_const->u32[0], 32); + const int end = ALIGN(offset_const->u32[0] + bytes, 32); + const int chunks = (end - start) / 32; + /* TODO: should we count uses in loops as higher benefit? */ struct ubo_block_info *info = get_block_info(state, block); - info->offsets |= 1ull << offset; + info->offsets |= ((1ull << chunks) - 1) << offset; info->uses[offset]++; } } -- 2.17.0 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev