On June 13, 2018 19:17:21 Kenneth Graunke <kenn...@whitecape.org> wrote:

On Wednesday, June 13, 2018 6:52:07 PM PDT Jason Ekstrand wrote:
On Wed, Jun 13, 2018 at 6:14 PM, Kenneth Graunke <kenn...@whitecape.org>
wrote:

The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

vec4 a;  // [0, 16)
float b; // [16, 20)
vec4 c;  // [20, 36)

then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.

Similarly, dvec4s would suffer from the same problem.

v2: Rewrite the accounting, my calculations were wrong.

Reviewed-by: Rafael Antognolli <rafael.antogno...@intel.com> [v1]
---
src/intel/compiler/brw_nir_analyze_ubo_ranges.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index d58fe3dd2e3..31026ca65ba 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -141,10 +141,17 @@ analyze_ubos_block(struct ubo_analysis_state *state,
nir_block *block)
   if (offset >= 64)
      continue;

This should take end into account, not just offset.

I disagree, actually.  This check is to avoid left shifting by 64 or
more bits, which is undefined behavior in C for a 64-bit operand.
Shifting a wider mask by 63 and having bits fall off the end
(e.g. 3ull << 63) is well defined behavior.
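To illustrate the distinction, here is a hedged sketch; chunk_mask is a
hypothetical helper mirroring the patch's expression, not actual Mesa code:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patch's expression: build a mask of
 * `chunks` consecutive bits starting at bit `offset`.  The caller must
 * have checked offset < 64, so each shift count is legal; mask bits
 * pushed past bit 63 are simply discarded, which C defines -- unlike a
 * shift count of 64 or more, which is undefined. */
static uint64_t chunk_mask(int chunks, int offset)
{
   return ((1ull << chunks) - 1) << offset;
}
```

For example, chunk_mask(2, 63) yields only bit 63 set: the second
chunk's bit falls off the top, which matches the partial-push behavior
described below.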

Of course, that means that we'll be pushing a partial value.  That
actually works fine today, surprisingly.  In my commit message's
packed <vec4, float, vec4> example, this bug would cause us to only
push .xyz of the last vec4.  The backend compiler correctly falls
back to pull loads for the .w component.

I'm convinced. Please add a comment to that effect.


Note that this code doesn't handle the 3DSTATE_CONSTANT_XS restriction
that the sum of all four read lengths has to be <= 64.  It can't,
realistically, because it has no idea what GL is going to do with
regular push uniforms.  So, the backend has code to prune ranges to
fit in the restricted length - which can cut off parts of vectors - so
it already properly supports this case.

+         /* The value might span multiple 32-byte chunks. */
+         const int bytes = nir_intrinsic_dest_components(intrin) *
+                           (nir_dest_bit_size(intrin->dest) / 8);
+         const int start = ROUND_DOWN_TO(offset_const->u32[0], 32);
+         const int end = ALIGN(offset_const->u32[0] + bytes, 32);

You could probably use / and DIV_ROUND_UP here too.  I'm not sure which is
better TBH.

--Jason

I could, but I botched that several times.  This way rounds the start
down, and the end up, to the nearest 32B address...so we expand the
range to cover the right region, all in bytes.  Then we divide.

I figured that was simpler.
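For comparison, the two formulations agree; a sketch with plain
arithmetic standing in for Mesa's ROUND_DOWN_TO / ALIGN / DIV_ROUND_UP
macros, and with made-up function names:

```c
/* Round the byte range [offset, offset + bytes) out to 32B boundaries
 * first, then divide -- the approach the patch takes. */
static int chunks_via_bytes(int offset, int bytes)
{
   const int start = offset / 32 * 32;                /* ROUND_DOWN_TO */
   const int end   = (offset + bytes + 31) / 32 * 32; /* ALIGN */
   return (end - start) / 32;
}

/* Divide into chunk indices first, then subtract. */
static int chunks_via_division(int offset, int bytes)
{
   return (offset + bytes + 31) / 32   /* DIV_ROUND_UP */
        - offset / 32;
}
```

Both round the start of the range down and the end up, so a partially
covered chunk counts as a whole one; the vec4 at bytes [20, 36) from the
commit message yields 2 chunks either way.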

Fine by me.  R-b

--Jason


+         const int chunks = (end - start) / 32;
+
   /* TODO: should we count uses in loops as higher benefit? */

   struct ubo_block_info *info = get_block_info(state, block);
-         info->offsets |= 1ull << offset;
+         info->offsets |= ((1ull << chunks) - 1) << offset;
   info->uses[offset]++;
}
}
--
2.17.0



_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev