Samuel Iglesias Gonsálvez <[email protected]> writes: > From: Iago Toral Quiroga <[email protected]> > > UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a > constant offset that is 16-byte aligned. If we need to access an unaligned > offset we emit a load with an aligned offset and use the remaining constant > offset to select the component into the vec4 result that we are interested > in. This component must be computed in units of the type size, since that > is what fs_reg::set_smear expects. > > This patch does this change in the two places where we use this message: > In demote_pull_constants when we lower uniform access with constant offset > into the pull constant buffer and in UBO loads with constant offset. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 0e69be8..dff13ea 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads() > inst->src[i].file = VGRF; > inst->src[i].nr = dst.nr; > inst->src[i].reg_offset = 0; > - inst->src[i].set_smear(pull_index & 3); > + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) / 4); > + inst->src[i].set_smear((pull_index & 3) / type_slots); >
This cannot be right, why should we care what the destination type of
the instruction is while lowering a uniform source? Also I don't think
the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
you'll force type_slots to 1 and end up interpreting the pull_index in
the wrong units. How about:
| inst->src[i].set_smear((pull_index & 3) * 4 /
| type_sz(inst->src[i].type));
> brw_mark_surface_used(prog_data, index);
> }
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4cd219a..532ca65 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld,
> nir_intrinsic_instr *instr
> bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
> surf_index, const_offset_reg);
>
> + unsigned component_base =
> + (const_offset->u32[0] % 16) / MAX2(1, type_sz(dest.type));
Rather than dividing by the type size only to let set_smear multiply by
the type size again, I think it would be cleaner to do something like:
| const fs_reg consts = byte_offset(packed_consts,
const_offset->u32[0] % 16);
> for (unsigned i = 0; i < instr->num_components; i++) {
then here:
| bld.MOV(offset(dest, bld, i), component(consts, i));
and then remove the rest of the loop.
> - packed_consts.set_smear(const_offset->u32[0] % 16 / 4 + i);
> + packed_consts.set_smear(component_base + i);
>
> /* The std140 packing rules don't allow vectors to cross 16-byte
> * boundaries, and a reg is 32 bytes.
> --
> 2.5.0
>
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
