On Tuesday, February 27, 2018 1:38:27 PM PST Francisco Jerez wrote:
> The main motivation is to enable HDC surface opcodes on ICL which no
> longer allows the sample mask to be provided in a message header, but
> this is enabled all the way back to IVB when possible because it
> decreases the instruction count of some shaders using HDC messages
> significantly, e.g. one of the SynMark2 CSDof compute shaders
> decreases instruction count by about 40% due to the removal of header
> setup boilerplate which in turn makes a number of send message
> payloads more easily CSE-able. Shader-db results on SKL:
>
> total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
> instructions in affected programs: 311532 -> 300597 (-3.51%)
> helped: 491
> HURT: 1
>
> Shader-db results on BDW where the optimization needs to be disabled
> in some cases due to hardware restrictions:
>
> total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
> instructions in affected programs: 220863 -> 214097 (-3.06%)
> helped: 351
> HURT: 0
>
> The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
> laptop with this change.
> ---
> src/intel/compiler/brw_fs.cpp | 42 +++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0b87d8ab14e..639432b4f49 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -4432,6 +4432,8 @@ static void
> lower_surface_logical_send(const fs_builder &bld, fs_inst *inst, opcode op,
> const fs_reg &sample_mask)
> {
> + const gen_device_info *devinfo = bld.shader->devinfo;
> +
> /* Get the logical send arguments. */
> const fs_reg &addr = inst->src[0];
> const fs_reg &src = inst->src[1];
> @@ -4442,7 +4444,20 @@ lower_surface_logical_send(const fs_builder &bld,
> fs_inst *inst, opcode op,
> /* Calculate the total number of components of the payload. */
> const unsigned addr_sz = inst->components_read(0);
> const unsigned src_sz = inst->components_read(1);
> - const unsigned header_sz = (sample_mask.file == BAD_FILE ? 0 : 1);
> + /* From the BDW PRM Volume 7, page 147:
> + *
> + * "For the Data Cache Data Port*, the header must be present for the
> + * following message types: [...] Typed read/write/atomics"
> + *
> + * Earlier generations have a similar wording. Because of this
> restriction
> + * we don't attempt to implement sample masks via predication for such
> + * messages prior to Gen9, since we have to provide a header anyway. On
> + * Gen11+ the header has been removed so we can only use predication.
> + */
> + const unsigned header_sz = devinfo->gen < 9 &&
> + (op == SHADER_OPCODE_TYPED_SURFACE_READ ||
> + op == SHADER_OPCODE_TYPED_SURFACE_WRITE ||
> + op == SHADER_OPCODE_TYPED_ATOMIC) ? 1 : 0;
> const unsigned sz = header_sz + addr_sz + src_sz;
>
> /* Allocate space for the payload. */
> @@ -4462,6 +4477,31 @@ lower_surface_logical_send(const fs_builder &bld,
> fs_inst *inst, opcode op,
>
> bld.LOAD_PAYLOAD(payload, components, sz, header_sz);
>
> + /* Predicate the instruction on the sample mask if no header is
> + * provided.
> + */
> + if (!header_sz && sample_mask.file != BAD_FILE &&
> + sample_mask.file != IMM) {
> + const fs_builder ubld = bld.group(1, 0).exec_all();
> + if (inst->predicate) {
> + assert(inst->predicate == BRW_PREDICATE_NORMAL);
> + assert(!inst->predicate_inverse);
> + assert(inst->flag_subreg < 2);
> + /* Combine the sample mask with the existing predicate by using a
> + * vertical predication mode.
> + */
> + inst->predicate = BRW_PREDICATE_ALIGN1_ALLV;
> + ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg + 2),
> + sample_mask.type),
> + sample_mask);I was surprised to see flag_subreg remain unchanged here, but then I re-read how allv works, and it does f0.0 & f1.0, or f0.1 & f1.1. So, we can leave it as 0 or 1 and it'll implicitly use 2 or 3 as well. Series is: Reviewed-by: Kenneth Graunke <[email protected]> > + } else { > + inst->flag_subreg = 2; > + inst->predicate = BRW_PREDICATE_NORMAL; > + ubld.MOV(retype(brw_flag_subreg(inst->flag_subreg), > sample_mask.type), > + sample_mask); > + } > + } > + > /* Update the original instruction. */ > inst->opcode = op; > inst->mlen = header_sz + (addr_sz + src_sz) * inst->exec_size / 8; >
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
