Kenneth Graunke <kenn...@whitecape.org> writes:

> On Tuesday, May 24, 2016 5:27:59 PM PDT Francisco Jerez wrote:
>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>
>> > On Tue, May 24, 2016 at 12:18 AM, Francisco Jerez <curroje...@riseup.net>
>> > wrote:
>> >
>> >> Due to a Gen7-specific hardware bug native 32-wide instructions get
>> >> the lower 16 bits of the execution mask applied incorrectly to both
>> >> halves of the instruction, so the MOV trick we currently use wouldn't
>> >> work.  Instead emit multiple 16-wide MOV instructions in 32-wide mode
>> >> in order to cover the whole execution mask.
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 25 +++++++++++++++++--------
>> >>  1 file changed, 17 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> index af7caed..d36877c 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
>> >> @@ -3330,6 +3330,7 @@ void
>> >>  brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
>> >>  {
>> >>     const struct brw_device_info *devinfo = p->devinfo;
>> >> +   const unsigned exec_size = 1 << brw_inst_exec_size(devinfo,
>> >> p->current);
>> >>     brw_inst *inst;
>> >>
>> >>     assert(devinfo->gen >= 7);
>> >> @@ -3359,15 +3360,23 @@ brw_find_live_channel(struct brw_codegen *p,
>> >> struct brw_reg dst)
>> >>
>> >>        brw_MOV(p, flag, brw_imm_ud(0));
>> >>
>> >> -      /* Run a 16-wide instruction returning zero with execution
>> >> masking
>> >> -       * and a conditional modifier enabled in order to get the
>> >> current
>> >> -       * execution mask in f1.0.
>> >> +      /* Run enough instructions returning zero with execution masking
>> >> and
>> >> +       * a conditional modifier enabled in order to get the full
>> >> execution
>> >> +       * mask in f1.0.  We could use a single 32-wide move here if it
>> >> +       * weren't because of the hardware bug that causes channel
>> >> enables to
>> >> +       * be applied incorrectly to the second half of 32-wide
>> >> instructions
>> >> +       * on Gen7.
>> >>         */
>> >> -      inst = brw_MOV(p, brw_null_reg(), brw_imm_ud(0));
>> >> -      brw_inst_set_exec_size(devinfo, inst, BRW_EXECUTE_16);
>> >> -      brw_inst_set_mask_control(devinfo, inst, BRW_MASK_ENABLE);
>> >> -      brw_inst_set_cond_modifier(devinfo, inst, BRW_CONDITIONAL_Z);
>> >> -      brw_inst_set_flag_reg_nr(devinfo, inst, 1);
>> >> +      const unsigned lower_size = MIN2(16, exec_size);
>> >> +      for (unsigned i = 0; i < exec_size / lower_size; i++) {
>> >> +         inst = brw_MOV(p, retype(brw_null_reg(),
>> >> BRW_REGISTER_TYPE_UW),
>> >> +                        brw_imm_uw(0));
>> >>
>> >
>> > Is there a reason this is changing from D to UW?
>> >
>>
>> It's likely to have lower execution latency than an instruction with
>> 32-bit integer execution type.  It shouldn't have any practical
>> implications other than that, the result of the instruction is only used
>> to set bits of the flag register.
>
> I've never heard anything about them having different latencies.
> That doesn't mean that you're wrong, though. :)
>

AFAIUI the FPU pipeline is 4-wide (i.e. it can process four elements per
clock at a given stage of the pipeline) when the execution type is
F/D/UD, 8-wide when it is HF/W/UW, and 2-wide when it is DF/Q/UQ (this
is not accounting for hybrid-issue and such).  Other than that if the
execution type is D the instructions would have to be compressed when
the execution size of the FIND_LIVE_CHANNEL instruction is 16 or 32.

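To put rough numbers on that, here is a back-of-the-envelope sketch in C. It takes the pipeline widths quoted above at face value and reuses the exec_size / lower_size split from the patch. The enum and the function names are made up for illustration and are not part of Mesa, and the model deliberately ignores hybrid issue, instruction compression and any other source of latency, so treat the results as a rough comparison only.

```c
#include <assert.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical labels for the three execution-type classes mentioned
 * above; these are not Mesa identifiers.
 */
enum exec_type_class {
   EXEC_TYPE_16BIT,   /* HF/W/UW */
   EXEC_TYPE_32BIT,   /* F/D/UD */
   EXEC_TYPE_64BIT    /* DF/Q/UQ */
};

/* Elements processed per clock at a given pipeline stage, using the
 * widths stated in this mail (assumed, not taken from documentation).
 */
unsigned
pipeline_width(enum exec_type_class type)
{
   switch (type) {
   case EXEC_TYPE_16BIT: return 8;
   case EXEC_TYPE_32BIT: return 4;
   default:              return 2;
   }
}

/* Number of MOVs the loop in the patch emits for a given execution size
 * of the FIND_LIVE_CHANNEL instruction.
 */
unsigned
num_movs(unsigned exec_size)
{
   assert(exec_size != 0);
   const unsigned lower_size = MIN2(16, exec_size);
   return exec_size / lower_size;
}

/* Clocks a single pipeline stage is kept busy by the whole MOV
 * sequence: one ceil(lower_size / width) term per emitted MOV.
 */
unsigned
stage_clocks(unsigned exec_size, enum exec_type_class type)
{
   const unsigned lower_size = MIN2(16, exec_size);
   const unsigned width = pipeline_width(type);
   return num_movs(exec_size) * ((lower_size + width - 1) / width);
}
```

Under these assumptions a SIMD32 FIND_LIVE_CHANNEL emits two MOVs, and stage_clocks(32, EXEC_TYPE_16BIT) comes out at half of stage_clocks(32, EXEC_TYPE_32BIT), which is the kind of difference I had in mind with the UW type.
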
> --Ken
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev