On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez <sigles...@igalia.com> wrote: > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > On Ivybridge/Valleyview, when converting a float (F) to a double > precision float (DF), the hardware automatically duplicates the source > horizontal stride, hence converting only the values in odd positions. > > This commit adds a new lowering step, exclusively for IVB/VLV, where the > sources are first copied in a temporal register with stride 2, and > then converted from this temporal register. Thus, we do not lose any > value.
Curro explained how he thinks the hardware works to me. I'll try to reproduce that description here. The FPU channels are 32-bits wide on IVB/BYT. Normally, for example when operating on 8 float channels, the FPU is given a channel of the source register to operate on, and each FPU channel produces a value which is written to the channels of the destination. But when operating on doubles, each *pair* of FPU channels operates on one (double-precision) value. Unfortunately the hardware designers didn't seem to update the input and output logic, so for instance every pair of float channels from the source region are given as input to the FPU, even though only the low (or even numbered) channel will be used. This is why it appears that the hardware doubles the stride, but it's really just ignoring all of the odd channels. A similar thing happens on output. The output elements are 64-bits (even if the output type is float), and so a destination stride of 1 means the writes are strided by 64-bits. This explains the strange looking behavior you discovered of an instruction like mov(8) gX<1>F gY<8,8,1>DF. With that understanding, we actually can read consecutive float channels and convert them to doubles in one instruction -- by using a <1,2,0> region. Each float channel is read twice, and the second read will be ignored by the FPU. So we can replace this patch with the one I have attached. A nice side effect of this is that we can simplify VEC4_OPCODE_TO_DOUBLE.
From 5637d035982c89415d47be119ad6a1c9d2e14e42 Mon Sep 17 00:00:00 2001 From: Matt Turner <matts...@gmail.com> Date: Thu, 12 Jan 2017 18:05:58 -0800 Subject: [PATCH] i965: Use source region <1,2,0> when converting to DF. Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. --- src/mesa/drivers/dri/i965/brw_eu_emit.c | 28 +++++++++++++++++++++++- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 13 +---------- 2 files changed, 28 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 5f81b7a..ebdd557 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -1101,7 +1101,6 @@ void brw_##OP(struct brw_codegen *p, \ } -ALU1(MOV) ALU2(SEL) ALU1(NOT) ALU2(AND) @@ -1135,6 +1134,33 @@ ALU2(SUBB) ROUND(RNDZ) ROUND(RNDE) +brw_inst * +brw_MOV(struct brw_codegen *p, struct brw_reg dest, struct brw_reg src0) +{ + const struct gen_device_info *devinfo = p->devinfo; + + /* When converting F->DF on IVB/BYT, every odd source channel is ignored. + * To avoid the problems that causes, we use a <1,2,0> source region to read + * each element twice. + */ + if (devinfo->gen == 7 && !devinfo->is_haswell && + brw_inst_access_mode(devinfo, p->current) == BRW_ALIGN_1 && + dest.type == BRW_REGISTER_TYPE_DF && + (src0.type == BRW_REGISTER_TYPE_F || + src0.type == BRW_REGISTER_TYPE_D || + src0.type == BRW_REGISTER_TYPE_UD) && + !has_scalar_region(src0)) { + assert(src0.vstride == BRW_VERTICAL_STRIDE_4 && + src0.width == BRW_WIDTH_4 && + src0.hstride == BRW_HORIZONTAL_STRIDE_1); + + src0.vstride = BRW_VERTICAL_STRIDE_1; + src0.width = BRW_WIDTH_2; + src0.hstride = BRW_HORIZONTAL_STRIDE_0; + } + + return brw_alu1(p, BRW_OPCODE_MOV, dest, src0); +} brw_inst * brw_ADD(struct brw_codegen *p, struct brw_reg dest, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index f68baab..847a01b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1958,18 +1958,7 @@ generate_code(struct brw_codegen *p, brw_set_default_access_mode(p, BRW_ALIGN_1); - struct brw_reg tmp = retype(dst, src[0].type); - tmp.hstride = BRW_HORIZONTAL_STRIDE_2; - tmp.width = BRW_WIDTH_4; - src[0].vstride = BRW_VERTICAL_STRIDE_4; - src[0].hstride = BRW_HORIZONTAL_STRIDE_1; - src[0].width = BRW_WIDTH_4; - brw_MOV(p, tmp, src[0]); - - tmp.vstride = BRW_VERTICAL_STRIDE_8; - tmp.hstride = BRW_HORIZONTAL_STRIDE_2; - tmp.width = BRW_WIDTH_4; - brw_MOV(p, dst, tmp); + brw_MOV(p, dst, src[0]); brw_set_default_access_mode(p, BRW_ALIGN_16); break; -- 2.7.3
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev