On Thu, Jan 5, 2017 at 5:07 AM, Samuel Iglesias Gonsálvez
<sigles...@igalia.com> wrote:
> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>
> On Ivybridge/Valleyview, when converting a float (F) to a double
> precision float (DF), the hardware automatically duplicates the source
> horizontal stride, hence converting only the values in odd positions.
>
> This commit adds a new lowering step, exclusively for IVB/VLV, where the
> sources are first copied in a temporal register with stride 2, and
> then converted from this temporal register. Thus, we do not lose any
> value.

Curro explained how he thinks the hardware works to me. I'll try to
reproduce that description here.

The FPU channels are 32-bits wide on IVB/BYT. Normally, for example
when operating on 8 float channels, the FPU is given a channel of the
source register to operate on, and each FPU channel produces a value
which is written to the channels of the destination.

But when operating on doubles, each *pair* of FPU channels operates on
one (double-precision) value. Unfortunately the hardware designers
didn't seem to update the input and output logic, so for instance
every pair of float channels from the source region are given as input
to the FPU, even though only the low (or even numbered) channel will
be used. This is why it appears that the hardware doubles the stride,
but it's really just ignoring all of the odd channels.

A similar thing happens on output. The output elements are 64-bits
(even if the output type is float), and so a destination stride of 1
means the writes are strided by 64-bits. This explains the strange
looking behavior you discovered of an instruction like mov(8) gX<1>F
gY<8,8,1>DF.

With that understanding, we actually can read consecutive float
channels and convert them to doubles in one instruction -- by using a
<1,2,0> region. Each float channel is read twice, and the second read
will be ignored by the FPU.

So we can replace this patch with the one I have attached. A nice side
effect of this is that we can simplify VEC4_OPCODE_TO_DOUBLE.
From 5637d035982c89415d47be119ad6a1c9d2e14e42 Mon Sep 17 00:00:00 2001
From: Matt Turner <matts...@gmail.com>
Date: Thu, 12 Jan 2017 18:05:58 -0800
Subject: [PATCH] i965: Use source region <1,2,0> when converting to DF.

Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c          | 28 +++++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 13 +----------
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 5f81b7a..ebdd557 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -1101,7 +1101,6 @@ void brw_##OP(struct brw_codegen *p,					      \
 }
 
 
-ALU1(MOV)
 ALU2(SEL)
 ALU1(NOT)
 ALU2(AND)
@@ -1135,6 +1134,33 @@ ALU2(SUBB)
 ROUND(RNDZ)
 ROUND(RNDE)
 
+brw_inst *
+brw_MOV(struct brw_codegen *p, struct brw_reg dest, struct brw_reg src0)
+{
+   const struct gen_device_info *devinfo = p->devinfo;
+
+   /* When converting F->DF on IVB/BYT, every odd source channel is ignored.
+    * To avoid the problems that causes, we use a <1,2,0> source region to read
+    * each element twice.
+    */
+   if (devinfo->gen == 7 && !devinfo->is_haswell &&
+       brw_inst_access_mode(devinfo, p->current) == BRW_ALIGN_1 &&
+       dest.type == BRW_REGISTER_TYPE_DF &&
+       (src0.type == BRW_REGISTER_TYPE_F ||
+        src0.type == BRW_REGISTER_TYPE_D ||
+        src0.type == BRW_REGISTER_TYPE_UD) &&
+       !has_scalar_region(src0)) {
+      assert(src0.vstride == BRW_VERTICAL_STRIDE_4 &&
+             src0.width == BRW_WIDTH_4 &&
+             src0.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+      src0.vstride = BRW_VERTICAL_STRIDE_1;
+      src0.width = BRW_WIDTH_2;
+      src0.hstride = BRW_HORIZONTAL_STRIDE_0;
+   }
+
+   return brw_alu1(p, BRW_OPCODE_MOV, dest, src0);
+}
 
 brw_inst *
 brw_ADD(struct brw_codegen *p, struct brw_reg dest,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index f68baab..847a01b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1958,18 +1958,7 @@ generate_code(struct brw_codegen *p,
 
          brw_set_default_access_mode(p, BRW_ALIGN_1);
 
-         struct brw_reg tmp = retype(dst, src[0].type);
-         tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
-         tmp.width = BRW_WIDTH_4;
-         src[0].vstride = BRW_VERTICAL_STRIDE_4;
-         src[0].hstride = BRW_HORIZONTAL_STRIDE_1;
-         src[0].width = BRW_WIDTH_4;
-         brw_MOV(p, tmp, src[0]);
-
-         tmp.vstride = BRW_VERTICAL_STRIDE_8;
-         tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
-         tmp.width = BRW_WIDTH_4;
-         brw_MOV(p, dst, tmp);
+         brw_MOV(p, dst, src[0]);
 
          brw_set_default_access_mode(p, BRW_ALIGN_16);
          break;
-- 
2.7.3

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to