Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>
> In IVB and BYT, both regioning parameters and execution sizes are measured as
> 32-bits element size.
>
> So when we have something like:
>
> mov(8) g2<1>DF g3<4,4,1>DF
>
> We are not actually moving 8 doubles (our intention), but 4 doubles.
>
> We need to double the parameters to cope with this issue. However,
> horizontal strides don't behave as they're supposed to on IVB
> for DF regions, they will cause each 32-bit half of DF sources to be
> strided individually, and doubling the value won't make any difference.
>
> v2:
> - Use devinfo directly (Matt).
> - Use Baytrail instead of Valleview (Matt).
> - Use IvyBridge instead of Ivy (Matt)
> - Double the exec_size in code emission (Curro)
>
> v3:
> - Change hstride doubling by an assert and fix commit log (Curro).
> - Substitute remaining compiler->devinfo by devinfo (Curro).
>
> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 
> ++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 26ffbb169d2..b0d5732ac5c 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -54,13 +54,14 @@ brw_file_from_reg(fs_reg *reg)
>  }
>  
>  static struct brw_reg
> -brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned gen, bool 
> compressed)
> +brw_reg_from_fs_reg(const struct gen_device_info *devinfo, fs_inst *inst,
> +                    fs_reg *reg, bool compressed)
>  {
>     struct brw_reg brw_reg;
>  
>     switch (reg->file) {
>     case MRF:
> -      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(gen));
> +      assert((reg->nr & ~BRW_MRF_COMPR4) < BRW_MAX_MRF(devinfo->gen));
>        /* Fallthrough */
>     case VGRF:
>        if (reg->stride == 0) {
> @@ -93,6 +94,35 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
> gen, bool compressed)
>           const unsigned width = MIN2(reg_width, phys_width);
>           brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
>           brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
> +         /* From the IvyBridge PRM (EU Changes by Processor Generation, page 
> 13):
> +          *  "Each DF (Double Float) operand uses an element size of 4 rather
> +          *   than 8 and all regioning parameters are twice what the values
> +          *   would be based on the true element size: ExecSize, Width,
> +          *   HorzStride, and VertStride. Each DF operand uses a pair of
> +          *   channels and all masking and swizzing should be adjusted
> +          *   appropriately."
> +          *
> +          * From the IvyBridge PRM (Special Requirements for Handling Double
> +          * Precision Data Types, page 71):
> +          *  "In Align1 mode, all regioning parameters like stride, execution
> +          *   size, and width must use the syntax of a pair of packed
> +          *   floats. The offsets for these data types must be 64-bit
> +          *   aligned. The execution size and regioning parameters are in 
> terms
> +          *   of floats."
> +          *
> +          * Summarized: when handling DF-typed arguments, ExecSize,
> +          * VertStride, and Width must be doubled, and HorzStride must be
> +          * doubled when the region is not scalar.
> +          *

The comment above seems misleading still, doubling the HorzStride value
is almost guaranteed not to give the intended result, regardless of
whether the region is scalar.  With that clarified patch is:

Reviewed-by: Francisco Jerez <curroje...@riseup.net>

> +          * It applies to BayTrail too.
> +          */
> +         if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +             type_sz(reg->type) == 8) {
> +            brw_reg.width++;
> +            if (brw_reg.vstride > 0)
> +               brw_reg.vstride++;
> +            assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1);
> +         }
>        }
>  
>        brw_reg = retype(brw_reg, reg->type);
> @@ -1586,9 +1616,8 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>        brw_set_default_group(p, inst->group);
>  
>        for (unsigned int i = 0; i < inst->sources; i++) {
> -         src[i] = brw_reg_from_fs_reg(inst, &inst->src[i], devinfo->gen,
> -                                      compressed);
> -
> +         src[i] = brw_reg_from_fs_reg(devinfo, inst,
> +                                      &inst->src[i], compressed);
>        /* The accumulator result appears to get used for the
>         * conditional modifier generation.  When negating a UD
>         * value, there is a 33rd bit generated for the sign in the
> @@ -1599,7 +1628,8 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>               inst->src[i].type != BRW_REGISTER_TYPE_UD ||
>               !inst->src[i].negate);
>        }
> -      dst = brw_reg_from_fs_reg(inst, &inst->dst, devinfo->gen, compressed);
> +      dst = brw_reg_from_fs_reg(devinfo, inst,
> +                                &inst->dst, compressed);
>  
>        brw_set_default_access_mode(p, BRW_ALIGN_1);
>        brw_set_default_predicate_control(p, inst->predicate);
> @@ -1608,7 +1638,14 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>        brw_set_default_saturate(p, inst->saturate);
>        brw_set_default_mask_control(p, inst->force_writemask_all);
>        brw_set_default_acc_write_control(p, inst->writes_accumulator);
> -      brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> +
> +      unsigned exec_size = inst->exec_size;
> +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> +          (get_exec_type_size(inst) == 8 || type_sz(inst->dst.type) == 8)) {
> +         exec_size *= 2;
> +      }
> +
> +      brw_set_default_exec_size(p, cvt(exec_size) - 1);
>  
>        assert(inst->force_writemask_all || inst->exec_size >= 4);
>        assert(inst->force_writemask_all || inst->group % inst->exec_size == 
> 0);
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Attachment: signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to