Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes:

> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>
> Prior to Broadwell, we only have 8 address subregisters for MOV_INDIRECT.
>
> According to the IVB and HSW PRMs:
>
> "2.When the destination requires two registers and the sources are
>  indirect, the sources must use 1x1 regioning mode. In addition, the
>  sources must be assembled from GRF registers each accessed by adjacent
>  index registers in 1x1 regioning modes."
>
> So for DF instructions the execution size is not limited by the number
> of address registers that are available, but by the EU decompression
> logic not handling VxH indirect addressing correctly.
>
> This patch limits the SIMD width to 4 in this case.
>
> v2:
> - Fix typo (Matt).
> - Fix condition (Curro)
>
> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> Signed-off-by: Juan A. Suarez Romero <jasua...@igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index a2ba0fde9fd..c9b6c995dc9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4870,11 +4870,16 @@ get_lowered_simd_width(const struct gen_device_info *devinfo,
>     case SHADER_OPCODE_URB_WRITE_SIMD8_MASKED_PER_SLOT:
>        return MIN2(8, inst->exec_size);
>  
> -   case SHADER_OPCODE_MOV_INDIRECT:
> -      /* Prior to Broadwell, we only have 8 address subregisters */
> +   case SHADER_OPCODE_MOV_INDIRECT: {

Let's move the second sentence you put in the comment below up here, and
quote the relevant hardware spec text from the IVB and HSW PRMs:

 "2.When the destination requires two registers and the sources are
    indirect, the sources must use 1x1 regioning mode."

> +      const unsigned max_size = (devinfo->gen >= 8 ? 2 : 1) * REG_SIZE;
> +      /* Prior to Broadwell, we only have 8 address subregisters. In case of
> +       * DF instructions in HSW/IVB, the exec_size is limited by the EU
> +       * decompression logic not handling VxH indirect addressing correctly.
> +       */
>        return MIN3(devinfo->gen >= 8 ? 16 : 8,
> -                  2 * REG_SIZE / (inst->dst.stride * type_sz(inst->dst.type)),
> +                  max_size / (inst->dst.stride * type_sz(inst->dst.type)),
>                    inst->exec_size);
> +   }
>  
>     case SHADER_OPCODE_LOAD_PAYLOAD: {
>        const unsigned reg_count =
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

