fs: Lower arithmetic instructions with register regions of unsupported width.

Connor Abbott Wed, 05 Aug 2015 11:14:51 -0700

FWIW, both patches are:

Reviewed-by: Connor Abbott <connor.w.abb...@intel.com>


I'm working on FP64 support (I've been using no16 up till now) so this
is obviously very useful to me.
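
For anyone following along, here is a rough standalone sketch of the width calculation the quoted patch adds. It is not the Mesa code: `InstRegions` and `lowered_simd_width` are made-up stand-ins for the `fs_inst` fields the patch reads (`exec_size`, `regs_written`, `regs_read(i)`), and the clamp to at least one register is my own addition so the sketch cannot divide by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the fs_inst fields consulted by the patch.
struct InstRegions {
    unsigned exec_size;               // execution size in channels
    unsigned regs_written;            // GRFs spanned by the destination
    std::vector<unsigned> regs_read;  // GRFs spanned by each source
};

// Round-up integer division, equivalent in spirit to DIV_ROUND_UP.
static unsigned div_round_up(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Largest execution size at which no source or destination region
// spans more than 2 GRFs.
unsigned lowered_simd_width(const InstRegions &inst) {
    assert(inst.exec_size > 0);

    // The widest region is the one that limits the execution size.
    unsigned reg_count = inst.regs_written;
    for (unsigned r : inst.regs_read)
        reg_count = std::max(reg_count, r);

    // Assumption of this sketch only: treat "no registers" as one
    // register so the divisor below is never zero.
    reg_count = std::max(reg_count, 1u);

    // Shrink the width by the factor by which the widest region
    // exceeds the 2-GRF limit.
    return inst.exec_size / div_round_up(reg_count, 2);
}
```

With 32-byte GRFs, a SIMD16 FP64 operand covers 16 * 8 = 128 bytes, i.e. 4 GRFs, so `lowered_simd_width({16, 4, {4, 4}})` gives 8. A SIMD16 32-bit operation already fits in 2 GRFs and stays at 16.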

On Wed, Aug 5, 2015 at 10:38 AM, Francisco Jerez <curroje...@riseup.net> wrote:
> This extends the SIMD lowering pass to enforce the hardware limitation
> that no directly-addressed source may read more than 2 physical GRFs.
> One can easily go over this limit when doing 64-bit arithmetic
> (e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice
> to be able to just emit an instruction of the intended execution size
> from the visitor and let the lowering pass deal with this restriction
> transparently.
>
> Some hardware arithmetic instructions are not handled here, including
> all instructions that use the accumulator implicitly (which the SIMD
> lowering pass deliberately doesn't handle), instructions with
> non-per-channel sources (e.g. LINE or PLANE) and SEND-like
> instructions, which need special handling most likely as virtual
> opcodes.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 62 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index f9773bd..fa5ed4f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4130,6 +4130,68 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
>                         const fs_inst *inst)
>  {
>     switch (inst->opcode) {
> +   case BRW_OPCODE_MOV:
> +   case BRW_OPCODE_SEL:
> +   case BRW_OPCODE_NOT:
> +   case BRW_OPCODE_AND:
> +   case BRW_OPCODE_OR:
> +   case BRW_OPCODE_XOR:
> +   case BRW_OPCODE_SHR:
> +   case BRW_OPCODE_SHL:
> +   case BRW_OPCODE_ASR:
> +   case BRW_OPCODE_CMP:
> +   case BRW_OPCODE_CMPN:
> +   case BRW_OPCODE_CSEL:
> +   case BRW_OPCODE_F32TO16:
> +   case BRW_OPCODE_F16TO32:
> +   case BRW_OPCODE_BFREV:
> +   case BRW_OPCODE_BFE:
> +   case BRW_OPCODE_BFI1:
> +   case BRW_OPCODE_BFI2:
> +   case BRW_OPCODE_ADD:
> +   case BRW_OPCODE_MUL:
> +   case BRW_OPCODE_AVG:
> +   case BRW_OPCODE_FRC:
> +   case BRW_OPCODE_RNDU:
> +   case BRW_OPCODE_RNDD:
> +   case BRW_OPCODE_RNDE:
> +   case BRW_OPCODE_RNDZ:
> +   case BRW_OPCODE_LZD:
> +   case BRW_OPCODE_FBH:
> +   case BRW_OPCODE_FBL:
> +   case BRW_OPCODE_CBIT:
> +   case BRW_OPCODE_SAD2:
> +   case BRW_OPCODE_MAD:
> +   case BRW_OPCODE_LRP:
> +   case SHADER_OPCODE_RCP:
> +   case SHADER_OPCODE_RSQ:
> +   case SHADER_OPCODE_SQRT:
> +   case SHADER_OPCODE_EXP2:
> +   case SHADER_OPCODE_LOG2:
> +   case SHADER_OPCODE_POW:
> +   case SHADER_OPCODE_INT_QUOTIENT:
> +   case SHADER_OPCODE_INT_REMAINDER:
> +   case SHADER_OPCODE_SIN:
> +   case SHADER_OPCODE_COS: {
> +      /* According to the PRMs:
> +       *  "A. In Direct Addressing mode, a source cannot span more than 2
> +       *      adjacent GRF registers.
> +       *   B. A destination cannot span more than 2 adjacent GRF registers."
> +       *
> +       * Look for the source or destination with the largest register region
> +       * which is the one that is going to limit the overall execution size of
> +       * the instruction due to this rule.
> +       */
> +      unsigned reg_count = inst->regs_written;
> +
> +      for (unsigned i = 0; i < inst->sources; i++)
> +         reg_count = MAX2(reg_count, (unsigned)inst->regs_read(i));
> +
> +      /* Calculate the maximum execution size of the instruction based on the
> +       * factor by which it goes over the hardware limit of 2 GRFs.
> +       */
> +      return inst->exec_size / DIV_ROUND_UP(reg_count, 2);
> +   }
>     case SHADER_OPCODE_MULH:
>        /* MULH is lowered to the MUL/MACH sequence using the accumulator, which
>         * is 8-wide on Gen7+.
> --
> 2.4.6
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965/fs: Lower arithmetic instructions with register regions of unsupported width.