Chad Versace <chad.vers...@linux.intel.com> writes:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index ebf8990..b5f1aae 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -348,6 +348,143 @@ vec4_visitor::emit_math(enum opcode opcode,
>  }
>  
>  void
> +vec4_visitor::emit_pack_half_2x16(dst_reg dst, src_reg src0)
> +{
> +   if (intel->gen < 7)
> +      assert(!"ir_unop_pack_half_2x16 should be lowered");
> +
> +   /* uint dst; */
> +   assert(dst.type == BRW_REGISTER_TYPE_UD);
> +
> +   /* vec2 src0; */
> +   assert(src0.type == BRW_REGISTER_TYPE_F);
> +
> +   /* uvec2 tmp;
> +    *
> +    * The PRM lists the destination type of f32to16 as W.  However, I've
> +    * experimentally confirmed on gen7 that it must be a 32-bit size, such as
> +    * UD, in align16 mode.
> +    */
> +   dst_reg tmp_dst(this, glsl_type::uvec2_type);
> +   src_reg tmp_src(tmp_dst);
> +
> +   /* tmp.xy = f32to16(src0); */
> +   tmp_dst.writemask = WRITEMASK_XY;
> +   emit(new(mem_ctx) vec4_instruction(this, BRW_OPCODE_F32TO16,
> +                                      tmp_dst, src0));
> +
> +   /* The result's high 16 bits are in the low 16 bits of the temporary
> +    * register's Y channel.  The result's low 16 bits are in the low 16 bits
> +    * of the X channel.
> +    *
> +    * In experiments on gen7 I've found that, in the temporary register,
> +    * the hight 16 bits of the X and Y channels are zeros. This is critical

            "high"

> +    * for the SHL and OR instructions below to work as expected.
> +    */

The docs say that the high bits are unchanged.  The temporary reg will
often have already had 0 in it to begin with, but sometimes not.  Have
you confirmed that the high bits of the x channel were changed to 0 if
you had initialized them to non-zero?
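To make the concern concrete, here is a minimal C++ model. It is not Mesa code: `TmpXY`, `f32to16_preserving_high` and `shl_or` are hypothetical names. It assumes F32TO16 leaves the high 16 bits of each destination channel unchanged, as the docs say, and assumes the SHL/OR trimmed from the quote compute `(tmp.y << 16) | tmp.x`:

```cpp
#include <cstdint>

// Hypothetical model of one vertex's X and Y channels in the temporary.
struct TmpXY {
   uint32_t x;
   uint32_t y;
};

// Assumed F32TO16 behaviour (per the docs): store the half-float bits in the
// low 16 bits of each 32-bit channel and leave the high 16 bits untouched.
constexpr TmpXY f32to16_preserving_high(TmpXY old, uint16_t half_x,
                                        uint16_t half_y)
{
   return TmpXY{(old.x & 0xffff0000u) | half_x,
                (old.y & 0xffff0000u) | half_y};
}

// Assumed shape of the SHL + OR sequence: dst = (tmp.y << 16) | tmp.x.
constexpr uint32_t shl_or(TmpXY tmp)
{
   return (tmp.y << 16) | tmp.x;
}

// half(1.0) == 0x3c00 and half(2.0) == 0x4000, so
// packHalf2x16(vec2(1.0, 2.0)) should be 0x40003c00.

// Temporary previously zero: correct result.
static_assert(shl_or(f32to16_preserving_high(TmpXY{0u, 0u}, 0x3c00, 0x4000))
              == 0x40003c00u, "zeroed temporary packs correctly");

// Stale bits only in Y's high half: shifted out by the SHL, still correct.
static_assert(shl_or(f32to16_preserving_high(TmpXY{0u, 0xbeef0000u},
                                             0x3c00, 0x4000))
              == 0x40003c00u, "stale Y high bits are harmless");

// Stale bits in X's high half: they survive the OR and corrupt the result.
static_assert(shl_or(f32to16_preserving_high(TmpXY{0x12340000u, 0u},
                                             0x3c00, 0x4000))
              == 0x52343c00u, "stale X high bits leak into the result");
```

Stale bits in Y's high half are discarded by the shift, but stale bits in X's high half pass straight through the OR, which is why the X channel is the one whose high bits need to be confirmed zero.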

> +   /* Idea for reducing the above number of registers and instructions
> +    * ----------------------------------------------------------------
> +    *
> +    * It should be possible to remove the temporary register and replace the
> +    * SHL and OR instructions above with a single MOV instruction in
> +    * align1 mode that uses clever register region addressing. (It is
> +    * impossible to specify the necessary register regions in align16 mode).
> +    * Unfortunately, it is difficult to emit an align1 instruction here.
> +    *
> +    * In particular, I want to do this:
> +    *
> +    *   # Give dst the form:
> +    *   #
> +    *   #    w z          y          x w z          y          x
> +    *   #  |0|0|0x0000hhhh|0x0000llll|0|0|0x0000hhhh|0x0000llll|
> +    *   #
> +    *   f32to16(8) dst<1>.xy:UD src<4;4,1>:F {align16}
> +    *
> +    *   # Transform dst into the form of packHalf2x16's output.
> +    *   #
> +    *   #    w z          y          x w z          y          x
> +    *   #  |0|0|0x00000000|0xhhhhllll|0|0|0x00000000|0xhhhhllll|
> +    *   #
> +    *   # Use width=2 in order to move the Y channel's high 16 bits
> +    *   # into the low 16 bits, thus clearing the Y channel to zero.
> +    *   #
> +    *   mov(4) dst.1<1>:UW dst.2<8;2,1>:UW {align1}
> +    */

I like the sound of this, and it would be a matter of making a new
VS_OPCODE that the generator implements.
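For whoever implements that opcode, a small sketch of the align1 region arithmetic may help check which words the proposed MOV reads and writes. It assumes the usual Gen rule that channel i of a `<VertStride;Width,HorzStride>` source region reads element base + (i / Width) * VertStride + (i % Width) * HorzStride, and that an align1 destination region `<HorzStride>` writes element base + i * HorzStride. Offsets are in 16-bit words (the `:UW` type) from the start of `dst`; under the SIMD4x2 layout drawn in the comment, word 2k is the low half and word 2k+1 the high half of 32-bit channel k, with vertex 0's x..w in words 0-7 and vertex 1's in words 8-15. The function names are hypothetical.

```cpp
// Word offset read by execution channel `i` of an align1 source region
// <vstride;width,hstride> that starts at word `base`.
constexpr int src_region_offset(int base, int vstride, int width,
                                int hstride, int i)
{
   return base + (i / width) * vstride + (i % width) * hstride;
}

// Word offset written by execution channel `i` of an align1 destination
// region <hstride> that starts at word `base`.
constexpr int dst_region_offset(int base, int hstride, int i)
{
   return base + i * hstride;
}

// mov(4) dst.1<1>:UW dst.2<8;2,1>:UW {align1}
// Source channels 0..3 read words 2, 3, 10, 11.
static_assert(src_region_offset(2, 8, 2, 1, 0) == 2, "");
static_assert(src_region_offset(2, 8, 2, 1, 1) == 3, "");
static_assert(src_region_offset(2, 8, 2, 1, 2) == 10, "");
static_assert(src_region_offset(2, 8, 2, 1, 3) == 11, "");

// Destination channels 0..3 write words 1, 2, 3, 4.
static_assert(dst_region_offset(1, 1, 0) == 1, "");
static_assert(dst_region_offset(1, 1, 1) == 2, "");
static_assert(dst_region_offset(1, 1, 2) == 3, "");
static_assert(dst_region_offset(1, 1, 3) == 4, "");
```

Channel i of the MOV copies word `src_region_offset(...)` into word `dst_region_offset(...)`, so pairing the two lists channel by channel is a quick way to check the new opcode's regions against the intended output layout before wiring it into the generator.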

> +}


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
