On 04/03/17 01:26, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> From: Matt Turner <matts...@gmail.com> >> >> Otherwise for a pack_double_2x32_split opcode, we emit: >> >> vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 >> mov(8) g5<1>UD g5<4>.xUD { align16 1Q >> compacted }; >> mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q >> }; >> ERROR: When the destination spans two registers, the source must >> span two registers >> (exceptions for scalar source and packed-word to packed-dword >> expansion) >> mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N >> }; >> ERROR: The offset from the two source registers must be the same >> mov(8) g5<1>UD g6<4>.xUD { align16 1Q >> compacted }; >> mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q >> }; >> ERROR: When the destination spans two registers, the source must >> span two registers >> (exceptions for scalar source and packed-word to packed-dword >> expansion) >> mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N >> }; >> ERROR: The offset from the two source registers must be the same >> >> The intention was to emit mov(4)s for the instructions that have ERROR >> annotations. >> >> See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test >> for example. >> >> Reviewed-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >> --- >> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> index b570792badd..f6034bc8b76 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp >> @@ -2025,6 +2025,7 @@ generate_code(struct brw_codegen *p, >> assert(type_sz(dst.type) == 8); >> >> brw_set_default_access_mode(p, BRW_ALIGN_1); >> + brw_set_default_exec_size(p, BRW_EXECUTE_4); >> > > NAK, we're missing a bug elsewhere if the exec_size coming in from the > IR is not accurate. You don't happen to be doubling the execution size > of this single-precision instruction, do you? >
The exec_size from the IR is correct and Matt's patch too. The problem is that the original instruction (VEC4_OPCODE_SET_{HIGH,LOW}_32BIT) is a double-precision one because dst is DF, so we doubled the exec_size at the beginning of this function -generate_code()-. However, this opcode reads 32-bits elements from source and write them into the respective 32-bit position (low or high) of the dst, so we retyped the destination to UD to do it (see below), hence this instruction is actually a 32-bit one. We can either use Matt patch or add an exception to the doubling exec_size condition which is at the beginning of this function. In either case, I will add a comment explaining this, something like: /* Don't double exec_size for VEC4_OPCODE_SET_{HIGH,LOW}_32BIT * because dst is retyped to a 32-bit type to copy source's values, * so the instruction is effectively a 32-bit one. */ Which solution do you prefer? Sam >> dst = retype(dst, BRW_REGISTER_TYPE_UD); >> if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT) >> @@ -2037,6 +2038,7 @@ generate_code(struct brw_codegen *p, >> src[0].hstride = BRW_HORIZONTAL_STRIDE_1; >> brw_MOV(p, dst, src[0]); >> >> + brw_set_default_exec_size(p, BRW_EXECUTE_8); >> brw_set_default_access_mode(p, BRW_ALIGN_16); >> break; >> } >> -- >> 2.11.0 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev