On Wed, 2016-08-17 at 15:15 -0700, Francisco Jerez wrote: > Iago Toral <ito...@igalia.com> writes: > > > > > On Tue, 2016-08-02 at 18:27 -0700, Francisco Jerez wrote: > > > > > > Iago Toral Quiroga <ito...@igalia.com> writes: > > > > > > > > > > > > > > > --- > > > > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 > > > > ++++++++++++++++++ > > > > 1 file changed, 18 insertions(+) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > index 1525a3d..4014020 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > > @@ -1497,6 +1497,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr > > > > *instr) > > > > emit(CMP(dst, op[0], brw_imm_f(0.0f), > > > > BRW_CONDITIONAL_NZ)); > > > > break; > > > > > > > > + case nir_op_d2b: { > > > > + /* two-argument instructions can't take 64-bit > > > > immediates */ > > > > + dst_reg zero = dst_reg(this, glsl_type::dvec4_type); > > > > + emit(MOV(zero, brw_imm_df(0.0))); > > > > + > > > > + dst_reg tmp = dst_reg(this, glsl_type::dvec4_type); > > > > + emit(CMP(tmp, op[0], src_reg(zero), > > > > BRW_CONDITIONAL_NZ)); > > > > + > > > > + /* Convert the double CMP result to a single boolean > > > > result. > > > > For that > > > > + * we take the low 32-bit chunk of each DF component in > > > > the > > > > result. > > > > + * and do a final MOV to honor the original writemask > > > > + */ > > > > + dst_reg result = dst_reg(this, glsl_type::bvec4_type); > > > > + emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(tmp)); > > > > + emit(MOV(dst, src_reg(result))); > > > Couldn't you just do a single CMP instruction of the double- > > > precision > > > argument and a single-precision 0.0 immediate? > > It never occurred to me that we could do that but it seems to work > > just > > fine. > > > > > > > > I think you could > > > potentially also use a 32bit destination type on the CMP > > > instruction > > > so > > > you don't need to emit the PICK_LOW+MOV instructions afterwards. > > This does not seem to work though. > > > > > > > > It may > > > hit an instruction decompression bug but it could be cleaned up > > > by > > > the > > > SIMD lowering pass afterwards, AFAICT the result would be two > > > uncompressed instructions instead of the two uncompressed plus > > > two > > > compressed instructions above. > > I tried splitting the instruction just in case. Notice that doing > > this > > requires that we improve the splitting pass to support splitting > > instructions that write only one register (the original series did > > not > > implement support for that because so far the only cared to split > > DF > > instructions that wrote more than one register). I implemented a > > couple > > of quick hacks to get this done so I could test this but the result > > is > > still incorrect. > > > > For reference, if we don't split, the resulting CMP looks like this > > (after full scalarization): > > > > cmp.nz.f0(8) g8<1>.xD g6<2,2,1>.xyxyDF 0F { align16 1Q }; > > cmp.nz.f0(8) g8<1>.yD g6<2,2,1>.zwzwDF 0F { align16 1Q }; > > cmp.nz.f0(8) g8<1>.zD g6.2<0,2,1>.xyxyDF 0F { align16 1Q }; > > cmp.nz.f0(8) g8<1>.wD g6.2<0,2,1>.zwzwDF 0F { align16 1Q }; > > > > And if we split the instruction in two we produce this: > > > > cmp.nz.f0(4) g6<1>.xD g1<0,2,1>.xyxyDF 0F { align16 1N }; > > cmp.nz.f0(4) g6<1>.yD g1<0,2,1>.zwzwDF 0F { align16 1N }; > > cmp.nz.f0(4) g6<1>.zD g1.2<0,2,1>.xyxyDF 0F { align16 1N }; > > cmp.nz.f0(4) g6<1>.wD g1.2<0,2,1>.zwzwDF 0F { align16 1N }; > > cmp.nz.f0(4) g6.4<1>.xD g1<0,2,1>.xyxyDF 0F { align16 2N }; > > cmp.nz.f0(4) g6.4<1>.yD g1<0,2,1>.zwzwDF 0F { align16 2N }; > > cmp.nz.f0(4) g6.4<1>.zD g1.2<0,2,1>.xyxyDF 0F { align16 2N }; > > cmp.nz.f0(4) g6.4<1>.wD g1.2<0,2,1>.zwzwDF 0F { align16 2N }; > > > > Both sequences of instructions look correct to me as far as the > > splitting and DF regioning go, so I guess the 32-bit CMP result is > > not > > doing what we want. > > > > I am not too surprised about this. Even if we use a 32b dst type I > > suppose the instruction execution type is still 64b so I guess > > there > > should be a 64b to 32b conversion involved in writing to a 32b > > result. > > Seeing how 64b to 32b conversions require that we emit specific > > code > > using ALIGN1 mode to deal with alignment requirements I guess it is > > not > > too surprising that this does not work, right? > > > Yeah, that's not too surprising, don't worry about it. Another idea > would be to do something along the lines of 'MOV.nz null, <df source > goes here>' to check whether the source is non-zero (so you avoid > loading the immediate which is a non-trivial operation on Gen7), and > then use a single-precision SEL instruction (or two predicated MOV > instructions) to write true or false to the destination (so you don't > need the PICK_LOW+MOV sequence).
Yeah, that sounds like a good idea, I'll give it a try. Thanks! Iago > > > > I can send you a trace if you want to have a look at it. > > > > Iago > > > > > > > > > > > > > > > > > + break; > > > > + } > > > > + > > > > case nir_op_i2b: > > > > emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ)); > > > > break; _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev