On Mon, 2018-05-21 at 13:49 +0300, Eero Tamminen wrote: > Hi, > > On 21.05.2018 10:42, Iago Toral wrote: > > On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote: > > > On 17.05.2018 14:25, Eero Tamminen wrote: > > > > On 17.05.2018 11:46, Iago Toral Quiroga wrote: > > > > > The PRM for MAD states that F, DF and HF are supported, > > > > > however, > > > > > then > > > > > it requires that the instruction includes a 2-bit mask > > > > > specifying > > > > > the types of each operand like this: > > > > > > > > > > > > > > 00: 32-bit float > > > > > 01: 32-bit signed integer > > > > > 10: 32-bit unsigned integer > > > > > 11: 64-bit float > > > > > > > > Where this was? > > > > This is in the decription of the MAD instruction here (for SKL): > > > > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc > > -skl > > -vol02a-commandreference-instructions.pdf > > > > It guess the contents for this were copy & pasted from previous > > PRMs > > that didn't support HF... > > Ouch. That looks pretty different from what's in here: > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-s > kl-vol07-3d_media_gpgpu.pdf > > > > > In > > > > https://01.org/sites/default/files/documentation/intel-gfx-bspe > > > > c-os > > > > rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf > > I'll ask around who's currently maintaining the 01.org docs, and > could > she/he update the opcode docs. All of them from BSW to KBL seem to > have > the old information, while the Media GPGPU docs have newer info.
Thanks Eero! > > Btw. Did you have test-cases utilizing mad() instructions, and did > they work OK with your patchset? Yes, but they only worked because I was dropping the MAD and open coding it as MUL+ADD for HF operands. > (If yes, better test-cases may be required.) The existing tests catch the problem if I don't attempt to lower the MAD for HF, so they are good enough for this. BTW, I implemented the solution using the Src1Type and Src2Type bits and it seems to work fine in BDW as well. Iago > > > > > Section "EU Changes by Processor Generation" states: > > > > ------------------------- > > > > These features or behaviors are added for CHV, BSW, continuing > > > > to > > > > later generations: > > > > ... > > > > In the 3-source instruction format, widen the SrcType and > > > > DstType > > > > fields > > > > and add an encoding for the HF (Half Float) type. > > > > ------------------------- > > > > > > > > (I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.) > > > > Actually, I have just verified that the BDW PRMs have the exact > > same > > thing, but stating BDW instead of BSW/CHV, so I guess BDW should be > > supported too. > > Yes, right. > > > > > > In section "GEN Instruction Format – 3-src" table, both "Dst > > > > Type" > > > > and > > > > "Src Type" fields are 3 bits, and there's additional 1 bit > > > > "Src1 > > > > Type" > > > > and "Src2 Type" fields to differentiate formats for src1 & > > > > src2. > > > > > > > > > > > Then, when looking at "Source or Destination Operand Fields > > > > (Alphabetically by Short Name)" section: > > > > ----------------------------------------------- > > > > DstType: > > > > > > > > Encoding for three source instructions: > > > > 000b = :f. Single precision Float (32-bit). > > > > 001b = :d. Signed Doubleword integer. > > > > 010b = :ud. Unsigned Doubleword integer. > > > > 011b = :df. Double precision Float (64-bit). > > > > 100b = :hf. Half precision Float (16-bit). > > > > 101b - 111b. Reserved. > > > > > > > > ... > > > > > > > > SrcType: > > > > > > > > Three source instructions use one SrcType field for all source > > > > operands, > > > > with a 3-bit encoding that allows fewer data types: > > > > > > > > Encoding for three source instructions: > > > > 000b = :f. Single precision Float (32-bit). > > > > 001b = :d. Signed Doubleword integer. > > > > 010b = :ud. Unsigned Doubleword integer. > > > > 011b = :df. Double precision Float (64-bit). > > > > 100b = :hf. Half precision Float (16-bit). > > > > 101b - 111b. Reserved. > > > > > > > > Three source instructions can use operands with mixed-mode > > > > precision. > > > > When SrcType field is set to :f or :hf it defines precision for > > > > source 0 > > > > only, and fields Src1Type and Src2Type define precision for > > > > other > > > > source > > > > operands: > > > > 0b = :f. Single precision Float (32-bit). > > > > 1b = :hf. Half precision Float (16-bit). > > > > ----------------------------------------------- > > > > > > Note also the section "Special Requirements for Handling Mixed > > > Mode > > > Float Operations". > > Both BDW & BSW specs list also this restriction, which is lifted > GEN9: > * Calculations using the HF (Half Float) type do not support > denormals > or gradual underflow. > > > > > Btw. Mesa supporting HF with mad() is important because pln() > > > doesn't support it in HW, but you can implement pln() HF version > > > with two mad()s. > > > - Eero > > > > > [1]: SkyLake & KabyLake PRMs lists same amount of bits, but > > > > don't > > > > tell > > > > what they represent (often one needs to look into PRM for the > > > > HW > > > > where > > > > these changes were first introduced, which in this case was > > > > Braswell / > > > > Cherryview). > > > > > > > > > > > > > So 16-bit float would not be supported. The driver also > > > > > asserts > > > > > that > > > > > the types involved in ALING16 3-src operations are one of > > > > > these > > > > > (MAD is always emitted as an align16 instruction prior to > > > > > gen10). > > > > > --- > > > > > src/intel/compiler/brw_fs_nir.cpp | 9 ++++++++- > > > > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > > > > > b/src/intel/compiler/brw_fs_nir.cpp > > > > > index 91283ab4911..58ddc456bae 100644 > > > > > --- a/src/intel/compiler/brw_fs_nir.cpp > > > > > +++ b/src/intel/compiler/brw_fs_nir.cpp > > > > > @@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const > > > > > fs_builder > > > > > &bld, > > > > > nir_alu_instr *instr) > > > > > break; > > > > > case nir_op_ffma: > > > > > - inst = bld.MAD(result, op[2], op[1], op[0]); > > > > > + /* 3-src MAD doesn't support 16-bit operands */ > > > > > + if (nir_dest_bit_size(instr->dest.dest) >= 32) { > > > > > + inst = bld.MAD(result, op[2], op[1], op[0]); > > > > > + } else { > > > > > + fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF); > > > > > + bld.MUL(tmp, op[1], op[0]); > > > > > + inst = bld.ADD(result, tmp, op[2]); > > > > > + } > > > > > inst->saturate = instr->dest.saturate; > > > > > break; > > > > > > > > > > > > > _______________________________________________ > > > > mesa-dev mailing list > > > > mesa-dev@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > > > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev