compiler: implement 16-bit multiply-add

Iago Toral Tue, 22 May 2018 04:58:27 -0700

On Mon, 2018-05-21 at 13:49 +0300, Eero Tamminen wrote:
> Hi,
> 
> On 21.05.2018 10:42, Iago Toral wrote:
> > On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:
> > > On 17.05.2018 14:25, Eero Tamminen wrote:
> > > > On 17.05.2018 11:46, Iago Toral Quiroga wrote:
> > > > > The PRM for MAD states that F, DF and HF are supported,
> > > > > however,
> > > > > then
> > > > > it requires that the instruction includes a 2-bit mask
> > > > > specifying
> > > > > the types of each operand like this:
> > > > 
> > > >   >
> > > > > 00: 32-bit float
> > > > > 01: 32-bit signed integer
> > > > > 10: 32-bit unsigned integer
> > > > > 11: 64-bit float
> > > > 
> > > > Where this was?
> > 
> > This is in the decription of the MAD instruction here (for SKL):
> > 
> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc
> > -skl
> > -vol02a-commandreference-instructions.pdf
> > 
> > It guess the contents for this were copy & pasted from previous
> > PRMs
> > that didn't support HF...
> 
> Ouch.  That looks pretty different from what's in here:
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-s
> kl-vol07-3d_media_gpgpu.pdf
> 
> > > > In
> > > > https://01.org/sites/default/files/documentation/intel-gfx-bspe
> > > > c-os
> > > > rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf
> 
> I'll ask around who's currently maintaining the 01.org docs, and
> could 
> she/he update the opcode docs. All of them from BSW to KBL seem to
> have 
> the old information, while the Media GPGPU docs have newer info.


Thanks Eero!

> 
> Btw.  Did you have test-cases utilizing mad() instructions, and did
> they work OK with your patchset?

Yes, but they only worked because I was dropping the MAD and open
coding it as MUL+ADD for HF operands.

> (If yes, better test-cases may be required.)

The existing tests catch the problem if I don't attempt to lower the
MAD for HF, so they are good enough for this.

BTW, I implemented the solution using the Src1Type and Src2Type bits
and it seems to work fine in BDW as well.

Iago

> 
> > > > Section "EU Changes by Processor Generation" states:
> > > > -------------------------
> > > > These features or behaviors are added for CHV, BSW, continuing
> > > > to
> > > > later generations:
> > > > ...
> > > > In the 3-source instruction format, widen the SrcType and
> > > > DstType
> > > > fields
> > > > and add an encoding for the HF (Half Float) type.
> > > > -------------------------
> > > > 
> > > > (I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)
> > 
> > Actually, I have just verified that the BDW PRMs have the exact
> > same
> > thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
> > supported too.
> 
> Yes, right.
> 
> 
> > > > In section "GEN Instruction Format – 3-src" table, both "Dst
> > > > Type"
> > > > and
> > > > "Src Type" fields are 3 bits, and there's additional 1 bit
> > > > "Src1
> > > > Type"
> > > > and "Src2 Type" fields to differentiate formats for src1 &
> > > > src2.
> > > 
> > >   >
> > > > Then, when looking at "Source or Destination Operand Fields
> > > > (Alphabetically by Short Name)" section:
> > > > -----------------------------------------------
> > > > DstType:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > > 010b = :ud. Unsigned Doubleword integer.
> > > > 011b = :df. Double precision Float (64-bit).
> > > > 100b = :hf. Half precision Float (16-bit).
> > > > 101b - 111b. Reserved.
> > > > 
> > > > ...
> > > > 
> > > > SrcType:
> > > > 
> > > > Three source instructions use one SrcType field for all source
> > > > operands,
> > > > with a 3-bit encoding that allows fewer data types:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > > 010b = :ud. Unsigned Doubleword integer.
> > > > 011b = :df. Double precision Float (64-bit).
> > > > 100b = :hf. Half precision Float (16-bit).
> > > > 101b - 111b. Reserved.
> > > > 
> > > > Three source instructions can use operands with mixed-mode
> > > > precision.
> > > > When SrcType field is set to :f or :hf it defines precision for
> > > > source 0
> > > > only, and fields Src1Type and Src2Type define precision for
> > > > other
> > > > source
> > > > operands:
> > > > 0b = :f. Single precision Float (32-bit).
> > > > 1b = :hf. Half precision Float (16-bit).
> > > > -----------------------------------------------
> > > 
> > > Note also the section "Special Requirements for Handling Mixed
> > > Mode
> > > Float Operations".
> 
> Both BDW & BSW specs list also this restriction, which is lifted
> GEN9:
> * Calculations using the HF (Half Float) type do not support
> denormals 
> or gradual underflow.
> 
> 
> > > Btw. Mesa supporting HF with mad() is important because pln()
> > > doesn't support it in HW, but you can implement pln() HF version
> > > with two mad()s.
> 
> 
>       - Eero
> 
> > > > [1]: SkyLake & KabyLake PRMs lists same amount of bits, but
> > > > don't
> > > > tell
> > > > what they represent (often one needs to look into PRM for the
> > > > HW
> > > > where
> > > > these changes were first introduced, which in this case was
> > > > Braswell /
> > > > Cherryview).
> > > > 
> > > > 
> > > > > So 16-bit float would not be supported.  The driver also
> > > > > asserts
> > > > > that
> > > > > the types involved in ALING16 3-src operations are one of
> > > > > these
> > > > > (MAD is always emitted as an align16 instruction prior to
> > > > > gen10).
> > > > > ---
> > > > >    src/intel/compiler/brw_fs_nir.cpp | 9 ++++++++-
> > > > >    1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > > > > b/src/intel/compiler/brw_fs_nir.cpp
> > > > > index 91283ab4911..58ddc456bae 100644
> > > > > --- a/src/intel/compiler/brw_fs_nir.cpp
> > > > > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > > > > @@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const
> > > > > fs_builder
> > > > > &bld,
> > > > > nir_alu_instr *instr)
> > > > >          break;
> > > > >       case nir_op_ffma:
> > > > > -      inst = bld.MAD(result, op[2], op[1], op[0]);
> > > > > +      /* 3-src MAD doesn't support 16-bit operands */
> > > > > +      if (nir_dest_bit_size(instr->dest.dest) >= 32) {
> > > > > +         inst = bld.MAD(result, op[2], op[1], op[0]);
> > > > > +      } else {
> > > > > +         fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
> > > > > +         bld.MUL(tmp, op[1], op[0]);
> > > > > +         inst = bld.ADD(result, tmp, op[2]);
> > > > > +      }
> > > > >          inst->saturate = instr->dest.saturate;
> > > > >          break;
> > > > > 
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > 
> > > 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

Reply via email to