On 23.08.2017 16:00, Roland Scheidegger wrote:
On 23.08.2017 at 15:08, Nicolai Hähnle wrote:
On 22.08.2017 22:39, Roland Scheidegger wrote:
On 22.08.2017 at 19:10, Marek Olšák wrote:
Hi,

I'd like to discuss 16-bit float and integer support in TGSI. I'm
proposing this:

   struct tgsi_instruction
   {
      unsigned Type       : 4;  /* TGSI_TOKEN_TYPE_INSTRUCTION */
      unsigned NrTokens   : 8;  /* UINT */
      unsigned Opcode     : 8;  /* TGSI_OPCODE_ */
      unsigned Saturate   : 1;  /* BOOL */
      unsigned NumDstRegs : 2;  /* UINT */
      unsigned NumSrcRegs : 4;  /* UINT */
      unsigned Label      : 1;
      unsigned Texture    : 1;
      unsigned Memory     : 1;
      unsigned Precise    : 1;
-   unsigned Padding    : 1;
+   unsigned HalfPrecision : 1;
   };

There won't be any 16-bit TEMPs in TGSI, but each instruction will
have the HalfPrecision flag, which is a hint for drivers that they can
use a 16-bit opcode. Even texture, load, and store instructions can
set HalfPrecision, which means they can accept and return 16-bit
values.
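A minimal sketch of the proposed token, to check that replacing Padding
with HalfPrecision keeps the instruction token at 32 bits. The struct name
tgsi_instruction_sketch and the helper can_use_16bit_opcode are made up for
illustration and are not part of Mesa; the field list is copied from the
proposal above.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout copied from the proposal; the struct name is hypothetical. */
struct tgsi_instruction_sketch {
   unsigned Type          : 4;
   unsigned NrTokens      : 8;
   unsigned Opcode        : 8;
   unsigned Saturate      : 1;
   unsigned NumDstRegs    : 2;
   unsigned NumSrcRegs    : 4;
   unsigned Label         : 1;
   unsigned Texture       : 1;
   unsigned Memory        : 1;
   unsigned Precise       : 1;
   unsigned HalfPrecision : 1;  /* takes over the former Padding bit */
};

/* 4+8+8+1+2+4+1+1+1+1+1 = 32 bits. Bitfield packing is implementation
 * defined, so this only holds on ABIs that pack these into one dword. */
_Static_assert(sizeof(struct tgsi_instruction_sketch) == sizeof(uint32_t),
               "instruction token must stay one 32-bit dword");

/* Hypothetical driver-side query: the flag is only a hint, so a driver
 * is free to ignore it and keep the 32-bit opcode. */
static int can_use_16bit_opcode(const struct tgsi_instruction_sketch *inst)
{
   return inst->HalfPrecision;
}
```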

The catch is that drivers will have to insert 16-bit <-> 32-bit
conversions manually, because they won't be present in TGSI. The
advantage is that we don't have to add 200 new opcodes for the 3 new
16-bit types.

What do you think?


Flagging instructions as 16bit doesn't look too bad to me, but I'm
wondering if this isn't a bit problematic wrt register files. Clearly,
this is a restriction of tgsi "everything is a 32x4 value". Doubles, of
course, have a similar problem, but in the end they still have
well-defined interactions with the register files, because it's defined
what bits ultimately represent a 64bit value (at least in theory from
tgsi's point of view, it is perfectly valid to use some 32bit
calculations to set some reg, then just use double instructions directly
without conversion on these values - it may not be meaningful but it is
well defined).
But it looks like you want to avoid having a well-defined mapping of
the registers to 16-bit types (and with 16-bit instructions just being
hints, I can't see how one could exist).
Note that being able to flag instructions as HalfPrecision does not
necessarily mean you can't have any explicit 16bit conversion
instructions too.

Those already exist: PK2H and UP2H. Or did you have something else in mind?
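For reference, a rough sketch of what a PK2H/UP2H pair does as I understand
the TGSI docs: two floats are converted to half and packed into one 32-bit
channel, x in the low 16 bits and y in the high 16 bits. All function names
here are hypothetical, and the conversion is deliberately simplified (it
truncates the mantissa and flushes subnormals to zero), so it is not what
Mesa or any hardware actually implements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified float -> half: truncates, flushes subnormals, clamps to Inf. */
static uint16_t f32_to_f16_sketch(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);

   uint16_t sign = (uint16_t)((bits >> 16) & 0x8000);
   uint32_t exp32 = (bits >> 23) & 0xff;
   uint32_t mant = bits & 0x7fffff;
   int32_t exp16 = (int32_t)exp32 - 127 + 15;  /* rebias the exponent */

   if (exp32 == 0xff)   /* Inf or NaN */
      return (uint16_t)(sign | 0x7c00 | (mant ? 0x200 : 0));
   if (exp16 >= 31)     /* too large for half: Inf */
      return (uint16_t)(sign | 0x7c00);
   if (exp16 <= 0)      /* too small for a normal half: signed zero */
      return sign;
   return (uint16_t)(sign | (uint32_t)(exp16 << 10) | (mant >> 13));
}

/* Simplified half -> float: subnormal halves are read as zero. */
static float f16_to_f32_sketch(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000) << 16;
   uint32_t exp = (h >> 10) & 0x1f;
   uint32_t mant = h & 0x3ff;
   uint32_t bits;
   float f;

   if (exp == 0x1f)
      bits = sign | 0x7f800000 | (mant << 13);
   else if (exp == 0)
      bits = sign;
   else
      bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);

   memcpy(&f, &bits, sizeof f);
   return f;
}

/* PK2H-like: x goes to the low half of the dword, y to the high half. */
static uint32_t pk2h_sketch(float x, float y)
{
   return (uint32_t)f32_to_f16_sketch(x) |
          ((uint32_t)f32_to_f16_sketch(y) << 16);
}

/* UP2H-like: the inverse of pk2h_sketch for the x and y results. */
static void up2h_sketch(uint32_t packed, float *x, float *y)
{
   *x = f16_to_f32_sketch((uint16_t)(packed & 0xffff));
   *y = f16_to_f32_sketch((uint16_t)(packed >> 16));
}
```

These two opcodes cover explicit packing only; they say nothing about how a
16-bit ALU result would be laid out in a TGSI temporary.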

More generally, there are really two use cases for this, and we need to
be careful not to mix them up:

- transparent downgrading to 16-bit of lowp and mediump
- support for extensions that explicitly introduce 16-bit types

For lowp and mediump, the approach of just having a HalfPrecision bit on
the instructions is probably fine.

The second case is different. I don't think there are ARB extensions for
that yet, but there are AMD_gpu_shader_{int16,half_float} with
explicitly 16-bit types. (There's also NV_half_float, but that's from
earlier days without GLSL.) For those, we'd really need to provide
exactly the required operation. No special handling of TGSI temporaries
is needed: an f16vec4 is represented as a normal 4-component vector in
TGSI, just that the upper 16 bits of each component are ignored.
That looks ok to me, although you could choose differently, which is why
I mentioned it (e.g. you could pack the four 16-bit members into the x/y
components of the 4x32-bit vector).

I thought about this as well, but packing 4 components into x/y would make swizzling a nightmare.
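To make the swizzling point concrete, here is a small sketch comparing the
two layouts. With one half per 32-bit channel, a swizzle stays a plain
channel select; with four halves packed into x/y, every swizzled component
needs a shift and mask on both the read and the write side. The function
names and the exact packed layout (component c in dword c/2, low half
first) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Layout A: one half in the low 16 bits of each of the 4 channels.
 * A swizzle is the same channel select as for 32-bit values. */
static void swizzle_unpacked(const uint32_t src[4], const int swz[4],
                             uint32_t dst[4])
{
   for (int i = 0; i < 4; i++)
      dst[i] = src[swz[i]];
}

/* Layout B: four halves packed into two dwords (x and y). */
static uint16_t get_packed(const uint32_t src[2], int c)
{
   return (uint16_t)((src[c >> 1] >> ((c & 1) * 16)) & 0xffff);
}

/* src and dst must not alias, because dst is cleared before src is read. */
static void swizzle_packed(const uint32_t src[2], const int swz[4],
                           uint32_t dst[2])
{
   dst[0] = 0;
   dst[1] = 0;
   for (int i = 0; i < 4; i++)
      dst[i >> 1] |= (uint32_t)get_packed(src, swz[i]) << ((i & 1) * 16);
}
```

In the packed case a source operand like .wzyx can no longer be expressed
as a per-channel register select, which is the "nightmare" part.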


Here's another question: What does "low precision" mean on a texture
instruction? Are the offsets low precision or is it the output? Maybe we
can punt on this for now -- at least GCN doesn't have low precision
there anyway.

To sum it up:
- I think there have to be separate flags for "this is a true 16-bit
instruction" and for "optional low precision" -- in the latter, the
driver is responsible for on-the-fly conversion between half and full types
- Apart from potential future issues with texture instructions, I think
the flags on instructions are fine. So the plan is fine for GLES
lowp/mediump.

Also, we're running out of bits here, but some of those bits can be
moved into a separate instruction flags word when the time comes.


There are still some bits left in the instruction token if you really
need them. Type doesn't need to be 4 bits (at least one bit can go; even
2 bits would be sufficient for now, although you'd need to change all
tokens), and the same is true for NumSrcRegs, where 4 bits is at least
one too many.

I am however still wondering if it really makes sense to have both
hinted and explicit 16bit instructions (because it looks like eventually
it's going to be more work for drivers, having to handle both some day).

I know, it's not a completely clear-cut decision.

The main thing is that truly going to 16 bits may not always be beneficial, because we need to introduce the conversion instruction(s), so it'd be neat to communicate the optionality to the driver.
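As a rough illustration of why the distinction matters to a driver, here
is a sketch with a hypothetical three-state precision value instead of a
single bit. Nothing here exists in TGSI; the enum, the function, and the
cost comparison are placeholders for whatever heuristic a driver would
really use.

```c
#include <assert.h>

/* Hypothetical per-instruction precision state. */
enum precision_sketch {
   PRECISION_FULL,           /* plain 32-bit instruction */
   PRECISION_HALF_OPTIONAL,  /* lowp/mediump: 16-bit is allowed, not required */
   PRECISION_HALF_REQUIRED,  /* explicit 16-bit type from an extension */
};

/* Returns 1 if the driver should emit a 16-bit opcode.
 * conversions_added and ops_saved are stand-ins for a real cost model. */
static int use_16bit_opcode(enum precision_sketch p, int hw_has_16bit,
                            int conversions_added, int ops_saved)
{
   switch (p) {
   case PRECISION_HALF_REQUIRED:
      return 1;  /* the semantics demand exactly 16-bit behavior */
   case PRECISION_HALF_OPTIONAL:
      /* only worth it if the extra conversions don't eat the gain */
      return hw_has_16bit && conversions_added < ops_saved;
   default:
      return 0;
   }
}
```

With only the "required" form, the driver would lose the option of staying
in 32 bits when the conversions cost more than the 16-bit opcode saves.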

Cheers,
Nicolai
--
Learn how the world really is,
but never forget how it should be.