---
 src/compiler/nir/nir_lower_precision.cpp | 106 ++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_lower_precision.cpp 
b/src/compiler/nir/nir_lower_precision.cpp
index 3d05fa2b3c9..9647fb4d6a9 100644
--- a/src/compiler/nir/nir_lower_precision.cpp
+++ b/src/compiler/nir/nir_lower_precision.cpp
@@ -21,8 +21,110 @@
  * IN THE SOFTWARE.
  */
 
-/* TODO: Introduce helpers in C++ space for examining GLSL types and make
- *       this file just C.
+/*
+ * This lowering pass seeks to change precision for float values and for the
+ * expressions producing them. Decision making follows the rules described in
+ * GLES 3.2 Specification and section 4.7.3 Precision Qualifiers. Shortly the
+ * idea is that an arithmetic expression can be performed in 16-bit precision
+ * if and only if all its operands are either already fixed to 16-bits or are
+ * such that compiler is free to use either 32 or 16-bit precision.
+ *
+ * First step is to go over the variables as these are the only things that
+ * are marked by the shader author with explicit instructions if high
+ * precision is needed (or lower precision allowed respectively). This
+ * implementation sets precision unconditionally to 16-bits whenever allowed
+ * by the shader author. (This may not produce the most optimal end result
+ * but is a design choice to keep the complexity at bay).
+ *
+ * This is followed by manipulation of instructions themselves. Variable
+ * derefs and intrinsics dealing with derefs are straight-forward. For them
+ * one only needs to consult the variables themselves and adjust the precision
+ * of the instruction in question accordingly.
+ *
+ * At this point things get more complex as the rest are dependent on context.
+ * Precision for texturing return values (sample values) and constant loads 
+ * depend on the needs of consuming expressions. As there may be need for
+ * both 16 and 32-bit precision, one cannot simply just set them as 16-bits.
+ * There may be, for example, two separate multiplications of a sample value
+ * S. One multiplying it with 32-bit value A and the other with 16-bit
+ * value B. (Recall that the rules mandate that if one of the source operands
+ * has full precision then the rest need to have full precision as well).
+ * Hence A * S requires the sample value S with full precision. This in turn
+ * means that texturing needs to return full precision and needs to convert
+ * sample value as to 16-bits for the other multiplication (B * S).
+ * NOTE: Hardware may have capability for mixed mode instructions and it is
+ *       left for the backend to drop any unnecessary conversions.
+ *
+ * Here the implementation leaves all texturing and constant load operations
+ * to 32-bit precision until all instructions are analysed. Instead it inserts
+ * conversions from 32-bits to 16-bits for expressions that can operate with
+ * lower precision. In the example above, the multiplication of the sample
+ * with 16-bit value B would become C = B * f2f16(S). This is important for
+ * the analysis of the rest of the instructions. Once the pass examines
+ * expressions consuming C the pass can allow these expressions with lower
+ * precision if all the operands are allowed in lower precision. If one had
+ * left C with 32-bit precision it would have prevented the use of 16-bit
+ * precision in the consuming expressions even though all other operands would
+ * have allowed that.
+ *
+ * Once all instructions are examined there is separate pass that goes thru
+ * all the uses of texturing return values. If all are happy with lower
+ * precision, the pass removes the conversions (f2f16) and switches the
+ * texturing itself to directly return 16-bit samples (given that hardware
+ * support 16-bit sample values of course).
+ *
+ * For input varyings marked with lower precision there is an alternative to
+ * uploading 16-bit values into the shader. One can load them with using full
+ * precision but immediately convert them into 16-bits before they are used.
+ * This allows one to perform all calculations based on them in 16-bit
+ * precision but still keep the upload mechanism intact in the backend.
+ *
+ * TODO:
+ *
+ * 1) There is still major flaw: logic is against the rules as it considers
+ *    arithmetic expressions without consider to their consuming expressions.
+ *    As alus at nir level are just sub-expressions of larger expressions they
+ *    are subject to the uses and shouldn't be examined just based on their
+ *    own sources.
+ *    One should recursively examaine uses until either a fixed search depth
+ *    (heuristic to avoid runtime explosion) or it becomes clear which
+ *    precision is needed. Naturally there may be both low and high precision
+ *    uses. In order to keep things simple one could just force all lower
+ *    precision uses to high in case even one high precision use is found or
+ *    the search depth boundary is hit.
+ *
+ * 2) Validation (NIR) expects expression producing booleans to have
+ *    destination size of 32-bits. This prevents one from marking expressions
+ *    that operate with 16-bit sources and producing booleans as having
+ *    16-bit destination. Some hardware (such as Intel) works this way.
+ *    Implementation here inserts dummy conversions to 16-bits (namely i2i16
+ *    beacuse NIR allows treating SSA values produced as booleans to be
+ *    treated as integers without explicit conversions). This allows the
+ *    aforementioned analysis to use 16-bit precision in consuming expressions
+ *    such as logical and/or that have integer typed sources that could be
+ *    produced using, for example, with less-than comparison.
+ *
+ *    Similar treatment is needed for boolean sources produced using integer
+ *    expressions. For example, "bcsel" and "if" can operate with values
+ *    produced with 16-bit iand. In order to keep validation happy that
+ *    expects the bit size to be 32 one emits dummy i2i32.
+ *
+ *    At least Intel backend is capable enough to remove most of these
+ *    redundant expressions and adjust the consumers to use the original
+ *    value. In order to get rid of the rest, there is additional lowering
+ *    pass that adjust boolean sized ssa definitions. This is hooked after
+ *    all validation is done (see nir_lower_bool_size.c).
+ *
+ * 3) Allow interpolation of input varyings in 16-bit precision. Currently
+ *    the conversion from 32-bits to 16-bits takes place after the conversion.
+ *
+ * 4) When only 32-bit tex coordinates are supported, see if backend supports
+ *    conversion from 16-bits to 32-bits during the last alu producing the
+ *    actual value. (Most likely belongs to backend such as described above).
+ *
+ * 5) Introduce helpers in C++ space for examining GLSL types and make this
+ *    file just C.
+ *
  */
 
 #include "nir.h"
-- 
2.17.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to