Module: Mesa Branch: main Commit: a1c765a44c032fe505e4863297b6bbefa0ce426c URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=a1c765a44c032fe505e4863297b6bbefa0ce426c
Author: Emma Anholt <[email protected]> Date: Tue Aug 9 17:48:14 2022 -0700 freedreno/ir3: Consistently lower mediump inputs to 16-bit (when we can). If every use was a conversion to 16, then ir3_cf would fold it into the bary instruction. But if something had generated a highp comparison of the mediump input with a mediump op result, it would get stuck as highp, even though we could have used 16-bit values without upconverting. This fixes dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment on ANGLE on turnip, closing #7043. fossil-db results are mixed: fossil-db: Totals from 697 (4.65% of 14988) affected shaders: MaxWaves: 10712 -> 10736 (+0.22%) Instrs: 82394 -> 83572 (+1.43%); split: -1.31%, +2.74% CodeSize: 178280 -> 180118 (+1.03%); split: -0.46%, +1.49% NOPs: 15887 -> 16067 (+1.13%); split: -7.48%, +8.61% MOVs: 1297 -> 1328 (+2.39%); split: -6.86%, +9.25% Full: 3730 -> 3842 (+3.00%); split: -1.80%, +4.80% (ss): 1877 -> 1849 (-1.49%); split: -5.59%, +4.10% (sy): 1249 -> 1255 (+0.48%); split: -1.04%, +1.52% (ss)-stall: 6809 -> 6364 (-6.54%); split: -13.85%, +7.31% (sy)-stall: 17059 -> 17257 (+1.16%); split: -6.51%, +7.67% Cat0: 17220 -> 17400 (+1.05%); split: -6.90%, +7.94% Cat1: 5307 -> 6366 (+19.95%); split: -6.93%, +26.89% Cat2: 39138 -> 39101 (-0.09%); split: -0.31%, +0.22% Cat3: 16772 -> 16741 (-0.18%) Cat5: 1269 -> 1276 (+0.55%) I tried to pick some apps to test that looked the most impacted, and indeed the results are mixed: cookie_run_kingdom: +0.275514% +/- 0.0883816% (n=68) trex_200: +0.0943847% +/- 0.0297073% (n=1463) command_and_conquer_rivals: no difference (n=131) war_planet_online: no difference (n=120) lego_legacy: -0.192131% +/- 0.152083% (n=99) among_us: -0.625227% +/- 0.385419% (n=60) Given that the perf results are small and go both ways, and apparently we're an outlier in not always lowering mediump inputs to 16-bit, just do it for consistency with other drivers. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18506> --- .../freedreno-a630-skqp-gles_rendertests-fails.txt | 2 ++ src/freedreno/ci/traces-freedreno.yml | 6 ++-- src/freedreno/ir3/ir3_nir.c | 34 ++++++++++++++++++++++ 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/src/freedreno/ci/freedreno-a630-skqp-gles_rendertests-fails.txt b/src/freedreno/ci/freedreno-a630-skqp-gles_rendertests-fails.txt index 29dc326836a..baef6016744 100644 --- a/src/freedreno/ci/freedreno-a630-skqp-gles_rendertests-fails.txt +++ b/src/freedreno/ci/freedreno-a630-skqp-gles_rendertests-fails.txt @@ -1178,6 +1178,8 @@ image_from_yuv_textures,-1 image_scale_aligned,-1 imagealphathreshold_image,-1 imageblur,-1 +# A few pixels at the edge of the blur, nothing more noticeable at those points than the jaggedness elsewhere. +imageblurclampmode,180 imagefilters_xfermodes,-1 imagefiltersbase,-1 imagefiltersclipped,-1 diff --git a/src/freedreno/ci/traces-freedreno.yml b/src/freedreno/ci/traces-freedreno.yml index 30cb6ba0bb9..c44e29b727b 100644 --- a/src/freedreno/ci/traces-freedreno.yml +++ b/src/freedreno/ci/traces-freedreno.yml @@ -207,7 +207,7 @@ traces: freedreno-a530: checksum: 70e18ba06d56fea277cd3fb000729879 freedreno-a630: - checksum: 9eb6d261c0c5946d8aeb0f41b2b7c1b1 + checksum: 92944a4996ea019f51cfcdeaf4fc6926 glmark2/desktop:windows=4:effect=shadow-v2.trace: freedreno-a306: @@ -223,7 +223,7 @@ traces: freedreno-a530: checksum: 7712c8143a244c56922124f4ac207722 freedreno-a630: - checksum: 7712c8143a244c56922124f4ac207722 + checksum: 8a3b04aff4848fd36de9d956f24fac2f glmark2/effect2d:kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;-v2.trace: freedreno-a306: @@ -231,7 +231,7 @@ traces: freedreno-a530: checksum: 50f1841f2bb96905c9fcd1815c4a95c0 freedreno-a630: - checksum: 50f1841f2bb96905c9fcd1815c4a95c0 + checksum: 05e7d72609840c897f734dca03748ead glmark2/function:fragment-steps=5:fragment-complexity=low-v2.trace: freedreno-a306: diff --git a/src/freedreno/ir3/ir3_nir.c b/src/freedreno/ir3/ir3_nir.c index a63b66f64bc..f9a35d31eb4 100644 --- a/src/freedreno/ir3/ir3_nir.c +++ b/src/freedreno/ir3/ir3_nir.c @@ -463,6 +463,40 @@ ir3_nir_post_finalize(struct ir3_shader *shader) if (compiler->gen >= 6 && s->info.stage == MESA_SHADER_FRAGMENT && !(ir3_shader_debug & IR3_DBG_NOFP16)) { + /* Lower FS mediump inputs to 16-bit. If you declared it mediump, you + * probably want 16-bit instructions (and have set + * mediump/RelaxedPrecision on most of the rest of the shader's + * instructions). If we don't lower it in NIR, then comparisons of the + * results of mediump ALU ops with the mediump input will happen in highp, + * causing extra conversions (and, incidentally, causing + * dEQP-GLES2.functional.shaders.algorithm.rgb_to_hsl_fragment on ANGLE to + * fail) + * + * However, we can't do flat inputs because flat.b doesn't have the + * destination type for how to downconvert the + * 32-bit-in-the-varyings-interpolator value. (also, even if it did, watch + * out for how gl_nir_lower_packed_varyings packs all flat-interpolated + * things together as ivec4s, so when we lower a formerly-float input + * you'd end up with an incorrect f2f16(i2i32(load_input())) instead of + * load_input). + */ + uint64_t mediump_varyings = 0; + nir_foreach_shader_in_variable(var, s) { + if ((var->data.precision == GLSL_PRECISION_MEDIUM || + var->data.precision == GLSL_PRECISION_LOW) && + var->data.interpolation != INTERP_MODE_FLAT) { + mediump_varyings |= BITFIELD64_BIT(var->data.location); + } + } + + if (mediump_varyings) { + NIR_PASS_V(s, nir_lower_mediump_io, + nir_var_shader_in, + mediump_varyings, + false); + } + + /* This should come after input lowering, to opportunistically lower non-mediump outputs. */ NIR_PASS_V(s, nir_lower_mediump_io, nir_var_shader_out, 0, false); }
