On 06/06/2018 05:39 PM, Ian Romanick wrote:
> On 06/06/2018 04:26 PM, Matt Turner wrote:
>> On Wed, Jun 6, 2018 at 2:33 PM, Ian Romanick <[email protected]> wrote:
>>> From: Ian Romanick <[email protected]>
>>>
>>> Shader-db results:
>>>
>>> Skylake and Broadwell had similar results. (Skylake shown)
>>> total instructions in shared programs: 14371513 -> 14346174 (-0.18%)
>>> instructions in affected programs: 890389 -> 865050 (-2.85%)
>>> helped: 3601
>>> HURT: 1
>>> helped stats (abs) min: 1 max: 92 x̄: 7.05 x̃: 4
>>> helped stats (rel) min: 0.10% max: 25.00% x̄: 3.95% x̃: 3.23%
>>> HURT stats (abs)   min: 43 max: 43 x̄: 43.00 x̃: 43
>>> HURT stats (rel)   min: 0.90% max: 0.90% x̄: 0.90% x̃: 0.90%
>>> 95% mean confidence interval for instructions value: -7.27 -6.80
>>> 95% mean confidence interval for instructions %-change: -4.05% -3.84%
>>> Instructions are helped.
>>>
>>> total cycles in shared programs: 532435951 -> 532154282 (-0.05%)
>>> cycles in affected programs: 69203137 -> 68921468 (-0.41%)
>>> helped: 2654
>>> HURT: 981
>>> helped stats (abs) min: 1 max: 4496 x̄: 177.17 x̃: 76
>>> helped stats (rel) min: <.01% max: 71.34% x̄: 9.16% x̃: 5.42%
>>> HURT stats (abs)   min: 1 max: 33338 x̄: 192.20 x̃: 19
>>> HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.95% x̃: 1.46%
>>> 95% mean confidence interval for cycles value: -113.38 -41.60
>>> 95% mean confidence interval for cycles %-change: -6.24% -5.53%
>>> Cycles are helped.
>>>
>>> total spills in shared programs: 8114 -> 8122 (0.10%)
>>> spills in affected programs: 152 -> 160 (5.26%)
>>> helped: 0
>>> HURT: 2
>>>
>>> total fills in shared programs: 11082 -> 11100 (0.16%)
>>> fills in affected programs: 375 -> 393 (4.80%)
>>> helped: 1
>>> HURT: 1
>>>
>>> Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Ivy Bridge
>>> shown)
>>> total instructions in shared programs: 9897654 -> 9890341 (-0.07%)
>>> instructions in affected programs: 213092 -> 205779 (-3.43%)
>>> helped: 775
>>> HURT: 18
>>> helped stats (abs) min: 1 max: 65 x̄: 9.62 x̃: 6
>>> helped stats (rel) min: 0.11% max: 25.00% x̄: 4.85% x̃: 3.70%
>>> HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
>>> HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
>>> 95% mean confidence interval for instructions value: -9.93 -8.51
>>> 95% mean confidence interval for instructions %-change: -5.01% -4.40%
>>> Instructions are helped.
>>>
>>> total cycles in shared programs: 87653348 -> 87562421 (-0.10%)
>>> cycles in affected programs: 2411339 -> 2320412 (-3.77%)
>>> helped: 612
>>> HURT: 227
>>> helped stats (abs) min: 1 max: 2103 x̄: 162.83 x̃: 53
>>> helped stats (rel) min: 0.05% max: 58.41% x̄: 6.50% x̃: 2.65%
>>> HURT stats (abs)   min: 1 max: 772 x̄: 38.43 x̃: 10
>>> HURT stats (rel)   min: 0.04% max: 36.36% x̄: 3.60% x̃: 0.92%
>>> 95% mean confidence interval for cycles value: -128.53 -88.22
>>> 95% mean confidence interval for cycles %-change: -4.39% -3.14%
>>> Cycles are helped.
>>>
>>> No change on Iron Lake or GM45.
>>>
>>> Signed-off-by: Ian Romanick <[email protected]>
>>> ---
>>>  src/intel/compiler/brw_nir.c | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
>>> index 67c062d91f5..ca9b021767f 100644
>>> --- a/src/intel/compiler/brw_nir.c
>>> +++ b/src/intel/compiler/brw_nir.c
>>> @@ -557,7 +557,21 @@ brw_nir_optimize(nir_shader *nir, const struct
>>> brw_compiler *compiler,
>>>        OPT(nir_copy_prop);
>>>        OPT(nir_opt_dce);
>>>        OPT(nir_opt_cse);
>>> +
>>> +      /* Passing 0 to the peephole select pass causes it to convert
>>> +       * if-statements that contain only move instructions in the branches
>>> +       * regardless of the count.
>>> +       *
>>> +       * Passing 0 to the peephole select pass causes it to convert
>
> Passing 1
>
> I thought I already fixed that. :(
>
>>> +       * if-statements that contain at most a single ALU instruction
>>> (total)
>>> +       * in both branches.  The select instruction works somewhat
>>> differently
>>> +       * on Gen5 and earlier, and adding this pass on those platforms was
>>
>> It does? Something about min/max requiring the CMP?
>
> I remember thinking the problem was obvious when I looked at the first
> shader that was hurt, but I don't recall exactly what it was now.  I'll
> try it again.

Here are the ILK results:

total instructions in shared programs: 7774514 -> 7773708 (-0.01%)
instructions in affected programs: 89355 -> 88549 (-0.90%)
helped: 162
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 6.46 x̃: 6
helped stats (rel) min: 0.17% max: 13.04% x̄: 2.29% x̃: 1.09%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.25 -3.32
95% mean confidence interval for instructions %-change: -2.14% -1.35%
Instructions are helped.

total cycles in shared programs: 177899700 -> 177958996 (0.03%)
cycles in affected programs: 753424 -> 812720 (7.87%)
helped: 88
HURT: 100
helped stats (abs) min: 2 max: 76 x̄: 22.84 x̃: 16
helped stats (rel) min: 0.05% max: 6.16% x̄: 0.91% x̃: 0.63%
HURT stats (abs)   min: 4 max: 2946 x̄: 613.06 x̃: 512
HURT stats (rel)   min: 0.33% max: 48.26% x̄: 12.16% x̃: 8.37%
95% mean confidence interval for cycles value: 236.67 394.14
95% mean confidence interval for cycles %-change: 4.41% 7.68%
Cycles are HURT.

Looking at the hurt shaders (both instructions and cycles), I'm now not
sure why I wrote this comment. :(  There appear to be two separate,
unrelated problems that cause shaders to be hurt:

- Instructions are generally hurt when more Boolean resolves have to be
  inserted.  It's actually a little hard to tell for sure because the
  smallest hurt shader is 277 instructions.

- Cycles are hurt when math box instructions are moved out of
  if-statements.  shaders/unity/24-Tree.shader_test VS (cycles hurt by
  44%) is an example of this.  A pow that was conditional became
  unconditional.

I think both are solvable depending on how much work I feel like doing.

On <= Gen5 we emit a lot of stuff like

    cmp.ge(f0) g32, g3, g8
    and g32, g32, 1D
    ...
    cmp.nz(f0) NULL, -g32, 0D

where the second compare is the only use of g32.  We should just emit

    cmp.ge(f0) g32, g3, g8
    ...
    and.nz(f0) null, g32, 1D

Looking at the existing code, I don't think this will be too hard.  I am
a little confused that we emit the resolves at code generation instead
of treating it like a lowering pass in NIR.

> _______________________________________________
> mesa-dev mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
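
The claim that the and + cmp.nz pair can collapse into a single and.nz
can be checked with a small C model.  This is only a sketch, not Mesa
code: the function names are hypothetical, and it assumes that on
<= Gen5 only bit 0 of the cmp destination is meaningful while the upper
bits may hold anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the sequence emitted today.  cmp_dst stands in for g32 right
 * after cmp.ge.  ASSUMPTION: only bit 0 carries the comparison result.
 */
static inline int flag_with_resolve(uint32_t cmp_dst)
{
   uint32_t g32 = cmp_dst & 1u;   /* and g32, g32, 1D */
   uint32_t neg = 0u - g32;       /* the -g32 source modifier */
   return neg != 0u;              /* cmp.nz(f0) NULL, -g32, 0D */
}

/* Model of the proposed sequence: the AND sets the flag directly. */
static inline int flag_with_and_nz(uint32_t cmp_dst)
{
   return (cmp_dst & 1u) != 0u;   /* and.nz(f0) null, g32, 1D */
}

/* Compare both models on values with arbitrary upper bits. */
static inline void check_samples(void)
{
   static const uint32_t samples[] = {
      0x00000000u, 0x00000001u, 0xdeadbeeeu,
      0xdeadbeefu, 0xfffffffeu, 0xffffffffu,
   };

   for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
      assert(flag_with_resolve(samples[i]) == flag_with_and_nz(samples[i]));
}
```

Negating a value that is 0 or 1 never changes whether it is zero, so
the flag depends only on bit 0 in both models.  The shortcut is valid
only under the condition stated above, that the second compare is the
only use of g32; any other reader would still need the resolved value.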
