nir: use vectorization for non-scalar stages

Jason Ekstrand Thu, 18 Oct 2018 13:58:22 -0700

On Thu, Oct 18, 2018 at 3:46 PM Ian Romanick <[email protected]> wrote:


> On 10/18/2018 01:22 PM, Jason Ekstrand wrote:
> > On Thu, Oct 18, 2018 at 3:11 PM Ian Romanick <[email protected]> wrote:
> >
> >     On 10/17/2018 11:33 AM, Jason Ekstrand wrote:
> >     > From: Connor Abbott <[email protected]>
> >     >
> >     > Shader-db results on Haswell:
> >     >
> >     >     total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
> >     >     instructions in affected programs: 959766 -> 933509 (-2.74%)
> >     >     helped: 5653
> >     >     HURT: 2560
> >     >
> >     >     total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
> >     >     cycles in affected programs: 6102794 -> 6070570 (-0.53%)
> >     >     helped: 3838
> >     >     HURT: 4868
> >
> >     Here's the results I got with these 3 patches on 322a919a41f:
> >
> >     total instructions in shared programs: 13674046 -> 13643001 (-0.23%)
> >     instructions in affected programs: 1248672 -> 1217627 (-2.49%)
> >     helped: 7168
> >     HURT: 2841
> >     helped stats (abs) min: 1 max: 39 x̄: 5.40 x̃: 3
> >     helped stats (rel) min: 0.21% max: 33.33% x̄: 4.55% x̃: 3.54%
> >     HURT stats (abs)   min: 1 max: 21 x̄: 2.71 x̃: 3
> >     HURT stats (rel)   min: 0.19% max: 22.73% x̄: 3.86% x̃: 3.53%
> >     95% mean confidence interval for instructions value: -3.23 -2.97
> >     95% mean confidence interval for instructions %-change: -2.28% -2.05%
> >     Instructions are helped.
> >
> >     total cycles in shared programs: 373694400 -> 373745788 (0.01%)
> >     cycles in affected programs: 23171532 -> 23222920 (0.22%)
> >     helped: 4890
> >     HURT: 5632
> >     helped stats (abs) min: 2 max: 1268 x̄: 52.04 x̃: 34
> >     helped stats (rel) min: 0.04% max: 45.71% x̄: 7.43% x̃: 4.64%
> >     HURT stats (abs)   min: 2 max: 6042 x̄: 54.30 x̃: 32
> >     HURT stats (rel)   min: 0.05% max: 60.66% x̄: 8.19% x̃: 6.21%
> >     95% mean confidence interval for cycles value: 1.30 8.47
> >     95% mean confidence interval for cycles %-change: 0.73% 1.14%
> >     Cycles are HURT.
> >
> >     total spills in shared programs: 82569 -> 82572 (<.01%)
> >     spills in affected programs: 70 -> 73 (4.29%)
> >     helped: 0
> >     HURT: 3
> >
> >     total fills in shared programs: 93445 -> 93449 (<.01%)
> >     fills in affected programs: 71 -> 75 (5.63%)
> >     helped: 0
> >     HURT: 4
> >
> >     This is pretty different from your result... and not good. :(  What SHA
> >     of master were you on?
> >
> >
> > For one thing, I scrubbed all the non-vec4 programs from the results
> > because this doesn't affect FS.  I'm not sure what master; something
> > from the last two days; I just rebased.  Maybe you have a newer shader-db?
>
> I doubt my shader-db is newer.  The last time I updated was when you
> added a bunch of shaders. :)  Scrubbing FS programs should only affect
> the values shown in "total XXX in shared programs", right?  Or did you
> do something other than
>
> ./report.py <(grep -v SIMD before.txt) <(grep -v SIMD after.txt)
>
> When I do that, I get 'total ...' numbers a bit closer to, but still
> larger than, yours.
>
> Looking at the actual data, the shaders most hurt for cycles are all
> shaders that have been in shader-db for years... lots of Unigine and
> L4D2. :(  The spills / fills hurt are all TES in Tomb Raider.
>

One other thing is that my Haswell numbers were taken by running ./run -p
hsw on a KBL.  It's possible you have more or fewer features enabled or
something.

--Jason
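
The filtering step discussed above, `grep -v SIMD`, drops every result line
that mentions a SIMD dispatch width, which is meant to leave only the vec4
programs this patch can affect. A minimal Python sketch of the same idea
follows. The function name and the sample result lines are made up for
illustration and are not taken from shader-db; only the substring test
mirrors the quoted command.

```python
def drop_simd_lines(lines):
    """Keep only lines that do not contain 'SIMD' (like `grep -v SIMD`)."""
    return [line for line in lines if "SIMD" not in line]


# Hypothetical result lines; the real shader-db output format may differ.
sample = [
    "shaders/a.shader_test - vec4 shader: 12 inst",
    "shaders/a.shader_test - SIMD8 shader: 30 inst",
    "shaders/a.shader_test - SIMD16 shader: 31 inst",
]

if __name__ == "__main__":
    for line in drop_simd_lines(sample):
        print(line)
```

Applying the same filter to both the "before" and "after" files keeps the
two inputs comparable, which is what the process substitutions in the quoted
command do.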


> >     > Most of the hurt programs seem to be because we generate extra MOV's
> >     > due to vectorizing things. For example, in
> >     > shaders/non-free/steam/anomaly-2/158.shader_test, this:
> >     >
> >     > add(8)          g116<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
> >     > add(8)          g117<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
> >     > add(8)          g116<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
> >     > add(8)          g117<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };
> >     >
> >     > Turns into this:
> >     >
> >     > add(8)          g13<1>F         g12<4,4,1>.xyxyF g1.4<0,4,1>F   { align16 1Q };
> >     > add(8)          g14<1>F         g12<4,4,1>.xyxyF -g1.4<0,4,1>F  { align16 1Q };
> >     > mov(8)          g116<1>.xyD     g13<4,4,1>.xyyyD                { align16 NoDDClr 1Q };
> >     > mov(8)          g117<1>.xyD     g13<4,4,1>.zwwwD                { align16 NoDDClr 1Q };
> >     > mov(8)          g116<1>.zwD     g14<4,4,1>.xxxyD                { align16 NoDDChk 1Q };
> >     > mov(8)          g117<1>.zwD     g14<4,4,1>.zzzwD                { align16 NoDDChk 1Q };
> >     >
> >     > So we eliminated two add's, but then had to introduce four mov's to
> >     > transpose the result.  Some of the hurt is because vectorization is a
> >     > bit over-aggressive and we vectorize something when we should have left
> >     > it as a scalar and CSEd it.  Unfortunately, this is all really tricky to
> >     > do as it involves the interactions between many different components.
> >     > ---
> >     >  src/intel/compiler/brw_nir.c | 6 ++++++
> >     >  1 file changed, 6 insertions(+)
> >     >
> >     > diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> >     > index 297845b89b7..564fd004a94 100644
> >     > --- a/src/intel/compiler/brw_nir.c
> >     > +++ b/src/intel/compiler/brw_nir.c
> >     > @@ -568,6 +568,12 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
> >     >        OPT(nir_copy_prop);
> >     >        OPT(nir_opt_dce);
> >     >        OPT(nir_opt_cse);
> >     > +
> >     > +      if (!is_scalar) {
> >     > +         OPT(nir_opt_vectorize);
> >     > +         OPT(nir_copy_prop);
> >     > +      }
> >     > +
> >     >        OPT(nir_opt_peephole_select, 0);
> >     >        OPT(nir_opt_intrinsics);
> >     >        OPT(nir_opt_algebraic);
> >     >
> >
>
>
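
The arithmetic behind the quoted numbers can be checked with a few lines of
Python. This is only a sketch: the helper names are hypothetical, and the
instruction tally assumes every instruction counts equally, which ignores
scheduling and cycle costs. The opcode lists are transcribed from the
anomaly-2 example in the quoted commit message.

```python
from collections import Counter


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Opcodes from the anomaly-2 example quoted in the commit message.
before_ops = ["add"] * 4                # four adds writing g116/g117 directly
after_ops = ["add"] * 2 + ["mov"] * 4   # two vectorized adds, four transposing movs

net_instructions = len(after_ops) - len(before_ops)

if __name__ == "__main__":
    print(Counter(before_ops), Counter(after_ops), net_instructions)
    # Totals from the Haswell shader-db results in the commit message.
    print(round(percent_change(2180337, 2154080), 2))
    print(round(percent_change(959766, 933509), 2))
```

For that one shader the net effect is two extra instructions, even though
the totals across all affected programs go down.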


Re: [Mesa-dev] [PATCH 3/3] i965/nir: use vectorization for non-scalar stages
