[Mesa-dev] [PATCH 0/4] nv50/ir: Improve Performance of Integer Multiplication

Rhys Perry Wed, 13 Jun 2018 15:04:12 -0700

This series improves the performance of integer multiplication by removing
most uses of the very slow IMAD and IMUL instructions. It depends on the
SHLADD/IndirectPropagation patches.


The first and second patches add support for the XMAD instruction in codegen.

The third patch replaces most IMADs and IMULs with a sequence of XMADs.
This is far faster but increases the total instructions in the shader-db
by 0.72%.
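
As a rough illustration of why a 32-bit multiply can be expressed with
16-bit multiply-adds, here is a minimal sketch of the arithmetic. It assumes
XMAD behaves like a 16x16+32 multiply-add; mad16() and mul32_via_mad16() are
hypothetical names used only for this example, and the sequence the patch
actually emits may differ.

```cpp
#include <cstdint>

// Hypothetical model of a 16x16+32 multiply-add (not a Mesa function).
static inline uint32_t mad16(uint16_t a, uint16_t b, uint32_t c)
{
    // Widen before multiplying so the product is computed as unsigned 32-bit.
    return static_cast<uint32_t>(a) * static_cast<uint32_t>(b) + c;
}

// Low 32 bits of a * b, built from three 16-bit multiply-adds.
static inline uint32_t mul32_via_mad16(uint32_t a, uint32_t b)
{
    const uint16_t a_lo = static_cast<uint16_t>(a & 0xffff);
    const uint16_t a_hi = static_cast<uint16_t>(a >> 16);
    const uint16_t b_lo = static_cast<uint16_t>(b & 0xffff);
    const uint16_t b_hi = static_cast<uint16_t>(b >> 16);

    // Cross terms: only their low 16 bits survive the shift below, so
    // wrap-around in the 32-bit accumulator is harmless.
    uint32_t cross = mad16(a_lo, b_hi, 0);
    cross = mad16(a_hi, b_lo, cross);

    // a_hi * b_hi would be shifted left by 32 bits and drops out entirely.
    return mad16(a_lo, b_lo, cross << 16);
}
```

One multiply turns into three multiply-adds plus a shift here, which is
consistent with the instruction count going up even though each step is
cheaper.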

The next patch lowers this number significantly. It replaces many
multiplications with instructions that should be as fast as or faster than
the XMAD approach. They are also typically smaller and less register
heavy, so they decrease the total instruction count by 0.50%.
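
To give an idea of the kind of rewrite meant here, the sketch below replaces
a multiplication by a constant with a shift plus at most one add or subtract.
This is an assumption about the general technique, not a list of the cases
the patch handles; is_pow2(), log2_pow2() and mul_by_imm() are hypothetical
names for this example only.

```cpp
#include <cstdint>

// Hypothetical helper: true if v is a nonzero power of two.
static inline bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper: log2 of a value known to be a power of two.
static inline unsigned log2_pow2(uint32_t v)
{
    unsigned n = 0;
    while (v >>= 1)
        ++n;
    return n;
}

// Multiply a by the constant imm (mod 2^32), avoiding a real multiply
// where a shift and one add/sub is enough.
static inline uint32_t mul_by_imm(uint32_t a, uint32_t imm)
{
    if (imm == 0)
        return 0;
    if (is_pow2(imm))                        // a * 2^n       -> a << n
        return a << log2_pow2(imm);
    if (is_pow2(imm - 1))                    // a * (2^n + 1) -> (a << n) + a
        return (a << log2_pow2(imm - 1)) + a;
    if (is_pow2(imm + 1))                    // a * (2^n - 1) -> (a << n) - a
        return (a << log2_pow2(imm + 1)) - a;
    return a * imm;                          // general case: keep the multiply
}
```

The (a << n) + a form corresponds to a single shift-and-add operation, which
is presumably where the SHLADD dependency comes in.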

This series gives roughly a 50% speedup in fragment-heavy scenarios with
Dolphin 5.0. All timings were made with interesting-looking fifos from
Dolphin's bugtracker:
     Wind Waker: 18 FPS -> 26 FPS at 3x internal resolution
     Wind Waker:  8 FPS -> 11 FPS at 5x internal resolution
   Paper Mario?: 26 FPS -> 42 FPS at 5x internal resolution
SpongeBob Movie: 19 FPS -> 30 FPS at 5x internal resolution

Unigine Heaven and Unigine Valley seem to run the same at low quality with
no anti-aliasing and no tessellation. SuperTuxKart and 0 A.D. also show no
change.

It's possible these patches may break something, especially the fourth
one. Piglit shows no functional regressions, though the patches should
probably be tested for improvements or breakage with actual applications.

These patches can also be found on my GitHub:
https://github.com/pendingchaos/mesa/tree/nv-xmad-v1

The final changes in shader-db are as follows:

total instructions in shared programs : 5256901 -> 5268293 (0.22%)
total gprs used in shared programs    : 624328 -> 624196 (-0.02%)
total shared used in shared programs  : 360704 -> 360704 (0.00%)
total local used in shared programs   : 20952 -> 20952 (0.00%)

                local     shared        gpr       inst      bytes 
    helped           0           0         255         680         680 
      hurt           0           0         128        1484        1484 

Rhys Perry (4):
  nv50/ir: add preliminary support for OP_XMAD
  gm107/ir: add support for OP_XMAD on GM107+
  nv50/ir: optimize imul/imad to xmads
  nv50/ir: further optimize multiplication by immediates

 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp    |   3 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  14 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  61 +++++++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 188 +++++++++++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp      |  20 +++
 .../drivers/nouveau/codegen/nv50_ir_target.cpp     |   7 +-
 .../nouveau/codegen/nv50_ir_target_gm107.cpp       |   5 +
 .../nouveau/codegen/nv50_ir_target_nv50.cpp        |   5 +-
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp        |  26 ++-
 src/util/bitscan.h                                 |  26 +++
 10 files changed, 331 insertions(+), 24 deletions(-)

-- 
2.14.4
