On 03/05/2018 04:35 PM, Matt Turner wrote:
> On Fri, Feb 23, 2018 at 3:56 PM, Ian Romanick <i...@freedesktop.org> wrote:
>> From: Ian Romanick <ian.d.roman...@intel.com>
>> On vector platforms, this helps elide some constant loads.
>> No changes on Broadwell or Skylake.
>> Haswell
>> total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
>> instructions in affected programs: 1277532 -> 1243902 (-2.63%)
>> helped: 13216
>> HURT: 95
> What's going on in the hurt shaders?

I'm not completely sure.  All of these shaders are negatively affected
by the DPH transformation.  Only one of the shaders is small enough (19
instructions) to easily examine.  In that case, it looks like a couple
things end up not getting loaded via VF.  Only one of those is the DPH
operand.  There appear to be changes in the constant loading in the
others as well, which cause the shaders to diverge slightly after about
5 instructions, making comparisons between the 70+ instruction shaders
frustrating at best.  Many of the shorter shaders had flow control, so
that exacerbated the issue.
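
For reference, the identity a DPH rewrite relies on can be checked with a
toy model.  This is only a sketch in plain Python, not the NIR pattern
itself: fdot4 and fdph below are hypothetical helpers, and it assumes the
transformation replaces a 4-component dot product whose first operand has
w = 1.0 with a homogeneous dot product.

```python
# Toy model of the identity behind a DPH rewrite; plain Python, not NIR.
# fdot4 and fdph are hypothetical helpers written for this illustration.

def fdot4(a, b):
    """4-component dot product of two 4-tuples."""
    return sum(x * y for x, y in zip(a, b))


def fdph(a3, b4):
    """Homogeneous dot product: dot(a3, b4.xyz) + b4.w."""
    return a3[0] * b4[0] + a3[1] * b4[1] + a3[2] * b4[2] + b4[3]


a = (2.0, 3.0, 4.0)
d = (5.0, 6.0, 7.0, 8.0)

# Appending w = 1.0 to the first operand gives the same value as fdph,
# so the constant 1.0 never has to be loaded.
assert fdot4(a + (1.0,), d) == fdph(a, d)
```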

I tried a couple modifications to the DPH pattern including
'vec4(is_used_once)' and 'c(is_not_const)'.  These had mixed results in
cycles, and didn't consistently help the instruction counts in the 95
hurt shaders.

I did discover that I should have listed the transformations in the
opposite order.  As is, code that matches the last fdot4 pattern will
never become a multiply (speculation) because the previous
transformations will gradually convert it to a fdot2.
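
The ordering effect can be sketched with a toy first-match-wins rewriter.
This is an illustration under stated assumptions, not Mesa's nir_algebraic
code: the rule functions and the tuple representation are hypothetical, the
second dot-product operand is omitted, and there is deliberately no rule
that turns an fdot2 into a multiply.

```python
# Toy first-match-wins rewriter. The rule names and the (op, vec)
# representation are hypothetical; this is not Mesa's nir_algebraic.

def narrow_to_fdot3(op, vec):
    # fdot4 with a zero w component can drop to fdot3.
    if op == "fdot4" and vec[3] == 0.0:
        return ("fdot3", vec[:3])
    return None


def narrow_to_fdot2(op, vec):
    # fdot3 with a zero z component can drop to fdot2.
    if op == "fdot3" and vec[2] == 0.0:
        return ("fdot2", vec[:2])
    return None


def fdot4_to_fmul(op, vec):
    # fdot4 with only an x component is a plain multiply.
    if op == "fdot4" and vec[1:] == (0.0, 0.0, 0.0):
        return ("fmul", vec[:1])
    return None


def rewrite(expr, rules):
    """Apply the first matching rule repeatedly until nothing matches."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(*expr)
            if result is not None:
                expr = result
                changed = True
                break
    return expr


expr = ("fdot4", ("a", 0.0, 0.0, 0.0))
general_first = [narrow_to_fdot3, narrow_to_fdot2, fdot4_to_fmul]
specific_first = [fdot4_to_fmul, narrow_to_fdot3, narrow_to_fdot2]

print(rewrite(expr, general_first))   # ('fdot2', ('a', 0.0))
print(rewrite(expr, specific_first))  # ('fmul', ('a',))
```

With the general rules listed first, the expression is narrowed one step
at a time and the multiply rule never gets a chance to match.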

Flipping the order helped instructions in 1 program but hurt cycles.
Looking at the changed shader, it appears that flipping the order allows
an fdot4 to be converted to an fdot2 instead of an fdot3.  This allows
CSE to eliminate the (new) fdot2.

Oddly, flipping the order made a shader-db run slightly slower...
1.113±0.711 seconds (0.429%±0.274%) at n=10 for a HSW run on my quad-core
HSW desktop.  I would have expected it to be slightly faster. *shrug*
mesa-dev mailing list