On Sun, Apr 12, 2015 at 10:02:03AM +0300, Pohjolainen, Topi wrote:
> On Fri, Apr 10, 2015 at 12:52:04PM -0700, Ben Widawsky wrote:
> > Certain platforms support the ability to sample from a texture, and write
> > it out to the file RT - thus saving a costly send instruction (note that
> > this is a potential win if one wanted to backport to a tag that didn't
> > have the patch from Topi which removed excess MOVs from LOAD_PAYLOAD -
> > 97caf5fa04dbd2).
> >
> > v2: Modify the algorithm. Instead of iterating in reverse through blocks and
> > insts, since the last block/inst is the only thing which can benefit.
> > Rebased on top of Ken's patch modifying is_last_send
> >
> > v3: Rebased over almost 2 months, and incorporated feedback from Matt:
> >   Some comment typo fixes and rewordings.
> >   Whitespace
> >   Move the optimization pass outside of the optimize loop
> >
> > v4: Some cosmetic changes requested from Ken. These changes ensured that the
> > optimization function always returned true when an optimization occurred,
> > and false when one did not. This behavior did not exist with the original
> > patch. As a result, having the separate helper function which Matt did not
> > like no longer made sense, and so now I believe everyone should be happy.
> >
> > Braswell data:
> > Benchmark (n=20)     %diff
> > *OglBatch5           -1.4
> > *OglBatch7           -1.79
> > OglFillTexMulti       5.57
> > OglFillTexSingle      1.16
> > OglShMapPcf           0.05
> > OglTexFilterAniso     3.01
> > OglTexFilterTri       1.94
> >
> > SKL data:
> > NONE COLLECTED
> >
> > No piglit regressions:
> > (http://otc-gfxtest-01.jf.intel.com:8080/view/dev/job/bwidawsk/112/)
> >
> > [*] I believe my measurements are incorrect for Batch5-7. If I add this new
> > optimization, but never emit the new instruction I see similar results.
>
> I'm seeing ~7% (with 95% confidence) decrease in OglBatch6/7 when I'm
> launching resolve clears with the light-weight mechanism provided by blorp.
> This may be totally unrelated but let's see if I get any smarter.

I let OglBatch6 run for some time (160 rounds each), and I get:

x /mnt/before
+ /mnt/after
[ministat ASCII histogram omitted: column alignment lost]
    N           Min           Max        Median           Avg        Stddev
x 160       102.365       122.348       113.472     113.21107     3.6714446
+ 160       93.4825       121.597       110.289     110.03581     4.3771895
Difference at 95.0% confidence
        -3.17526 +/- 0.885251
        -2.80473% +/- 0.781947%
        (Student's t, pooled s = 4.03976)

I'm not sure one can really conclude much from this. I would almost claim
that my changes just introduce more fluctuation in the fps numbers and
nothing else.

I examined what callgrind tells me. Both master and meta-blorp got the same
number of frames rendered, while the latter does a little less work on the
CPU to achieve this. The latter also submits slightly less work to the GPU,
since clears are executed without the vertex shader stage. Hence I can't
really explain why it should be any slower.

So if I were you I probably wouldn't worry too much about your results.
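
For anyone wanting to sanity-check the "Difference at 95.0% confidence"
lines in the ministat summary above, here is a minimal sketch that re-derives
them from the N/Avg/Stddev rows alone. The function name is hypothetical, and
the critical value 1.960 is an assumption (the large-sample Student's t value,
which happens to reproduce the printed margin); this is not ministat's own
code.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit=1.960):
    """Two-sample difference of means using a pooled standard deviation.

    t_crit is an assumed critical value (two-sided 95%, large degrees of
    freedom); pass a different value for small samples.
    """
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees
    # of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = mean2 - mean1
    # Half-width of the confidence interval for the difference of means.
    margin = t_crit * pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return diff, margin, pooled_sd


# Summary rows from the run above: "x" is /mnt/before, "+" is /mnt/after.
before_mean = 113.21107
diff, margin, pooled_sd = pooled_difference(
    160, before_mean, 3.6714446,
    160, 110.03581, 4.3771895,
)

# Relative figures are expressed as a percentage of the "before" average.
rel_diff = 100.0 * diff / before_mean
rel_margin = 100.0 * margin / before_mean

print(f"{diff:.5f} +/- {margin:.6f}")
print(f"{rel_diff:.5f}% +/- {rel_margin:.6f}%")
print(f"pooled s = {pooled_sd:.5f}")
```

The interval sits entirely below zero, which is why ministat reports a
difference at all. The shift of roughly 3.2 fps is still smaller than either
run's standard deviation, so the difference in means is modest next to the
run-to-run spread.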