On Thu, 2015-07-02 at 13:04 +0300, Oded Gabbay wrote:
> Hi,
>
> This patch-set implements the most heavily used fast paths, according to
> profiling done by me using the cairo traces package.

I finally got a chance to try this series on a power7, and the results
are... mixed. A sampling of x11perf numbers (against Xvfb, just
switching pixman before and after):

   before                after               Operation
------------   -------------------------   -------------------------
   6856255.6          5564651.7 ( 0.812)   10x10 rectangle
    125522.9           455209.1 ( 3.627)   100x100 rectangle
      5419.2            29705.8 ( 5.482)   500x500 rectangle

This one is telling, I think. This should be the vmx_fill path, and it
looks like a nice win for large ops but a hit for small ops. Is the vmx
setup cost that high, or is there something else going on?

   1641838.0          1684290.9 ( 1.026)   Char in 80-char aa line (Charter 10)
    432916.1           466759.2 ( 1.078)   Char in 30-char aa line (Charter 24)
   1412008.5          1545401.0 ( 1.094)   Char in 80-char aa line (Courier 12)
   1440361.7          1947014.6 ( 1.352)   Char in 80-char rgb line (Charter 10)
    384600.6           576289.5 ( 1.498)   Char in 30-char rgb line (Charter 24)
   1258381.8          1811421.7 ( 1.439)   Char in 80-char rgb line (Courier 12)

Render text gets faster, nice.

   1202555.7          1228256.6 ( 1.021)   Scroll 10x10 pixels
    162282.8           131857.7 ( 0.813)   Scroll 100x100 pixels
      6819.8             6256.2 ( 0.917)   Scroll 500x500 pixels
   1695720.5          1752339.8 ( 1.033)   Copy 10x10 from pixmap to window
    210222.2           165836.1 ( 0.789)   Copy 100x100 from pixmap to window
     14408.8            10600.1 ( 0.736)   Copy 500x500 from pixmap to window

This should be the vmx_blit path, and it gets quite a bit worse for
large ops. Eesh.

   1021293.5          1060568.6 ( 1.038)   PutImage 10x10 square
     54803.7            56420.0 ( 1.029)   PutImage 100x100 square
      1933.5             1935.4 ( 1.001)   PutImage 500x500 square
   1418641.0          1432543.1 ( 1.010)   ShmPutImage 10x10 square
    194769.2           160047.5 ( 0.822)   ShmPutImage 100x100 square
     11951.2            10968.1 ( 0.918)   ShmPutImage 500x500 square

Again, blit path, and usually worse for large ops.
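
[Editor's note: the parenthesised column in these tables is just
after/before, so values below 1.0 are slowdowns. A minimal Python
sketch of that arithmetic follows, using a few rows copied from the
tables above. The helper names and the 5% noise band are assumptions
made for illustration; they are not part of x11perf, x11perfcomp, or
pixman.]

```python
# Hypothetical helpers for reading the before/after tables.
# NOISE is an assumed threshold, not something x11perf reports.
NOISE = 0.05


def ratio(before, after):
    """Return after/before, rounded to three places like the table."""
    return round(after / before, 3)


def classify(before, after, noise=NOISE):
    """Label a result as a regression, improvement, or within noise."""
    r = after / before
    if r < 1.0 - noise:
        return "regression"
    if r > 1.0 + noise:
        return "improvement"
    return "noise"


# (operation, before, after) rows copied from the tables above.
samples = [
    ("10x10 rectangle", 6856255.6, 5564651.7),
    ("500x500 rectangle", 5419.2, 29705.8),
    ("Copy 500x500 from pixmap to window", 14408.8, 10600.1),
    ("PutImage 500x500 square", 1933.5, 1935.4),
]

for op, before, after in samples:
    print(f"{ratio(before, after):6.3f}  {classify(before, after):12}  {op}")
```

With a different noise band the "noise" rows may of course be
classified differently; the ratios themselves do not depend on it.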

    576975.4           573388.4 ( 0.994)   Composite 10x10 from pixmap to window
    156830.4           131246.8 ( 0.837)   Composite 100x100 from pixmap to window
     12172.5            10150.2 ( 0.834)   Composite 500x500 from pixmap to window

Not-quite-a-blit path, but no transformation, and the same kind of
performance hit.

    176570.2           176330.2 ( 0.999)   Scale 5x5 from pixmap to 10x10 window
      4598.0             4460.9 ( 0.970)   Scale 50x50 from pixmap to 100x100 window
       189.9              185.9 ( 0.979)   Scale 250x250 from pixmap to 500x500 window
    269540.6           269767.4 ( 1.001)   Scale 10x10 from pixmap to 5x5 window
    267201.2           268220.5 ( 1.004)   Scale 100x100 from pixmap to 5x5 window
       766.8              740.1 ( 0.965)   Scale 500x500 from pixmap to 250x250 window

All within the noise margin, so I suspect the series just doesn't hit
these paths. (Ignore the implausible numbers from "Scale 100x100",
that's an x11perf bug I just pushed a fix for.)

I'm a little hesitant to take a 10% to 20% hit to software blit
performance. It might be that vmx_blt is just a mistake to try, that
the CPU and compiler are smarter than we are.

- ajax

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman