Siarhei Siamashka <siarhei.siamas...@gmail.com> writes:

> I just wonder how big is the performance cost for adding an extra
> comparison operation. Probably much less than using -ffloat-store,
> -fexcess-precision=standard, and -std=c99 options, but might be
> interesting to confirm.
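
For reference, the two kinds of check being compared can be sketched roughly as below. This is only an illustration, not the code in pixman-combine-float.c: the helper names (is_zero_exact, is_zero_flt_min, guarded_div) and the 1.0f fallback value are assumptions made up for the example. The point is that the FLT_MIN variant costs one extra comparison per check, and in exchange also catches denormal divisors that the exact test lets through.

```c
#include <assert.h>  /* assert, used only by the self-check at the end */
#include <float.h>   /* FLT_MIN: smallest normalized positive float */

/* Hypothetical helpers for illustration; not taken from pixman. */

/* Exact test: only a true zero (+0.0f or -0.0f) counts as zero. */
static int
is_zero_exact (float f)
{
    return f == 0.0f;
}

/* Tolerant test: any value whose magnitude is below FLT_MIN
 * (i.e. zero or a denormal) is treated as zero.  This needs two
 * comparisons instead of one.
 */
static int
is_zero_flt_min (float f)
{
    return -FLT_MIN < f && f < FLT_MIN;
}

/* Divide a by b, skipping the division when b is (nearly) zero.
 * The 1.0f fallback is an assumption chosen for this sketch.
 */
static float
guarded_div (float a, float b)
{
    if (is_zero_flt_min (b))
        return 1.0f;

    return a / b;
}

/* Quick self-check showing where the two tests disagree. */
static void
check_zero_helpers (void)
{
    float denormal = FLT_MIN / 2.0f;

    assert (!is_zero_exact (denormal));   /* exact test lets it through */
    assert (is_zero_flt_min (denormal));  /* tolerant test filters it   */
    assert (guarded_div (6.0f, 3.0f) == 2.0f);
}
```

Because the tolerant test can only skip more divisions than the exact one, never fewer, it is at least plausible for it to come out even or slightly ahead in a benchmark.
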
It's not going to matter all that much in any case, since we are
talking about floating point variants of operations that involve
divisions. These are not used that much, and the divisions will tend
to swamp a lot of the difference.

However, I added conjoint_over_8888_2a10 to lowlevel-blt-test and did
some measurements.

As a baseline, current master compiled with -m32 and == 0.0f checks:

    conjoint_over_8888_2a10 =  L1:   5.62  L2:   5.67  M:  5.65 (  0.50%)
        HT:  5.59  VT:  5.52  R:  5.49  RT:  5.06 (  68Kops/s)

With the FLT_MIN checks:

    conjoint_over_8888_2a10 =  L1:   5.68  L2:   5.73  M:  5.72 (  0.51%)
        HT:  5.65  VT:  5.53  R:  5.45  RT:  5.02 (  67Kops/s)

The numbers are actually slightly better with the checks, so I
suspect the difference is just noise (although conceivably, the
checks may filter out more divisions than before).

When just pixman-combine-float.c is compiled with -ffloat-store:

    conjoint_over_8888_2a10 =  L1:   5.58  L2:   5.60  M:  5.60 (  0.50%)
        HT:  5.53  VT:  5.44  R:  5.41  RT:  4.99 (  67Kops/s)

The numbers here are slightly worse than the baseline, but possibly
still just noise.

If all of pixman is compiled with -ffloat-store:

    conjoint_over_8888_2a10 =  L1:   4.31  L2:   4.34  M:  4.31 (  0.38%)
        HT:  4.26  VT:  4.21  R:  4.14  RT:  3.92 (  53Kops/s)

the numbers are clearly worse.

Finally, the numbers in x86_64 mode. Current master:

    conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.58  M: 19.13 (  1.75%)
        HT: 17.47  VT: 17.35  R: 17.32  RT: 13.72 ( 178Kops/s)

With FLT_MIN checks:

    conjoint_over_8888_2a10 =  L1:  19.09  L2:  19.59  M: 19.51 (  1.76%)
        HT: 17.52  VT: 17.02  R: 17.00  RT: 13.43 ( 175Kops/s)

I.e., no real difference.


Søren