Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Reference (before):
over__ = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54
VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s)
Optimized:
over__ = L1: 43.67 L2: 33.30 M: 16.32
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Reference (before):
over__0565 = L1: 13.22 L2: 12.02 M: 9.77 ( 38.92%) HT: 8.58
VT: 8.35 R: 8.38 RT: 5.78 ( 35Kops/s)
Optimized:
over__0565 = L1: 26.20 L2: 22.97 M: 15.92
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Reference (before):
src_0565_ = L1: 20.70 L2: 19.22 M: 12.50 ( 49.79%) HT: 10.45
VT: 10.18 R: 9.99 RT: 5.31 ( 31Kops/s)
Optimized:
src_0565_ = L1: 62.98 L2: 53.44 M: 23.07
After introducing the new PRNG (pseudorandom number generator), a bug in two
DSPr2 routines was revealed. The bug manifested as wrong calculations in the
composite and glyph tests, which caused make check to fail for the MIPS DSPr2
optimizations. The bug was in the calculation of:
*dst = over (src, *dst) when ma
The rounding logic was not implemented correctly: logical shifts were used
instead of the rounding version of the 8-bit shift. The code also used
unnecessary multiplications, which can be avoided by packing 4 destination
(a8) pixels into one 32-bit register, and it spilled registers to the stack
unnecessarily.
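
To illustrate why the choice of shift matters, here is a minimal C sketch. It
is not the DSPr2 assembly from this patch. The function names are hypothetical,
and the multiply helper uses the common "add 0x80, then fold the high byte
back in" idiom for dividing an 8-bit product by 255.

```c
#include <stdint.h>

/* Truncating (logical) 8-bit shift: the low byte is simply discarded. */
static uint32_t
shift8_trunc (uint32_t x)
{
    return x >> 8;
}

/* Rounding 8-bit shift: add half of 256 before shifting. */
static uint32_t
shift8_round (uint32_t x)
{
    return (x + 0x80u) >> 8;
}

/* a * b / 255 with rounding (hypothetical helper for illustration). */
static uint8_t
mul_un8_round (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80u;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Same computation without the rounding term: results can be one too low. */
static uint8_t
mul_un8_trunc (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}
```

With these helpers, mul_un8_round (255, 255) yields 255 while
mul_un8_trunc (255, 255) yields 254. An off-by-one of this kind is enough to
make a bit-exact test fail.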
---
test/lowlevel-blt-bench.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 4e16f7b..a1657ea 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
@@ -643,6 +643,8 @@ tests_tbl[] =
{
Add the necessary support to the lowlevel-blt benchmark for benchmarking the
pixbuf and rpixbuf fast paths. The bench_composite function now checks for the
pixbuf string in testname and, if it is detected, uses the same bits for the
src and mask images.
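
The check described above could look roughly like the following sketch. It is
self-contained and does not reproduce the code in the patch: both helper names
and the buffer parameters are hypothetical. Only the idea of searching
testname for "pixbuf" comes from the description.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when the test name refers to a pixbuf or
 * rpixbuf fast path. "rpixbuf" contains "pixbuf", so a single substring
 * search covers both names. */
static int
test_uses_pixbuf (const char *testname)
{
    return strstr (testname, "pixbuf") != NULL;
}

/* Hypothetical helper: choose the buffer that backs the mask image.
 * For pixbuf tests the mask shares the source bits. */
static const uint32_t *
select_mask_bits (const char     *testname,
                  const uint32_t *src_bits,
                  const uint32_t *mask_bits)
{
    return test_uses_pixbuf (testname) ? src_bits : mask_bits;
}
```

A substring match keeps the check to one call, at the cost of also matching
any future test whose name happens to contain "pixbuf".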
---
test/lowlevel-blt-bench.c | 11 +--
1 files changed, 9
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Reference (before):
pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT:
10.07 R: 9.84 RT: 5.54 ( 35Kops/s)
Optimized:
pixbuf = L1: 43.54 L2: 36.02 M: 17.08 (137.09%) HT: