[Pixman] [PATCH 1/2] MIPS: DSPr2: Basic infrastructure for MIPS architecture

2012-02-10 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com MIPS DSP instruction set extensions --- configure.ac | 42 pixman/Makefile.am | 13 pixman/pixman-cpu.c| 66 pixman/pixman-mips-dspr2

[Pixman] Basic infrastructure for MIPS architecture and initial set of SRC routines.

2012-02-21 Thread Nemanja Lukic
. - In the future, when M14KE, 1074Kc cores (and others) become available, we can add those to the search string as well. The -mdspr2 compiler flag is automatically enabled for MIPS platforms. It can be disabled at configure time for chips that don't support it. Best Regards, Nemanja Lukic
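The search-string idea mentioned in this message can be sketched roughly as below. This is only an illustration, assuming the run-time check scans /proc/cpuinfo-style text for a known core name; the function name `cpuinfo_names_dspr2_core`, the contents of the `dspr2_cores` list, and the sample strings are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of core names assumed to implement DSPr2.
 * Supporting a new core would mean appending its name here. */
static const char *const dspr2_cores[] = { "MIPS 74K" };

/* Returns 1 if the given cpuinfo-style text mentions a listed core,
 * 0 otherwise. The caller is assumed to have read the text already. */
int cpuinfo_names_dspr2_core(const char *cpuinfo_text)
{
    size_t i;

    for (i = 0; i < sizeof dspr2_cores / sizeof dspr2_cores[0]; i++) {
        if (strstr(cpuinfo_text, dspr2_cores[i]) != NULL)
            return 1;
    }
    return 0;
}
```

A plain substring match keeps the check cheap, at the cost of having to extend the list by hand for every new core.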

[Pixman] [PATCH 1/2] MIPS: DSPr2: Basic infrastructure for MIPS architecture

2012-02-21 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com MIPS DSP instruction set extensions --- configure.ac | 53 pixman/Makefile.am | 13 ++ pixman/pixman-cpu.c| 53 pixman

[Pixman] [PATCH 1/2] MIPS: DSPr2: Basic infrastructure for MIPS architecture

2012-02-22 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com MIPS DSP instruction set extensions --- configure.ac | 45 + pixman/Makefile.am | 13 ++ pixman/pixman-cpu.c| 53 pixman/pixman

[Pixman] [PATCH 2/2] MIPS: DSPr2: Added fast-paths for SRC operation.

2012-02-22 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Following fast-path functions are implemented (routines 4, 5 and 6 utilize same fast-memcpy routine): 1. src_x888_8888 2. src_8888_0565 3. src_0565_8888 4. src_0565_0565 5. src_8888_8888 6. src_0888_0888 Performance numbers

[Pixman] MIPS blt and fill routines.

2012-02-29 Thread Nemanja Lukic
Per code review: - Main loop in the pixman_fill_buff16_mips routine now uses 4-byte writes - Added alignment check to ensure that we don't encounter unaligned write with the sw instruction (pixman_fill_buff16_mips) - Added lowlevel-blt-bench results (src_n_0565/src_n_8888) in the log
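The two review points above (4-byte writes in the main loop, plus an alignment check before them) can be sketched in C as follows. This is a hedged illustration, not the assembly from the patch: the name `fill_buff16_sketch` is hypothetical, and `memcpy` of four bytes stands in for the `sw` instruction so that the C code stays free of aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill `count` 16-bit pixels at `dst` with `value`. */
void fill_buff16_sketch(uint16_t *dst, size_t count, uint16_t value)
{
    /* Two copies of the pixel packed into one 32-bit word. */
    uint32_t pair = ((uint32_t)value << 16) | value;

    /* Alignment check: if dst is not 4-byte aligned, write one
     * halfword first so the word stores below are aligned. */
    if (count > 0 && ((uintptr_t)dst & 3) != 0) {
        *dst++ = value;
        count--;
    }

    /* Main loop: one 4-byte write per two pixels. */
    while (count >= 2) {
        memcpy(dst, &pair, sizeof pair);
        dst += 2;
        count -= 2;
    }

    /* Tail: at most one pixel is left. */
    if (count > 0)
        *dst = value;
}
```

Handling the unaligned head with a single halfword store keeps the main loop free of per-iteration alignment tests.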

[Pixman] [PATCH] MIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.

2012-02-29 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): lowlevel-blt-bench: src_n_0565 = L1: 238.14 L2: 233.15 M: 57.88 ( 77.23%) HT: 53.22 VT: 49.99 R: 47.73 RT: 24.79 ( 91Kops/s) src_n_8888 = L1

[Pixman] [PATCH] MIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.

2012-03-11 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): lowlevel-blt-bench: over_n_8888_8888_ca = L1: 8.32 L2: 7.65 M: 6.38 ( 51.08%) HT: 5.78 VT: 5.74 R: 5.84 RT: 4.39 ( 37Kops/s) over_n_8888_0565_ca = L1

[Pixman] [PATCH 1/2] MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths.

2012-05-02 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz Referent (before): lowlevel-blt-bench: over_n_8_8888 = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64 VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s) over_n_8_0565 = L1: 7.40 L2

[Pixman] Fix for the bug in the MIPS over_n_8888_8888_ca/over_n_8888_0565_ca routines revealed by make check

2012-05-09 Thread Nemanja Lukic
Benchmark results (lowlevel-blt-bench and cairo-perf-trace) on Malta board (@1Ghz) remain the same as in original commit. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman

[Pixman] [PATCH] MIPS: DSPr2: Fix for the bug in the MIPS over_n_8888_8888_ca/over_n_8888_0565_ca routines revealed by composite test.

2012-05-09 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com --- pixman/pixman-mips-dspr2-asm.S | 60 ++- 1 files changed, 28 insertions(+), 32 deletions(-) diff --git a/pixman/pixman-mips-dspr2-asm.S b/pixman/pixman-mips-dspr2-asm.S index ca03605..87558f0 100644

[Pixman] [PATCH] MIPS: DSPr2: Fix for the bug in the MIPS over_n_8888_8888_ca/over_n_8888_0565_ca routines (introduced in commit d2ee5631) revealed by composite test.

2012-05-23 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com In the main loop (unrolled by a factor of 2), instead of negating the mask values multiplied by srca, the value of srca was negated and passed as the alpha argument to the UN8x4_MUL_UN8x4_ADD_UN8x4 macro. Instead of: ma = ~ma; UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s); Code
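The intended per-pixel math, as this description reads, can be sketched in plain C. The message is truncated here, so this is not the code from the patch: it is a channel-by-channel illustration under the assumption that the correct order is "multiply the mask by srca, then negate". The helper names `mul_un8`, `add_un8_sat` and `over_n_8888_8888_ca_pixel` are hypothetical.

```c
#include <stdint.h>

/* Rounded 8-bit multiply: a * b / 255, rounded to nearest. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t add_un8_sat(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a + b;
    return (uint8_t)(t > 255 ? 255 : t);
}

/* One pixel of component-alpha OVER with a solid source:
 * dst = src * ma + dst * ~(ma * srca), computed per channel. */
uint32_t over_n_8888_8888_ca_pixel(uint32_t src, uint32_t ma, uint32_t dst)
{
    uint8_t srca = (uint8_t)(src >> 24);
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t m = (uint8_t)(ma >> shift);
        uint8_t d = (uint8_t)(dst >> shift);

        uint8_t s_in  = mul_un8(s, m);                      /* s = s * ma         */
        uint8_t alpha = (uint8_t)(255 - mul_un8(m, srca));  /* ma = ~(ma * srca)  */

        out |= (uint32_t)add_un8_sat(mul_un8(d, alpha), s_in) << shift;
    }
    return out;
}
```

Negating srca alone would give the same result only when every mask channel is 0xff, which may be why the problem stayed hidden until the tests covered more mask values.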

[Pixman] MIPS bilinear fast paths (src_8888_8_8888, src_8888_8_0565, src_0565_8_x888, src_0565_8_0565, add_8888_8_8888).

2012-06-25 Thread Nemanja Lukic
Added optimizations for several bilinear fast paths: - src_8888_8_8888 - src_8888_8_0565 - src_0565_8_x888 - src_0565_8_0565 - add_8888_8_8888 Benchmark results (using tweaked version of the lowlevel-blt-bench which does bilinear scaling using almost identity matrix) on Malta board (@1Ghz)

[Pixman] More MIPS bilinear fast paths (src_8888_8888, src_8888_0565, src_0565_8888, src_0565_0565, over_8888_8888, add_8888_8888).

2012-06-28 Thread Nemanja Lukic
Added optimizations for several bilinear fast paths: - src_8888_8888 - src_8888_0565 - src_0565_8888 - src_0565_0565 - over_8888_8888 - add_8888_8888 Benchmark results (using tweaked version of the lowlevel-blt-bench which does bilinear scaling using almost identity matrix) on Malta board

[Pixman] More MIPS DSPr2 bilinear fast paths.

2012-07-02 Thread Nemanja Lukic
Added optimizations for several bilinear fast paths: - src_8888_8_8888 - src_8888_8_0565 - src_0565_8_x888 - src_0565_8_0565 - add_8888_8_8888 - src_8888_8888 - src_8888_0565 - src_0565_8888 - src_0565_0565 - over_8888_8888 - add_8888_8888 Benchmark results (using lowlevel-blt-bench) on

[Pixman] [PATCH 1/2] MIPS: DSPr2: Added several bilinear fast paths: - src_8888_8_8888 - src_8888_8_0565 - src_0565_8_x888 - src_0565_8_0565 - add_8888_8_8888

2012-07-02 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench -b Referent (before): src_8888_8_8888 = L1: 6.37 L2: 6.08 M: 5.46 ( 32.57%) HT: 4.64 VT: 4.61 R: 4.52 RT: 2.85 ( 23Kops/s) src_8888_8_0565

[Pixman] [PATCH 2/2] MIPS: DSPr2: Added more bilinear fast paths (without mask pixels): - src_8888_8888 - src_8888_0565 - src_0565_8888 - src_0565_0565 - over_8888_8888 - add_8888_8888

2012-07-02 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench -b Referent (before): src_8888_8888 = L1: 8.18 L2: 7.79 M: 6.32 ( 33.51%) HT: 5.78 VT: 5.70 R: 5.61 RT: 3.79 ( 29Kops/s) src_8888_0565

[Pixman] [PATCH] MIPS: DSPr2: Added more fast-paths for OVER operation: - over_8888_n_8888 - over_8888_n_0565 - over_0565_n_0565 - over_8888_8_8888 - over_8888_8_0565 - over_0565_8_0565

2012-08-05 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_n_8888 = L1: 9.92 L2: 11.27 M: 8.50 ( 45.23%) HT: 4.70 VT: 4.45 R: 4.49 RT: 1.85 ( 20Kops/s

[Pixman] [PATCH 2/4] MIPS: DSPr2: Added fast-paths for OVER operation: - over_8888_n_0565 - over_8888_8_0565

2012-09-14 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_n_0565 = L1: 8.95 L2: 8.33 M: 6.95 ( 27.74%) HT: 4.27 VT: 4.07 R: 4.01 RT: 1.74 ( 19Kops/s

[Pixman] [PATCH 3/4] MIPS: DSPr2: Added fast-paths for OVER operation: - over_0565_n_0565 - over_0565_8_0565

2012-09-14 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_0565_n_0565 = L1: 7.56 L2: 7.24 M: 6.16 ( 16.38%) HT: 4.01 VT: 3.84 R: 3.79 RT: 1.66 ( 18Kops/s

[Pixman] [PATCH 4/4] MIPS: DSPr2: Added OVER combiner and two new fast paths: - over_8888_8888 - over_8888_8888_8888

2012-09-14 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_8888 = L1: 19.61 L2: 17.10 M: 11.16 ( 59.20%) HT: 16.47 VT: 15.81 R: 14.82 RT: 8.90 ( 50Kops/s

[Pixman] [PATCH 2/3] MIPS: DSPr2: Added more fast-paths for SRC operation:

2012-11-04 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): src_n_8_8888 = L1: 13.79 L2: 22.47 M: 17.55 ( 58.28%) HT: 6.95 VT: 6.46 R: 6.34 RT: 2.07 ( 20Kops/s) src_n_8_8 = L1

[Pixman] Several MIPS fast paths.

2012-11-12 Thread Nemanja Lukic
Added optimizations for several: - SRC fast paths: - src_n_8_8888 - src_n_8_8 - OVER fast paths: - over_n_0565 - over_n_8888 - OVER nearest neighbor scaling fast paths: - over_8888_8_0565 - over_0565_8_0565 Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz)

[Pixman] [PATCH 2/3] MIPS: DSPr2: Added more fast-paths for SRC operation:

2012-11-12 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): src_n_8_8888 = L1: 13.79 L2: 22.47 M: 17.55 ( 58.28%) HT: 6.95 VT: 6.46 R: 6.34 RT: 2.07 ( 20Kops/s) src_n_8_8 = L1

[Pixman] [PATCH 3/3] MIPS: DSPr2: Added more fast-paths for OVER operation:

2012-11-12 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_n_0565 = L1: 14.48 L2: 21.36 M: 17.57 ( 23.30%) HT: 6.95 VT: 6.44 R: 6.39 RT: 2.16 ( 22Kops/s) over_n_8888 = L1

[Pixman] [PATCH 1/3] MIPS: DSPr2: Added several nearest neighbor fast paths with a8 mask:

2012-11-12 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench -n Referent (before): over_8888_8_0565 = L1: 9.62 L2: 8.85 M: 7.40 ( 39.27%) HT: 5.67 VT: 5.61 R: 5.45 RT: 2.98 ( 22Kops/s) over_0565_8_0565

[Pixman] Several MIPS fast paths.

2012-11-18 Thread Nemanja Lukic
Added optimizations for several out_reverse, over_reverse and in operations: - out_reverse_8_0565 - out_reverse_8_8888 - over_reverse_n_8888 - in_n_8_8 Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz) are included in the log messages. Any comments to these patches are

[Pixman] [PATCH 1/2] MIPS: DSPr2: Added more fast-paths for REVERSE operation: - out_reverse_8_0565 - out_reverse_8_8888

2012-11-18 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): out_reverse_8_0565 = L1: 9.15 L2: 13.56 M: 10.65 ( 21.19%) HT: 9.26 VT: 9.14 R: 8.85 RT: 4.88 ( 37Kops/s

[Pixman] [PATCH 2/2] MIPS: DSPr2: Added more fast-paths: - over_reverse_n_8888 - in_n_8_8

2012-11-18 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_reverse_n_8888 = L1: 15.25 L2: 17.41 M: 13.53 ( 35.98%) HT: 6.43 VT: 5.98 R: 5.94 RT: 2.18 ( 22Kops/s

[Pixman] [PATCH 2/2] MIPS: DSPr2: Added more fast-paths for SRC operation: - src_0888_8888_rev - src_0888_0565_rev

2013-02-11 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): src_0888_8888_rev = L1: 51.88 L2: 42.00 M: 19.04 ( 88.50%) HT: 15.27 VT: 14.62 R: 14.13 RT: 7.12 ( 45Kops/s

[Pixman] MIPS DSPr2: Fix for over_n_8888_8888_ca/over_n_8888_0565_ca routines.

2013-03-01 Thread Nemanja Lukic
After introducing new PRNG (pseudorandom number generator) a bug in two DSPr2 routines was revealed. Bug manifested by wrong calculation in composite and glyph tests, which caused make check to fail for MIPS DSPr2 optimizations.

[Pixman] [PATCH] MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines

2013-03-01 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com After introducing new PRNG (pseudorandom number generator) a bug in two DSPr2 routines was revealed. Bug manifested by wrong calculation in composite and glyph tests, which caused make check to fail for MIPS DSPr2 optimizations. Bug

[Pixman] Pixbuf/rpixbuf paths crash

2013-03-01 Thread Nemanja Lukic
-benchmark? Did I construct these two testcases correctly? Thanks, Nemanja Lukic

[Pixman] Several MIPS nearest neighbor scaling fast paths.

2013-03-01 Thread Nemanja Lukic
Added optimizations for several nearest neighbor scaling fast paths: - over_8888_8888 - over_8888_0565 - src_0565_8888 Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz) are included in the log messages. Any comments to this patch are welcome.

[Pixman] [PATCH] MIPS: DSPr2: Added several nearest neighbor fast paths: - over_8888_8888 - over_8888_0565 - src_0565_8888

2013-03-01 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_8888 = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54 VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s) over_8888_0565

Re: [Pixman] [PATCH] MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines

2013-03-04 Thread Nemanja Lukic
that this is not the only problem in the MIPS DSPr2 code. Using test/fuzzer-find-diff.pl script, I can reproduce one more failure: I'll look into this, and upload separate patch with fix for this. Thanks, Nemanja Lukic -Original Message- From: Siarhei Siamashka [mailto:siarhei.siamas

[Pixman] MIPS DSPr2: Fix for in_n_8 routine.

2013-03-04 Thread Nemanja Lukic
Increasing the number of iterations in blitters-test revealed a bug in the DSPr2 optimizations. The bug is in the in_n_8 routine: the rounding logic was not implemented correctly. Also, the code used unnecessary multiplications, which could be avoided by packing 4 destination (a8) pixels into one 32-bit register. There

[Pixman] [PATCH] MIPS: DSPr2: Fix for bug in in_n_8 routine.

2013-03-04 Thread Nemanja Lukic
The rounding logic was not implemented correctly: instead of the rounding version of the 8-bit shift, logical shifts were used. Also, the code used unnecessary multiplications, which could be avoided by packing 4 destination (a8) pixels into one 32-bit register. There were also unnecessary spills to the stack.
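Both points in this message (rounding versus plain logical shifts, and packing four a8 pixels into one 32-bit register) can be illustrated with a short C sketch. This is not the DSPr2 assembly from the patch: it assumes the usual "two 8-bit lanes per multiply" arithmetic for a rounded division by 255, and the names `mul_rb_un8`, `mul_un8x4_un8` and `mul_un8_truncating` are hypothetical.

```c
#include <stdint.h>

#define RB_MASK     0x00ff00ffu  /* lanes at bits 0-7 and 16-23        */
#define RB_ONE_HALF 0x00800080u  /* rounding bias of 0x80 in each lane */

/* Multiply the two selected 8-bit lanes of x by a, dividing by 255
 * with rounding. One 32-bit multiply serves both lanes. */
static uint32_t mul_rb_un8(uint32_t x, uint8_t a)
{
    uint32_t t = (x & RB_MASK) * a + RB_ONE_HALF;
    return ((t + ((t >> 8) & RB_MASK)) >> 8) & RB_MASK;
}

/* Multiply four packed a8 pixels by a: two multiplies instead of four. */
uint32_t mul_un8x4_un8(uint32_t x, uint8_t a)
{
    return mul_rb_un8(x, a) | (mul_rb_un8(x >> 8, a) << 8);
}

/* For comparison: a plain logical shift without rounding. */
uint8_t mul_un8_truncating(uint8_t d, uint8_t a)
{
    return (uint8_t)(((uint32_t)d * a) >> 8);
}
```

With the truncating variant, 0xff times 0xff yields 0xfe rather than 0xff, which is the kind of off-by-one that a pixel-exact test such as blitters-test detects.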

[Pixman] Several MIPS fast paths.

2013-03-05 Thread Nemanja Lukic
Added optimizations for two fast paths: - pixbuf - rpixbuf Benchmark results (using tweaked version of the lowlevel-blt-bench which uses same bits for mask and src images) on Malta board (@1Ghz) are included in the log message. Any comments to this patch are welcome.

[Pixman] [PATCH] MIPS: DSPr2: Added two fast paths: - pixbuf - rpixbuf

2013-03-05 Thread Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT: 10.07 R: 9.84 RT: 5.54 ( 35Kops/s) rpixbuf = L1: 14.63 L2

Re: [Pixman] [PATCH] MIPS: DSPr2: Fix for bug in in_n_8 routine.

2013-03-13 Thread Nemanja Lukic
I support increasing number of iterations for blitters-test. This is what I usually leave overnight (make check), and which takes a lot of time for MIPS already with default number of iterations. Nemanja Lukic -Original Message- From: Siarhei Siamashka [mailto:siarhei.siamas

Re: [Pixman] Several MIPS fast paths.

2013-03-13 Thread Nemanja Lukic
I'll push that as separate commit. -Original Message- From: pixman-bounces+nemanja.lukic=rt-rk@lists.freedesktop.org [mailto:pixman-bounces+nemanja.lukic=rt-rk@lists.freedesktop.org] On Behalf Of Søren Sandmann Sent: Thursday, March 07, 2013 12:57 AM To: Nemanja Lukic Cc: pixman

[Pixman] [PATCH 1/9] MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines

2013-03-16 Thread Nemanja Lukic
After introducing new PRNG (pseudorandom number generator) a bug in two DSPr2 routines was revealed. Bug manifested by wrong calculation in composite and glyph tests, which caused make check to fail for MIPS DSPr2 optimizations. Bug was in the calculation of the: *dst = over (src, *dst) when ma

[Pixman] [PATCH 2/9] MIPS: DSPr2: Added over_8888_8888 nearest neighbor fast path.

2013-03-16 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_8888 = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54 VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s) Optimized: over_8888_8888 = L1: 43.67 L2: 33.30 M: 16.32

[Pixman] [PATCH 4/9] MIPS: DSPr2: Added src_0565_8888 nearest neighbor fast path.

2013-03-16 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): src_0565_8888 = L1: 20.70 L2: 19.22 M: 12.50 ( 49.79%) HT: 10.45 VT: 10.18 R: 9.99 RT: 5.31 ( 31Kops/s) Optimized: src_0565_8888 = L1: 62.98 L2: 53.44 M: 23.07

[Pixman] [PATCH 5/9] MIPS: DSPr2: Fix for bug in in_n_8 routine.

2013-03-16 Thread Nemanja Lukic
The rounding logic was not implemented correctly: instead of the rounding version of the 8-bit shift, logical shifts were used. Also, the code used unnecessary multiplications, which could be avoided by packing 4 destination (a8) pixels into one 32-bit register. There were also unnecessary spills to the stack.

[Pixman] [PATCH 6/9] MIPS: DSPr2: Added pixbuf fast path.

2013-03-16 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT: 10.07 R: 9.84 RT: 5.54 ( 35Kops/s) Optimized: pixbuf = L1: 43.54 L2: 36.02 M: 17.08 (137.09%) HT:

[Pixman] [PATCH 8/9] test: add src_0888_8888_rev and src_0888_0565_rev to lowlevel-blt-bench

2013-03-16 Thread Nemanja Lukic
--- test/lowlevel-blt-bench.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 4e16f7b..a1657ea 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c @@ -643,6 +643,8 @@ tests_tbl[] = {

[Pixman] [PATCH 7/9] MIPS: DSPr2: Added rpixbuf fast path.

2013-03-16 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): rpixbuf = L1: 14.63 L2: 13.55 M: 9.91 ( 79.53%) HT: 8.47 VT: 8.32 R: 8.17 RT: 4.90 ( 33Kops/s) Optimized: rpixbuf = L1: 45.69 L2: 37.30 M: 17.24 (138.31%) HT:

Re: [Pixman] [PATCH 9/9] test: add pixbuf and rpixbuf to lowlevel-blt-bench

2013-03-24 Thread Nemanja Lukic
. bench_composite function can check for pixbuf string in testname, and if that is detected, use same bits for src and mask images. Then, pixbuf testcases will not be only a compile-time option. Do you think that approach is better? Thanks, Nemanja Lukic -Original Message- From: Søren

[Pixman] [PATCH 2/9] MIPS: DSPr2: Added over_8888_8888 nearest neighbor fast path.

2013-04-15 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_8888 = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54 VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s) Optimized: over_8888_8888 = L1: 43.67 L2: 33.30 M: 16.32

[Pixman] [PATCH 3/9] MIPS: DSPr2: Added over_8888_0565 nearest neighbor fast path.

2013-04-15 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_8888_0565 = L1: 13.22 L2: 12.02 M: 9.77 ( 38.92%) HT: 8.58 VT: 8.35 R: 8.38 RT: 5.78 ( 35Kops/s) Optimized: over_8888_0565 = L1: 26.20 L2: 22.97 M: 15.92

[Pixman] [PATCH 4/9] MIPS: DSPr2: Added src_0565_8888 nearest neighbor fast path.

2013-04-15 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): src_0565_8888 = L1: 20.70 L2: 19.22 M: 12.50 ( 49.79%) HT: 10.45 VT: 10.18 R: 9.99 RT: 5.31 ( 31Kops/s) Optimized: src_0565_8888 = L1: 62.98 L2: 53.44 M: 23.07

[Pixman] [PATCH 1/9] MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines

2013-04-15 Thread Nemanja Lukic
After introducing new PRNG (pseudorandom number generator) a bug in two DSPr2 routines was revealed. Bug manifested by wrong calculation in composite and glyph tests, which caused make check to fail for MIPS DSPr2 optimizations. Bug was in the calculation of the: *dst = over (src, *dst) when ma

[Pixman] [PATCH 5/9] MIPS: DSPr2: Fix for bug in in_n_8 routine.

2013-04-15 Thread Nemanja Lukic
The rounding logic was not implemented correctly: instead of the rounding version of the 8-bit shift, logical shifts were used. Also, the code used unnecessary multiplications, which could be avoided by packing 4 destination (a8) pixels into one 32-bit register. There were also unnecessary spills to the stack.

[Pixman] [PATCH 6/9] test: add src_0888_8888_rev and src_0888_0565_rev to lowlevel-blt-bench

2013-04-15 Thread Nemanja Lukic
--- test/lowlevel-blt-bench.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 4e16f7b..a1657ea 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c @@ -643,6 +643,8 @@ tests_tbl[] = {

[Pixman] [PATCH 7/9] test: add pixbuf and rpixbuf to lowlevel-blt-bench

2013-04-15 Thread Nemanja Lukic
Add necessary support to lowlevel-blt benchmark for benchmarking pixbuf and rpixbuf fast paths. bench_composite function now checks for pixbuf string in testname, and if that is detected, use same bits for src and mask images. --- test/lowlevel-blt-bench.c | 11 +-- 1 files changed, 9
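The name check described in this message might look roughly like the sketch below. It is only an illustration of matching on the test name: the helper `test_shares_src_and_mask` is hypothetical, and the actual change to bench_composite is not visible in this truncated snippet.

```c
#include <string.h>

/* Returns 1 when the benchmark name contains "pixbuf", which also
 * covers "rpixbuf"; the caller would then use the same bits for the
 * src and mask images. */
int test_shares_src_and_mask(const char *testname)
{
    return strstr(testname, "pixbuf") != NULL;
}
```

Deciding at run time by name means the pixbuf cases can be benchmarked without a separate compile-time switch.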

[Pixman] [PATCH 8/9] MIPS: DSPr2: Added pixbuf fast path.

2013-04-15 Thread Nemanja Lukic
Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT: 10.07 R: 9.84 RT: 5.54 ( 35Kops/s) Optimized: pixbuf = L1: 43.54 L2: 36.02 M: 17.08 (137.09%) HT:

Re: [Pixman] Several MIPS fast paths and bug fixes.

2013-04-26 Thread Nemanja Lukic
If there are no other comments, I'll push this patch set in a day or two. Nemanja Lukic -Original Message- From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com] Sent: Thursday, April 18, 2013 12:22 AM To: Nemanja Lukic Cc: pixman@lists.freedesktop.org Subject: Re: [Pixman

[Pixman] [PATCH 02/12] test: add src_0888_0888 to lowlevel-blt-bench

2013-09-08 Thread Nemanja Lukic
--- test/lowlevel-blt-bench.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 1049e21..c84be65 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c @@ -716,6 +716,7 @@ tests_tbl[] = {

[Pixman] [PATCH 07/12] MIPS: DSPr1: Moving DSPr1 specific code from DSPr2 files to DSPr1 files

2013-09-08 Thread Nemanja Lukic
Some of the optimizations introduced in previous DSPr2 commits, similar to previous patches, were not DSPr2 specific and utilized DSPr1 instructions only. Since Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS cores, these optimizations couldn't be used on cores that don't

[Pixman] [PATCH 03/12] MIPS: MIPS32r2: empty MIPS32r2 implementation

2013-09-08 Thread Nemanja Lukic
OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#ifndef PIXMAN_MIPS_COMMON_ASM_H +#define PIXMAN_MIPS_COMMON_ASM_H + +#endif /* PIXMAN_MIPS_COMMON_ASM_H */ diff --git a/pixman/pixman-mips-common.h b/pixman/pixman-mips-common.h new file mode 100644 index 000..fc46ed8

[Pixman] [PATCH 06/12] MIPS: DSPr1: empty DSPr1 implementation

2013-09-08 Thread Nemanja Lukic
; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic

[Pixman] [PATCH 09/12] MIPS: MIPS32r2: Added optimization for function pixman_fill_buff16

2013-09-08 Thread Nemanja Lukic
Performance numbers before/after on MIPS-24kc @ 500 MHz Referent (before): src_n_0565= L1: 117.24 L2: 110.68 M:115.83 ( 96.31%) HT: 78.96 VT: 75.03 R: 65.98 RT: 24.94 ( 164Kops/s) Optimized (with these optimizations): src_n_0565= L1: 429.43 L2: 299.39 M:346.21

[Pixman] [PATCH 10/12] MIPS: disabled non 32-bit platforms

2013-09-08 Thread Nemanja Lukic
This patch adds a mechanism which allows optimizations to be run only on 32-bit platforms. --- pixman/pixman-mips.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c index a9f228a..eadf912 100644 --- a/pixman/pixman-mips.c +++

[Pixman] [PATCH 12/12] MIPS: enable prefetch for store only for CPU with 32 byte cache line

2013-09-08 Thread Nemanja Lukic
--- pixman/pixman-mips-common.h| 31 +-- pixman/pixman-mips-dspr1-asm.S | 59 +- pixman/pixman-mips-dspr1.c | 15 -- pixman/pixman-mips-dspr2.c |6 +-- pixman/pixman-mips.c | 31 +++- pixman/pixman-mips32r2-asm.S | 110

[Pixman] [PATCH 09/11] MIPS: MIPS32r2: Added optimization for function pixman_fill_buff16

2013-10-28 Thread Nemanja Lukic
Performance numbers before/after on MIPS-24kc @ 500 MHz Referent (before): src_n_0565= L1: 117.24 L2: 110.68 M:115.83 ( 96.31%) HT: 78.96 VT: 75.03 R: 65.98 RT: 24.94 ( 164Kops/s) Optimized (with these optimizations): src_n_0565= L1: 429.43 L2: 299.39 M:346.21

[Pixman] [PATCH 04/11] MIPS: DSPr2: runtime detection extended

2013-10-28 Thread Nemanja Lukic
--- pixman/pixman-mips.c | 83 ++ 1 files changed, 63 insertions(+), 20 deletions(-) diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c index 3048813..93fda99 100644 --- a/pixman/pixman-mips.c +++ b/pixman/pixman-mips.c @@ -24,14 +24,27 @@

[Pixman] [PATCH 05/11] MIPS: MIPS32r2: empty implementation with runtime detection

2013-10-28 Thread Nemanja Lukic
IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#ifndef PIXMAN_MIPS_COMMON_ASM_H +#define PIXMAN_MIPS_COMMON_ASM_H + +#endif /* PIXMAN_MIPS_COMMON_ASM_H */ diff --git a/pixman/pixman-mips-common.h b/pixman/pixman-mips-common.h new

[Pixman] [PATCH 07/11] MIPS: DSPr1: empty implementation with runtime detection

2013-10-28 Thread Nemanja Lukic
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#include pixman-private.h +#include pixman-mips-dspr1-asm.h diff --git a/pixman/pixman-mips-dspr1-asm.h b/pixman/pixman-mips-dspr1-asm.h new file mode

[Pixman] [PATCH 02/11] MIPS: DSPr2: Removed build restrictions and repair compiler's check

2013-10-28 Thread Nemanja Lukic
--- configure.ac |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/configure.ac b/configure.ac index 8a3b622..8764f7b 100644 --- a/configure.ac +++ b/configure.ac @@ -719,25 +719,21 @@ dnl Check if assembler is gas compatible and supports MIPS DSPr2 instructions

[Pixman] [PATCH 01/11] MIPS: update author's e-mail address

2013-10-28 Thread Nemanja Lukic
-asm.S index 866e93e..9dad163 100644 --- a/pixman/pixman-mips-dspr2-asm.S +++ b/pixman/pixman-mips-dspr2-asm.S @@ -26,7 +26,7 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * - * Author: Nemanja Lukic (nlu...@mips.com) + * Author: Nemanja Lukic

[Pixman] [PATCH 11/11] MIPS: enable prefetch for store only for CPU with 32 byte cache line

2013-10-28 Thread Nemanja Lukic
--- pixman/pixman-mips-common.h| 31 +-- pixman/pixman-mips-dspr1-asm.S | 59 +- pixman/pixman-mips-dspr1.c | 15 -- pixman/pixman-mips-dspr2.c |6 +-- pixman/pixman-mips.c | 34 - pixman/pixman-mips32r2-asm.S | 110

[Pixman] [PATCH 03/11] test: add src_0888_0888 to lowlevel-blt-bench

2013-10-28 Thread Nemanja Lukic
--- test/lowlevel-blt-bench.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 1049e21..c84be65 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c @@ -716,6 +716,7 @@ tests_tbl[] = {

[Pixman] [PATCH 10/11] MIPS: disabled non 32-bit platforms

2013-10-28 Thread Nemanja Lukic
This patch adds a mechanism which allows optimizations to be run only on 32-bit platforms. --- pixman/pixman-mips.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c index 221da24..8825621 100644 --- a/pixman/pixman-mips.c +++

[Pixman] [PATCH 08/11] MIPS: DSPr1: Move fast paths implementation from DSPr2 to DSPr1

2013-10-28 Thread Nemanja Lukic
Some of the optimizations introduced in previous DSPr2 commits, similar to previous patch, were not DSPr2 specific and utilized DSPr1 instructions only. Since Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS cores, these optimizations couldn't be used on cores that don't

Re: [Pixman] mips* asm exports symbols that should not export

2013-12-19 Thread Nemanja Lukic
and sorry for late reply, Nemanja Lukic -Original Message- From: pixman-boun...@lists.freedesktop.org [mailto:pixman-boun...@lists.freedesktop.org] On Behalf Of YunQiang Su Sent: Saturday, December 7, 2013 5:57 PM To: pixman@lists.freedesktop.org Subject: [Pixman] mips* asm exports symbols

Re: [Pixman] mips* asm exports symbols that should not export

2013-12-19 Thread Nemanja Lukic
Hi YunQiang Su, Attached is solution for unwanted symbol visibility. I'll upstream both patches soon. Thanks, Nemanja Lukic -Original Message- From: Nemanja Lukic [mailto:nemanja.lu...@rt-rk.com] Sent: Thursday, December 19, 2013 12:30 PM To: 'Yunqiang Su' Cc: 'pixman

[Pixman] [PATCH 04/11] MIPS: dspr2: runtime detection extended

2014-03-13 Thread Nemanja Lukic
--- pixman/pixman-mips.c | 83 ++ 1 files changed, 63 insertions(+), 20 deletions(-) diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c index 3048813..93fda99 100644 --- a/pixman/pixman-mips.c +++ b/pixman/pixman-mips.c @@ -24,14 +24,27 @@

[Pixman] [PATCH 01/11] MIPS: update author's e-mail address

2014-03-13 Thread Nemanja Lukic
-asm.S index 866e93e..9dad163 100644 --- a/pixman/pixman-mips-dspr2-asm.S +++ b/pixman/pixman-mips-dspr2-asm.S @@ -26,7 +26,7 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * - * Author: Nemanja Lukic (nlu...@mips.com) + * Author: Nemanja Lukic

[Pixman] [PATCH 05/11] MIPS: mips32r2: empty implementation with runtime detection

2014-03-13 Thread Nemanja Lukic
IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#ifndef PIXMAN_MIPS_COMMON_ASM_H +#define PIXMAN_MIPS_COMMON_ASM_H + +#endif /* PIXMAN_MIPS_COMMON_ASM_H */ diff --git a/pixman/pixman-mips-common.h b/pixman/pixman-mips-common.h new

[Pixman] [PATCH 11/11] MIPS: enable prefetch for store only for CPU with 32 byte cache line

2014-03-13 Thread Nemanja Lukic
--- pixman/pixman-mips-common.h| 31 +-- pixman/pixman-mips-dspr1-asm.S | 59 +- pixman/pixman-mips-dspr1.c | 15 -- pixman/pixman-mips-dspr2.c |6 +-- pixman/pixman-mips.c | 34 - pixman/pixman-mips32r2-asm.S | 110

[Pixman] [PATCH 03/11] test: add src_0888_0888 to lowlevel-blt-bench

2014-03-13 Thread Nemanja Lukic
--- test/lowlevel-blt-bench.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c index 1049e21..c84be65 100644 --- a/test/lowlevel-blt-bench.c +++ b/test/lowlevel-blt-bench.c @@ -716,6 +716,7 @@ tests_tbl[] = {

[Pixman] [PATCH 02/11] MIPS: dspr2: Removed build restrictions and repair compiler's check

2014-03-13 Thread Nemanja Lukic
--- configure.ac |8 ++-- 1 files changed, 2 insertions(+), 6 deletions(-) diff --git a/configure.ac b/configure.ac index 6327972..5229032 100644 --- a/configure.ac +++ b/configure.ac @@ -720,25 +720,21 @@ dnl Check if assembler is gas compatible and supports MIPS DSPr2 instructions

[Pixman] [PATCH 09/11] MIPS: mips32r2: Added optimization for function pixman_fill_buff16

2014-03-13 Thread Nemanja Lukic
Performance numbers before/after on MIPS-24kc @ 500 MHz Referent (before): src_n_0565= L1: 117.24 L2: 110.68 M:115.83 ( 96.31%) HT: 78.96 VT: 75.03 R: 65.98 RT: 24.94 ( 164Kops/s) Optimized (with these optimizations): src_n_0565= L1: 429.43 L2: 299.39 M:346.21

[Pixman] Basic infrastructure for MIPS32r2 and DSPr1 optimizations.

2014-03-13 Thread Nemanja Lukic
Some of the optimizations introduced in previous DSPr2 commits were not DSPr2 specific. Some of the fast paths didn't use DSPr2 instructions at all and instead relied on the more generic MIPS32r2 instruction set or the previous version of the DSP instruction set (DSPr1). Since Pixman's

[Pixman] [PATCH 08/11] MIPS: dspr1: Move fast paths implementation from dspr2 to dspr1

2014-03-13 Thread Nemanja Lukic
As with the previous patch, some of the optimizations introduced in earlier dspr2 commits were not dspr2 specific and used only dspr1 instructions. Since Pixman's run-time CPU detection only added dspr2 fast-paths on 74K MIPS cores, these optimizations couldn't be used on cores that don't

[Pixman] [PATCH 07/11] MIPS: dspr1: empty implementation with runtime detection

2014-03-13 Thread Nemanja Lukic
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#include pixman-private.h +#include pixman-mips-dspr1-asm.h diff --git a/pixman/pixman-mips-dspr1-asm.h b/pixman/pixman-mips-dspr1-asm.h new file mode

Re: [Pixman] mips* asm exports symbols that should not export

2014-04-08 Thread Nemanja Lukic
Unfortunately, no. I was planning to push them after that big patch set, which I updated a few weeks ago. Best Regards, Nemanja Lukic -Original Message- From: Søren Sandmann [mailto:soren.sandm...@gmail.com] Sent: Monday, April 7, 2014 7:45 PM To: Nemanja Lukic Cc: 'Yunqiang Su'; pixman

[Pixman] [PATCH 03/13] MIPS: dspr2: Removed build restrictions and repair compiler's check

2014-06-27 Thread Nemanja Lukic
The build restriction was problematic because it required '-mips32r2' in CFLAGS at configure time to enable DSPr2 optimizations. Additional CFLAGS are no longer needed, and pixman can be built targeting the lowest common denominator. Architecture and ISA are set in inline assembler to allow the compiler to build

[Pixman] [PATCH 02/13] MIPS: update author's e-mail address

2014-06-27 Thread Nemanja Lukic
-asm.S index 866e93e..9dad163 100644 --- a/pixman/pixman-mips-dspr2-asm.S +++ b/pixman/pixman-mips-dspr2-asm.S @@ -26,7 +26,7 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * - * Author: Nemanja Lukic (nlu...@mips.com) + * Author: Nemanja Lukic

[Pixman] [PATCH 07/13] MIPS: mips32r2: empty implementation with runtime detection

2014-06-27 Thread Nemanja Lukic
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#ifndef PIXMAN_MIPS_COMMON_ASM_H +#define PIXMAN_MIPS_COMMON_ASM_H + +#endif

[Pixman] [PATCH 06/13] MIPS: dspr2: runtime detection extended

2014-06-27 Thread Nemanja Lukic
The isa field (mips32r2) is available from kernel version 3.9. The ASEs implemented field (dsp, dsp2) is available from 3.7. In older kernel versions, dsp represents both DSPr1 and DSPr2. If the kernel version is 3.7 or above, runtime detection tries to find 'dsp2' in /proc/cpuinfo. If it fails or if kernel
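The lookup described in this message can be sketched roughly as follows. This is only an illustration, not the code from the patch: the helper name cpuinfo_has_ase is hypothetical, it works on text already read from /proc/cpuinfo, and it assumes the ASE names appear as space-separated words after the colon on the "ASEs implemented" line. It matches whole words so that a search for 'dsp' is not satisfied by 'dsp2'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of pixman): returns 1 if `token` appears as
 * a whole word on an "ASEs implemented" line of cpuinfo-style text, else 0. */
static int
cpuinfo_has_ase (const char *text, const char *token)
{
    static const char key[] = "ASEs implemented";
    const size_t key_len = sizeof (key) - 1;
    size_t token_len;
    const char *line;

    assert (text != NULL && token != NULL);
    token_len = strlen (token);
    if (token_len == 0)
        return 0;

    for (line = text; line != NULL && *line != '\0'; )
    {
        const char *eol = strchr (line, '\n');
        size_t line_len = eol ? (size_t) (eol - line) : strlen (line);

        if (line_len >= key_len && strncmp (line, key, key_len) == 0)
        {
            const char *end = line + line_len;
            const char *p = memchr (line, ':', line_len);

            if (p != NULL)
            {
                p++; /* skip the colon */
                while (p < end)
                {
                    const char *word;

                    /* skip separators, then measure one word */
                    while (p < end && (*p == ' ' || *p == '\t'))
                        p++;
                    word = p;
                    while (p < end && *p != ' ' && *p != '\t')
                        p++;

                    if ((size_t) (p - word) == token_len &&
                        strncmp (word, token, token_len) == 0)
                        return 1;
                }
            }
        }
        line = eol ? eol + 1 : NULL; /* advance to the next line, if any */
    }
    return 0;
}
```

A caller would read /proc/cpuinfo into a buffer and pass it in. What happens when 'dsp2' is not found, or on kernels older than 3.7, is cut off in the snippet above and is not covered by this sketch.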

[Pixman] [PATCH 12/13] MIPS: disabled non 32-bit platforms

2014-06-27 Thread Nemanja Lukic
There are important differences in the ABI, since saved registers or passed values can take twice as much stack space. This patch adds a mechanism that allows the optimizations to run only on 32-bit platforms, since all optimizations are done in assembly. --- pixman/pixman-mips.c |4 1 files

[Pixman] [PATCH 05/13] Implementing memcpy through pointer

2014-06-27 Thread Nemanja Lukic
A pointer to function (memcpy) is added to pixman_implementation_t; it points to the C version of memcpy (linked in pixman-general.c). The function to call is pixman_memcpy, and every call of memcpy is replaced with pixman_memcpy. If there is an optimized version of memcpy, it should be linked with imp->memcpy.
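The dispatch idea in this message can be sketched as below. Everything here is a stand-in for illustration: impl_t, memcpy_fn, sketch_memcpy and bytewise_memcpy are hypothetical names, not the real pixman_implementation_t layout or the actual pixman_memcpy signature, which the snippet does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the standard memcpy. */
typedef void *(*memcpy_func_t) (void *dst, const void *src, size_t n);

/* Hypothetical stand-in for an implementation record holding the pointer. */
typedef struct
{
    memcpy_func_t memcpy_fn;
} impl_t;

/* Hypothetical wrapper: every copy goes through the pointer stored in the
 * implementation, so a faster routine can be plugged in at run time. */
static void *
sketch_memcpy (const impl_t *imp, void *dst, const void *src, size_t n)
{
    assert (imp != NULL && imp->memcpy_fn != NULL);
    return imp->memcpy_fn (dst, src, n);
}

/* Placeholder for an "optimized" routine; a real one would be assembly. */
static void *
bytewise_memcpy (void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

In this sketch a general implementation would be initialised with the C library routine, for example `impl_t general = { memcpy };`, while an architecture-specific one would store its own routine in the same slot. Callers only ever use the wrapper, so they do not need to know which version is installed.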

[Pixman] [PATCH 09/13] MIPS: dspr1: empty implementation with runtime detection

2014-06-27 Thread Nemanja Lukic
IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com) + */ + +#include pixman-private.h +#include pixman-mips

Re: [Pixman] pixman-0.32.6 fails to build on mips32r2

2014-09-15 Thread Nemanja Lukic
Hi Vincent, Thanks. I'll push it in the following days. Kind Regards, Nemanja Lukic -Original Message- From: Vicente Olivert Riera [mailto:vincent.ri...@imgtec.com] Sent: Friday, September 12, 2014 4:19 PM To: Nemanja Lukic Cc: pixman@lists.freedesktop.org Subject: Re: pixman-0.32.6