From: Nemanja Lukic nemanja.lu...@rt-rk.com
MIPS DSP instruction set extensions
---
configure.ac | 42
pixman/Makefile.am | 13
pixman/pixman-cpu.c| 66
pixman/pixman-mips-dspr2
.
- In the future, when M14KE and 1074Kc cores (and others) become available, we
can add them to the search string.
The -mdspr2 compiler flag is enabled automatically for MIPS platforms. It can be
disabled at configure time for chips that don't support it.
Best Regards,
Nemanja Lukic
From: Nemanja Lukic nemanja.lu...@rt-rk.com
MIPS DSP instruction set extensions
---
configure.ac | 53
pixman/Makefile.am | 13 ++
pixman/pixman-cpu.c| 53
pixman
From: Nemanja Lukic nemanja.lu...@rt-rk.com
MIPS DSP instruction set extensions
---
configure.ac | 45 +
pixman/Makefile.am | 13 ++
pixman/pixman-cpu.c| 53
pixman/pixman
From: Nemanja Lukic nemanja.lu...@rt-rk.com
The following fast-path functions are implemented (routines 4, 5 and 6 use the
same fast-memcpy routine):
1. src_x888_
2. src__0565
3. src_0565_
4. src_0565_0565
5. src__
6. src_0888_0888
Performance numbers
Per code review:
- Main loop in the pixman_fill_buff16_mips routine now uses 4-byte writes
- Added alignment check to ensure that we don't encounter unaligned write
with the sw instruction (pixman_fill_buff16_mips)
- Added lowlevel-blt-bench results (src_n_0565/src_n_) in the log
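The two review items above (4-byte writes in the main loop, plus an alignment check before them) can be sketched in portable C. This is only a model of the idea, not the pixman_fill_buff16_mips assembly; the function name fill_buff16_sketch and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of a 16bpp fill; not the actual MIPS routine. */
static void
fill_buff16_sketch (uint16_t *dst, size_t count, uint16_t value)
{
    /* Both halves hold the same pixel, so byte order does not matter. */
    uint32_t pair = ((uint32_t) value << 16) | value;

    assert (dst != NULL || count == 0);

    /* Alignment check: write one pixel if dst is not 4-byte aligned. */
    if (count > 0 && ((uintptr_t) dst & 3) != 0)
    {
        *dst++ = value;
        count--;
    }

    /* Main loop: each 4-byte write covers two 16-bit pixels. */
    while (count >= 2)
    {
        memcpy (dst, &pair, sizeof pair); /* dst is 4-byte aligned here */
        dst += 2;
        count -= 2;
    }

    /* Trailing pixel, if the remaining count was odd. */
    if (count > 0)
        *dst = value;
}
```

The leading single-pixel store is what guarantees that every 32-bit store in the loop is aligned, which is the property the sw instruction needs.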
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
lowlevel-blt-bench:
src_n_0565 = L1: 238.14 L2: 233.15 M: 57.88 ( 77.23%) HT:
53.22 VT: 49.99 R: 47.73 RT: 24.79 ( 91Kops/s)
src_n_ = L1
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
lowlevel-blt-bench:
over_n___ca = L1: 8.32 L2: 7.65 M: 6.38 ( 51.08%) HT:
5.78 VT: 5.74 R: 5.84 RT: 4.39 ( 37Kops/s)
over_n__0565_ca = L1
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Referent (before):
lowlevel-blt-bench:
over_n_8_ = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64
VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s)
over_n_8_0565 = L1: 7.40 L2
Benchmark results (lowlevel-blt-bench and cairo-perf-trace) on Malta board
(@1Ghz) remain the same as in original commit.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
From: Nemanja Lukic nemanja.lu...@rt-rk.com
---
pixman/pixman-mips-dspr2-asm.S | 60 ++-
1 files changed, 28 insertions(+), 32 deletions(-)
diff --git a/pixman/pixman-mips-dspr2-asm.S b/pixman/pixman-mips-dspr2-asm.S
index ca03605..87558f0 100644
From: Nemanja Lukic nemanja.lu...@rt-rk.com
In the main loop (unrolled by a factor of 2), instead of negating the mask values
multiplied by srca, the value of srca was negated and passed as the alpha argument
to the UN8x4_MUL_UN8x4_ADD_UN8x4 macro.
Instead of:
ma = ~ma;
UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
Code
Added optimizations for several bilinear fast paths:
- src__8_
- src__8_0565
- src_0565_8_x888
- src_0565_8_0565
- add__8_
Benchmark results (using tweaked version of the lowlevel-blt-bench which does
bilinear scaling using almost identity matrix) on Malta board (@1Ghz)
Added optimizations for several bilinear fast paths:
- src__
- src__0565
- src_0565_
- src_0565_0565
- over__
- add__
Benchmark results (using tweaked version of the lowlevel-blt-bench which does
bilinear scaling using almost identity matrix) on Malta board
Added optimizations for several bilinear fast paths:
- src__8_
- src__8_0565
- src_0565_8_x888
- src_0565_8_0565
- add__8_
- src__
- src__0565
- src_0565_
- src_0565_0565
- over__
- add__
Benchmark results (using lowlevel-blt-bench) on
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench -b
Referent (before):
src__8_ = L1: 6.37 L2: 6.08 M: 5.46 ( 32.57%) HT:
4.64 VT: 4.61 R: 4.52 RT: 2.85 ( 23Kops/s)
src__8_0565
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench -b
Referent (before):
src__ = L1: 8.18 L2: 7.79 M: 6.32 ( 33.51%) HT:
5.78 VT: 5.70 R: 5.61 RT: 3.79 ( 29Kops/s)
src__0565
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__n_ = L1: 9.92 L2: 11.27 M: 8.50 ( 45.23%) HT:
4.70 VT: 4.45 R: 4.49 RT: 1.85 ( 20Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__n_0565 = L1: 8.95 L2: 8.33 M: 6.95 ( 27.74%) HT:
4.27 VT: 4.07 R: 4.01 RT: 1.74 ( 19Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_0565_n_0565 = L1: 7.56 L2: 7.24 M: 6.16 ( 16.38%) HT:
4.01 VT: 3.84 R: 3.79 RT: 1.66 ( 18Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__ = L1: 19.61 L2: 17.10 M: 11.16 ( 59.20%) HT:
16.47 VT: 15.81 R: 14.82 RT: 8.90 ( 50Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
src_n_8_ = L1: 13.79 L2: 22.47 M: 17.55 ( 58.28%) HT: 6.95
VT: 6.46 R: 6.34 RT: 2.07 ( 20Kops/s)
src_n_8_8 = L1
Added optimizations for several:
- SRC fast paths:
- src_n_8_
- src_n_8_8
- OVER fast paths:
- over_n_0565
- over_n_
- OVER nearest neighbor scaling fast paths:
- over__8_0565
- over_0565_8_0565
Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz)
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_n_0565 = L1: 14.48 L2: 21.36 M: 17.57 ( 23.30%) HT: 6.95
VT: 6.44 R: 6.39 RT: 2.16 ( 22Kops/s)
over_n_ = L1
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench -n
Referent (before):
over__8_0565 = L1: 9.62 L2: 8.85 M: 7.40 ( 39.27%) HT:
5.67 VT: 5.61 R: 5.45 RT: 2.98 ( 22Kops/s)
over_0565_8_0565
Added optimizations for several out_reverse, over_reverse and in operations:
- out_reverse_8_0565
- out_reverse_8_
- over_reverse_n_
- in_n_8_8
Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz) are
included in the log messages.
Any comments to these patches are
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
out_reverse_8_0565 = L1: 9.15 L2: 13.56 M: 10.65 ( 21.19%) HT:
9.26 VT: 9.14 R: 8.85 RT: 4.88 ( 37Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_reverse_n_ = L1: 15.25 L2: 17.41 M: 13.53 ( 35.98%) HT:
6.43 VT: 5.98 R: 5.94 RT: 2.18 ( 22Kops/s
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
src_0888__rev = L1: 51.88 L2: 42.00 M: 19.04 ( 88.50%) HT:
15.27 VT: 14.62 R: 14.13 RT: 7.12 ( 45Kops/s
After the new PRNG (pseudorandom number generator) was introduced, a bug in two
DSPr2 routines was revealed. The bug manifested as wrong calculations in the composite
and glyph tests, which caused make check to fail for the MIPS DSPr2 optimizations.
From: Nemanja Lukic nemanja.lu...@rt-rk.com
After the new PRNG (pseudorandom number generator) was introduced, a bug in two
DSPr2 routines was revealed. The bug manifested as wrong calculations in the composite
and glyph tests, which caused make check to fail for the MIPS DSPr2 optimizations.
Bug
-benchmark?
Did I construct these two testcases correctly?
Thanks,
Nemanja Lukic
Added optimizations for several nearest neighbor scaling fast paths:
- over__
- over__0565
- src_0565_
Benchmark results (lowlevel-blt-bench) on Malta board (@1Ghz) are
included in the log messages.
Any comments to this patch are welcome.
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__ = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54
VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s)
over__0565
that this is not the only problem in the MIPS DSPr2
code. Using the test/fuzzer-find-diff.pl script, I can reproduce one
more failure:
I'll look into this and upload a separate patch with a fix.
Thanks,
Nemanja Lukic
-Original Message-
From: Siarhei Siamashka [mailto:siarhei.siamas
Increasing the number of iterations in blitters-test revealed a bug in the DSPr2
optimization. The bug is in the in_n_8 routine: the rounding logic was not implemented
correctly. Also, the code used unnecessary multiplications, which could be avoided
by packing 4 destination (a8) pixels into one 32-bit register. There
The rounding logic was not implemented correctly.
Instead of the rounding version of the 8-bit shift, logical shifts were used.
Also, the code used unnecessary multiplications, which could be avoided by packing
4 destination (a8) pixels into one 32-bit register. There were also unnecessary
spills to the stack.
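To illustrate why the choice of shift matters, here is a minimal C sketch that contrasts a truncating 8-bit shift with the usual rounding idiom for multiplying two 8-bit values and dividing by 255. The function names are hypothetical, and the truncating variant is only an assumption about what a plain logical shift would compute; the actual assembly is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the faulty path: a plain logical shift truncates. */
static uint8_t
mul_un8_truncating (uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint32_t) a * b) >> 8);
}

/* Rounding idiom: add 0x80, then fold the high byte back in before
 * the final shift, which approximates division by 255 with rounding. */
static uint8_t
mul_un8_rounding (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}
```

With a = b = 255 the truncating version yields 254 while the rounding version yields 255, so a fully opaque pixel would lose one level per operation without the rounding step.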
Added optimizations for two fast paths:
- pixbuf
- rpixbuf
Benchmark results (using tweaked version of the lowlevel-blt-bench which uses
same bits for mask and src images) on Malta board (@1Ghz) are included in the
log message.
Any comments to this patch are welcome.
From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT:
10.07 R: 9.84 RT: 5.54 ( 35Kops/s)
rpixbuf = L1: 14.63 L2
I support increasing the number of iterations for blitters-test.
This is what I usually leave running overnight (make check), and it already
takes a lot of time on MIPS with the default number of iterations.
Nemanja Lukic
-Original Message-
From: Siarhei Siamashka [mailto:siarhei.siamas
I'll push that as a separate commit.
-Original Message-
From: pixman-bounces+nemanja.lukic=rt-rk@lists.freedesktop.org
[mailto:pixman-bounces+nemanja.lukic=rt-rk@lists.freedesktop.org] On
Behalf Of Søren Sandmann
Sent: Thursday, March 07, 2013 12:57 AM
To: Nemanja Lukic
Cc: pixman
After the new PRNG (pseudorandom number generator) was introduced, a bug in two
DSPr2 routines was revealed. The bug manifested as wrong calculations in the composite
and glyph tests, which caused make check to fail for the MIPS DSPr2 optimizations.
Bug was in the calculation of the:
*dst = over (src, *dst) when ma
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__ = L1: 19.47 L2: 16.30 M: 11.24 ( 59.69%) HT: 9.54
VT: 9.29 R: 9.47 RT: 6.24 ( 37Kops/s)
Optimized:
over__ = L1: 43.67 L2: 33.30 M: 16.32
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
src_0565_ = L1: 20.70 L2: 19.22 M: 12.50 ( 49.79%) HT: 10.45
VT: 10.18 R: 9.99 RT: 5.31 ( 31Kops/s)
Optimized:
src_0565_ = L1: 62.98 L2: 53.44 M: 23.07
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
pixbuf = L1: 18.18 L2: 16.47 M: 13.36 (107.27%) HT: 10.16 VT:
10.07 R: 9.84 RT: 5.54 ( 35Kops/s)
Optimized:
pixbuf = L1: 43.54 L2: 36.02 M: 17.08 (137.09%) HT:
---
test/lowlevel-blt-bench.c |2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 4e16f7b..a1657ea 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
@@ -643,6 +643,8 @@ tests_tbl[] =
{
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
rpixbuf = L1: 14.63 L2: 13.55 M: 9.91 ( 79.53%) HT: 8.47 VT:
8.32 R: 8.17 RT: 4.90 ( 33Kops/s)
Optimized:
rpixbuf = L1: 45.69 L2: 37.30 M: 17.24 (138.31%) HT:
. The bench_composite function can check for the pixbuf string in the testname
and, if it is detected, use the same bits for the src and mask images. Then the
pixbuf test cases will not be only a compile-time option. Do you think that
approach is better?
Thanks,
Nemanja Lukic
-Original Message-
From: Søren
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over__0565 = L1: 13.22 L2: 12.02 M: 9.77 ( 38.92%) HT: 8.58
VT: 8.35 R: 8.38 RT: 5.78 ( 35Kops/s)
Optimized:
over__0565 = L1: 26.20 L2: 22.97 M: 15.92
Add the necessary support to the lowlevel-blt benchmark for benchmarking the pixbuf
and rpixbuf fast paths. The bench_composite function now checks for the pixbuf string
in testname and, if it is detected, uses the same bits for the src and mask images.
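The name check described above can be sketched roughly as follows. The helper pick_mask_bits is hypothetical and is not the code from the patch; it only shows how a substring test on the test name could select the source bits for the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: choose the buffer that backs the mask image. */
static uint32_t *
pick_mask_bits (const char *testname, uint32_t *src_bits, uint32_t *mask_bits)
{
    assert (testname != NULL);

    /* A substring match on "pixbuf" also covers "rpixbuf". */
    if (strstr (testname, "pixbuf") != NULL)
        return src_bits;

    return mask_bits;
}
```

A runtime check like this keeps the pixbuf cases in the regular test table instead of behind a compile-time switch.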
---
test/lowlevel-blt-bench.c | 11 +--
1 files changed, 9
If there are no other comments, I'll push this patch set in a day or two.
Nemanja Lukic
-Original Message-
From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com]
Sent: Thursday, April 18, 2013 12:22 AM
To: Nemanja Lukic
Cc: pixman@lists.freedesktop.org
Subject: Re: [Pixman
---
test/lowlevel-blt-bench.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 1049e21..c84be65 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
@@ -716,6 +716,7 @@ tests_tbl[] =
{
Some of the optimizations introduced in previous DSPr2 commits, similar to
previous patches, were not DSPr2 specific and utilized DSPr1 instructions only.
Since Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS
cores, these optimizations couldn't be used on cores that don't
OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com)
+ */
+
+#ifndef PIXMAN_MIPS_COMMON_ASM_H
+#define PIXMAN_MIPS_COMMON_ASM_H
+
+#endif /* PIXMAN_MIPS_COMMON_ASM_H */
diff --git a/pixman/pixman-mips-common.h b/pixman/pixman-mips-common.h
new file mode 100644
index 000..fc46ed8
; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic
Performance numbers before/after on MIPS-24kc @ 500 MHz
Referent (before):
src_n_0565= L1: 117.24 L2: 110.68 M:115.83 ( 96.31%) HT: 78.96 VT:
75.03 R: 65.98 RT: 24.94 ( 164Kops/s)
Optimized (with these optimizations):
src_n_0565= L1: 429.43 L2: 299.39 M:346.21
This patch adds a mechanism which allows the optimizations to be run only
on 32-bit platforms.
---
pixman/pixman-mips.c |8
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c
index a9f228a..eadf912 100644
--- a/pixman/pixman-mips.c
+++
---
pixman/pixman-mips-common.h| 31 +--
pixman/pixman-mips-dspr1-asm.S | 59 +-
pixman/pixman-mips-dspr1.c | 15 --
pixman/pixman-mips-dspr2.c |6 +--
pixman/pixman-mips.c | 31 +++-
pixman/pixman-mips32r2-asm.S | 110
---
pixman/pixman-mips.c | 83 ++
1 files changed, 63 insertions(+), 20 deletions(-)
diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c
index 3048813..93fda99 100644
--- a/pixman/pixman-mips.c
+++ b/pixman/pixman-mips.c
@@ -24,14 +24,27 @@
IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com)
+ */
+
+#ifndef PIXMAN_MIPS_COMMON_ASM_H
+#define PIXMAN_MIPS_COMMON_ASM_H
+
+#endif /* PIXMAN_MIPS_COMMON_ASM_H */
diff --git a/pixman/pixman-mips-common.h b/pixman/pixman-mips-common.h
new
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com)
+ */
+
+#include pixman-private.h
+#include pixman-mips-dspr1-asm.h
diff --git a/pixman/pixman-mips-dspr1-asm.h b/pixman/pixman-mips-dspr1-asm.h
new file mode
---
configure.ac |8 ++--
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/configure.ac b/configure.ac
index 8a3b622..8764f7b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -719,25 +719,21 @@ dnl Check if assembler is gas compatible and supports
MIPS DSPr2 instructions
-asm.S
index 866e93e..9dad163 100644
--- a/pixman/pixman-mips-dspr2-asm.S
+++ b/pixman/pixman-mips-dspr2-asm.S
@@ -26,7 +26,7 @@
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
- * Author: Nemanja Lukic (nlu...@mips.com)
+ * Author: Nemanja Lukic
---
pixman/pixman-mips-common.h| 31 +--
pixman/pixman-mips-dspr1-asm.S | 59 +-
pixman/pixman-mips-dspr1.c | 15 --
pixman/pixman-mips-dspr2.c |6 +--
pixman/pixman-mips.c | 34 -
pixman/pixman-mips32r2-asm.S | 110
This patch adds a mechanism which allows the optimizations to be run only
on 32-bit platforms.
---
pixman/pixman-mips.c |4
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c
index 221da24..8825621 100644
--- a/pixman/pixman-mips.c
+++
Some of the optimizations introduced in previous DSPr2 commits, similar to
previous patch, were not DSPr2 specific and utilized DSPr1 instructions only.
Since Pixman's run-time CPU detection only added DSPr2 fast-paths on 74K MIPS
cores, these optimizations couldn't be used on cores that don't
and sorry for the late reply,
Nemanja Lukic
-Original Message-
From: pixman-boun...@lists.freedesktop.org
[mailto:pixman-boun...@lists.freedesktop.org] On Behalf Of YunQiang Su
Sent: Saturday, December 7, 2013 5:57 PM
To: pixman@lists.freedesktop.org
Subject: [Pixman] mips* asm exports symbols
Hi YunQiang Su,
Attached is a solution for the unwanted symbol visibility.
I'll upstream both patches soon.
Thanks,
Nemanja Lukic
-Original Message-
From: Nemanja Lukic [mailto:nemanja.lu...@rt-rk.com]
Sent: Thursday, December 19, 2013 12:30 PM
To: 'Yunqiang Su'
Cc: 'pixman
---
configure.ac |8 ++--
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/configure.ac b/configure.ac
index 6327972..5229032 100644
--- a/configure.ac
+++ b/configure.ac
@@ -720,25 +720,21 @@ dnl Check if assembler is gas compatible and supports
MIPS DSPr2 instructions
Some of the optimizations introduced in previous DSPr2 commits were not DSPr2
specific. Some of the fast paths didn't use DSPr2 instructions at all, and
instead relied on the more generic MIPS32r2 instruction set or the previous version
of the DSP instruction set (DSPr1) for optimizations.
Since Pixman's
Some of the optimizations introduced in previous dspr2 commits, similar to
previous patch, were not dspr2 specific and utilized dspr1 instructions only.
Since Pixman's run-time CPU detection only added dspr2 fast-paths on 74K MIPS
cores, these optimizations couldn't be used on cores that don't
Unfortunately no.
I was planning to push them after that big patch set, which I updated a few weeks ago.
Best Regards,
Nemanja Lukic
-Original Message-
From: Søren Sandmann [mailto:soren.sandm...@gmail.com]
Sent: Monday, April 7, 2014 7:45 PM
To: Nemanja Lukic
Cc: 'Yunqiang Su'; pixman
The build restriction wasn't good, since it demanded '-mips32r2'
in CFLAGS at configure time to enable the DSPr2 optimizations.
Additional CFLAGS are no longer needed, and pixman can be built
targeting the lowest common denominator.
The architecture and ISA are set in inline assembler
to allow the compiler to build
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com)
+ */
+
+#ifndef PIXMAN_MIPS_COMMON_ASM_H
+#define PIXMAN_MIPS_COMMON_ASM_H
+
+#endif
The isa field (mips32r2) is available from kernel version 3.9.
The ASEs implemented field (dsp, dsp2) is available from 3.7.
In older kernel versions, dsp represents both DSPr1 and DSPr2.
If the kernel version is 3.7 or above, runtime detection tries
to find 'dsp2' in /proc/cpuinfo. If it fails or if kernel
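A minimal sketch of the token search, assuming the contents of /proc/cpuinfo have already been read into a string and that the field looks like "ASEs implemented : mips16 dsp dsp2". The helper name cpuinfo_has_ase and the exact line layout are assumptions for illustration, not the detection code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `ase` appears as a whole word on
 * the "ASEs implemented" line of the given cpuinfo text, else 0. */
static int
cpuinfo_has_ase (const char *cpuinfo, const char *ase)
{
    const char *line, *end, *p;
    size_t len;

    assert (cpuinfo != NULL && ase != NULL);

    len = strlen (ase);
    line = strstr (cpuinfo, "ASEs implemented");
    if (line == NULL || len == 0)
        return 0;

    end = strchr (line, '\n');
    if (end == NULL)
        end = line + strlen (line);

    /* Values start after the colon that follows the field name. */
    p = memchr (line, ':', (size_t) (end - line));
    if (p == NULL)
        return 0;
    p++;

    while (p < end)
    {
        const char *tok;

        while (p < end && (*p == ' ' || *p == '\t'))
            p++;
        tok = p;
        while (p < end && *p != ' ' && *p != '\t')
            p++;

        /* Whole-word compare, so "dsp" does not match "dsp2". */
        if ((size_t) (p - tok) == len && memcmp (tok, ase, len) == 0)
            return 1;
    }

    return 0;
}
```

Matching whole tokens matters here because a plain substring search for "dsp" would also succeed on a line that lists only "dsp2".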
There are important differences in the ABI, since saved
registers or passed values can take twice as much stack space.
This patch adds a mechanism which allows the optimizations to be run only
on 32-bit platforms, since all optimizations are written in assembly.
---
pixman/pixman-mips.c |4
1 files
A pointer to a function (memcpy) is added to pixman_implementation_t,
and it points to the C version of memcpy (linked in
pixman-general.c). The function to call is pixman_memcpy, and
every call to memcpy is replaced with pixman_memcpy.
If there is an optimized version of memcpy, it should
be linked with imp->memcpy.
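The dispatch described above can be sketched with a function pointer stored in the implementation structure. Everything below is a hypothetical stand-in: the struct models only the new member (named memcpy_fn here rather than memcpy, to avoid clashing with libc macros), and the helper names are not pixman's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_func_t) (void *dst, const void *src, size_t n);

/* Hypothetical stand-in for pixman_implementation_t. */
typedef struct
{
    memcpy_func_t memcpy_fn;
} implementation_sketch_t;

static int optimized_calls = 0;

/* Stand-in for an optimized copy; it only counts calls and defers to libc. */
static void *
optimized_memcpy_sketch (void *dst, const void *src, size_t n)
{
    optimized_calls++;
    return memcpy (dst, src, n);
}

/* The general implementation points at the plain C memcpy. */
static void
init_general_sketch (implementation_sketch_t *imp)
{
    imp->memcpy_fn = memcpy;
}

/* Single entry point, playing the role the text gives to pixman_memcpy. */
static void *
memcpy_via_imp (const implementation_sketch_t *imp,
                void *dst, const void *src, size_t n)
{
    assert (imp != NULL && imp->memcpy_fn != NULL);
    return imp->memcpy_fn (dst, src, n);
}
```

An architecture-specific implementation would overwrite memcpy_fn with its own routine, for example imp.memcpy_fn = optimized_memcpy_sketch, and callers of the entry point would not need to change.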
IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Author: Nemanja Lukic (nemanja.lu...@rt-rk.com)
+ */
+
+#include pixman-private.h
+#include pixman-mips
Hi Vincent,
Thanks. I'll push it in the following days.
Kind Regards,
Nemanja Lukic
-Original Message-
From: Vicente Olivert Riera [mailto:vincent.ri...@imgtec.com]
Sent: Friday, September 12, 2014 4:19 PM
To: Nemanja Lukic
Cc: pixman@lists.freedesktop.org
Subject: Re: pixman-0.32.6