Re: [Pixman] More MIPS OVER fast paths (over_8888_n_8888, over_8888_n_0565, over_0565_n_0565, over_8888_8_8888, over_8888_8_0565, over_0565_8_0565, over_8888_8888 and over_8888_8888_8888) including OV

2012-10-01 Thread Lukic, Nemanja
Hi Soren, Siarhei

Here are the results measured for this OVER combiner on a couple of OVER fast paths:
Before adding OVER combiner:
over__           =  L1:  95.65  L2:  70.26  M: 13.95 ( 74.24%)  HT: 16.56  VT: 15.96  R: 14.90  RT:  9.05 (  53Kops/s)
over__8_         =  L1:  13.62  L2:  11.22  M:  7.57 ( 80.53%)  HT:  6.24  VT:  6.19  R:  6.13  RT:  3.93 (  30Kops/s)
over__8_0565     =  L1:   7.37  L2:   8.30  M:  6.24 ( 58.08%)  HT:  5.46  VT:  5.38  R:  5.26  RT:  3.35 (  27Kops/s)
over_0565_8_     =  L1:  10.56  L2:   9.32  M:  7.13 ( 66.42%)  HT:  5.83  VT:  5.79  R:  5.74  RT:  3.60 (  28Kops/s)
over_0565_8_0565 =  L1:   7.82  L2:   7.20  M:  6.09 ( 48.62%)  HT:  5.11  VT:  5.07  R:  4.93  RT:  3.13 (  26Kops/s)

After:
over__           =  L1: 163.64  L2:  83.68  M: 17.67 ( 94.15%)  HT: 17.09  VT: 16.60  R: 15.31  RT:  9.60 (  55Kops/s)
over__8_         =  L1:  25.98  L2:  22.50  M: 11.54 (122.95%)  HT:  9.94  VT:  9.63  R:  9.20  RT:  5.80 (  38Kops/s)
over__8_0565     =  L1:  14.00  L2:  12.45  M:  8.77 ( 81.79%)  HT:  6.99  VT:  6.89  R:  6.72  RT:  3.95 (  30Kops/s)
over_0565_8_     =  L1:  16.75  L2:  14.82  M: 10.06 ( 93.83%)  HT:  7.98  VT:  7.79  R:  7.48  RT:  4.22 (  31Kops/s)
over_0565_8_0565 =  L1:  10.76  L2:   9.69  M:  7.86 ( 62.79%)  HT:  6.18  VT:  6.11  R:  5.97  RT:  3.48 (  28Kops/s)
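
For readers who want to turn the two tables into per-column speedups, here is a minimal Python sketch. It is not part of pixman or lowlevel-blt-bench; the helper names `parse_bench_line` and `speedup` are made up for illustration, and the test name is kept exactly as it appears in this archive. It assumes each result sits on a single line.

```python
import re

# Matches "LABEL: value" pairs such as "L1:  95.65" or "RT:  9.05".
# The bracketed figures (percentage, Kops/s) are deliberately ignored.
FIELD = re.compile(r"\b(L1|L2|M|HT|VT|R|RT):\s*([0-9.]+)")


def parse_bench_line(line):
    """Return (test_name, {column: value}) for one result line."""
    name, _, rest = line.partition("=")
    return name.strip(), {key: float(val) for key, val in FIELD.findall(rest)}


def speedup(before, after):
    """Ratio after/before for every column present in both results."""
    return {key: after[key] / before[key] for key in before if key in after}


# The two over__8_ lines from the tables above, joined onto one line each.
BEFORE = ("over__8_ =  L1:  13.62  L2:  11.22  M:  7.57 ( 80.53%)  HT:  6.24"
          "  VT:  6.19  R:  6.13  RT:  3.93 (  30Kops/s)")
AFTER = ("over__8_ =  L1:  25.98  L2:  22.50  M: 11.54 (122.95%)  HT:  9.94"
         "  VT:  9.63  R:  9.20  RT:  5.80 (  38Kops/s)")

name, before_cols = parse_bench_line(BEFORE)
_, after_cols = parse_bench_line(AFTER)
ratios = speedup(before_cols, after_cols)

for column, ratio in ratios.items():
    print(f"{name} {column}: {ratio:.2f}x")
```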

Thanks,
Nemanja Lukic

-----Original Message-----
From: Søren Sandmann [mailto:sandm...@cs.au.dk] 
Sent: Tuesday, September 25, 2012 6:23 AM
To: Lukic, Nemanja
Cc: pixman@lists.freedesktop.org
Subject: Re: [Pixman] More MIPS OVER fast paths (over_8888_n_8888,
over_8888_n_0565, over_0565_n_0565, over_8888_8_8888, over_8888_8_0565,
over_0565_8_0565, over_8888_8888 and over_8888_8888_8888) including OVER
combiner.

Nemanja Lukic nlu...@mips.com writes:

> Added optimizations for several OVER fast paths:
>  - over_8888_n_8888
>  - over_8888_n_0565
>  - over_0565_n_0565
>  - over_8888_8_8888
>  - over_8888_8_0565
>  - over_0565_8_0565
>  - over_8888_8888
>  - over_8888_8888_8888
> Including OVER combiner.
> Per previous code review:
>  - Previously pushed single big commit is now divided into 4 smaller pieces.

Thanks for the patches. I have pushed them to master with a few
formatting fixes.

However, you should get a freedesktop account so that you can push
patches yourself, or at least, if you want me to merge them, provide a
public git repository that can be pulled from.

>  - Added OVER combiner.

Did you do any measurements of this one? As Siarhei said:

As for the performance numbers. I wonder how much faster would these
new specialized MIPS fast paths be if we had a DSPr2 optimized OVER
combiner? You can check sse2_combine_over_u and
neon_combine_over_u functions as examples of existing combiners.


Søren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Questionable numbers from lowlevel-blt-bench

2012-10-01 Thread Matt Turner
On Mon, Oct 1, 2012 at 1:17 AM, Jonathan Morton
jonathan.mor...@movial.com wrote:
> On Sun, 30 Sep 2012 15:05:18 -0700, Matt Turner matts...@gmail.com
> wrote:
>> In doing performance work, I've noticed some weird results from
>> lowlevel-blt-bench. Often it has seemed that the RT results determined
>> the Kops/s almost entirely. I came across an instance of this today
>> which was particularly striking:
>>
>> Before:
>> add__ =  L1:  47.01  L2:  36.84  M: 18.96 ( 33.14%)  HT: 35.94  VT: 33.82  R: 30.64  RT: 19.36 ( 181Kops/s)
>>
>> After:
>> add__ =  L1: 230.78  L2: 200.86  M: 90.48 (159.44%)  HT: 48.41  VT: 45.46  R: 42.78  RT: 19.22 ( 181Kops/s)
>>
>> L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are
>> improved by ~1.35x. RT doesn't change... neither does Kops/s.
>>
>> What's going on here, and can we make the composite result more sensible?
>
> The figures in brackets are derived directly from one or more of the
> other figures.  In this case, the Kops/s number is derived directly
> from the RT number, which should explain why they correlate.

Ahh. At least I (and I'm pretty sure others too) thought that the Kops
number was supposed to be a composite of HT, VT, RT, and R. This
explains it then.

> The percentage figure, meanwhile, represents a percentage of memory
> bandwidth used by this blitter (under the M test), the peak bandwidth
> being derived from an earlier measurement.  (You're seeing more than
> 100%, which suggests that the earlier measurement is not optimal.)
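
As a rough cross-check of that description, the reference figure can be backed out of the two add__ results quoted above. The Python sketch below assumes the percentage is simply the M figure divided by a fixed reference value, with any constant per-test scale factor folded into that reference; the actual formula in lowlevel-blt-bench may differ, and `implied_reference` is a made-up helper name.

```python
def implied_reference(m_value, percent):
    """Reference value that would make m_value equal `percent` of it."""
    return m_value / (percent / 100.0)


# M figure and bracketed percentage from the add__ lines quoted above.
ref_before = implied_reference(18.96, 33.14)
ref_after = implied_reference(90.48, 159.44)

print(f"implied reference before: {ref_before:.2f}")
print(f"implied reference after:  {ref_after:.2f}")
```

Under that assumption the two runs imply nearly the same reference value, which fits the statement that the peak comes from a separate earlier measurement rather than from the blitter under test.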

Indeed. I'm prefetching in the modified function.

> The RT figure is meant to measure, as directly as possible, the per-call
> overhead which does not depend on the number of pixels involved.
> Accordingly, it is not expected to change significantly when doing
> pixel-related optimisations.
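
A toy cost model illustrates why a per-pixel speedup barely moves a figure dominated by per-call overhead. The Python sketch below is not how lowlevel-blt-bench measures anything: the linear model (fixed overhead plus a cost per pixel) and every number in it are hypothetical, chosen only to show the effect.

```python
def ops_per_second(overhead_s, per_pixel_s, pixels):
    """Calls per second if each call costs overhead + pixels * per-pixel time."""
    return 1.0 / (overhead_s + pixels * per_pixel_s)


# Hypothetical figures: 5 us of fixed per-call overhead, and a per-pixel
# cost that a pixel-level optimisation makes 5x cheaper.
OVERHEAD = 5e-6
SLOW_PIXEL = 20e-9
FAST_PIXEL = 4e-9

for pixels in (4, 1_000_000):
    slow = ops_per_second(OVERHEAD, SLOW_PIXEL, pixels)
    fast = ops_per_second(OVERHEAD, FAST_PIXEL, pixels)
    print(f"{pixels:>9} pixels per call: speedup {fast / slow:.3f}x")

small_gain = (ops_per_second(OVERHEAD, FAST_PIXEL, 4)
              / ops_per_second(OVERHEAD, SLOW_PIXEL, 4))
large_gain = (ops_per_second(OVERHEAD, FAST_PIXEL, 1_000_000)
              / ops_per_second(OVERHEAD, SLOW_PIXEL, 1_000_000))
```

With tiny operations the fixed overhead swamps the pixel work, so the call rate hardly changes; with very large operations the gain approaches the full per-pixel factor.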

Right, makes sense.

Thanks!
Matt