I got a similar performance result about one month ago.
The gvim test shows a large performance increase, about 360%. The other tests show
no regression and no improvement either.
I tested on a 32-bit userland. I will test again based on the git code and
post the data later.
Regards,
Xinyun
On Monday 29 November 2010 20:59:52 Siarhei Siamashka wrote:
> On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote:
> > For MOVD, we simplified the backward copy code since the previous code was too
> > long and did not gain obvious performance,
>
> And this is what I'm worried about. First you propose
On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote:
> Hi, Soeren Sandmann and Siarhei Siamashka:
> Glad to send out this refreshed patch to address points we discussed
> for this SSSE3 patch.
Thanks. And sorry for a rather late reply.
> In this new patch, we merged 32 bit and 64 bit a
"Xu, Samuel" writes:
> Appreciate comments on this patch
I'll be away for the next three weeks, so I won't be able to review
anything until then.
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinf
, September 09, 2010 12:56 PM
To: 'sandm...@daimi.au.dk'; Siarhei Siamashka; Ma, Ling; Liu, Xinyun
Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Hi, Soeren Sandmann and Siarhei Siamashka:
It is really a long discussion and thanks for your patience! I believe all of
us has s
..@daimi.au.dk'; Xu, Samuel
> Cc: Siarhei Siamashka; pixman@lists.freedesktop.org; Liu, Xinyun
> Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
>
> Hi Soren
> > I am also interested in the answer to the question of whether this
> > code was generat
Hi Soren
> I am also interested in the answer to the question of whether this
> code was generated by passing intrinsics through the Intel compiler.
I'm the original author of this code; it is based on our original memcpy
function, which has been pushed into glibc.
Thanks
Ling
"Xu, Samuel" writes:
> Hi, Soeren Sandmann and Siarhei Siamashka:
>
> To wrap up the current discussion and combine the comments from both of you, can
> we assume this new SSSE3 patch is OK?
> The new patch might contain:
> 1. Fix 64 bit CPU detection issue for MMX and SSE2
> 2. Add more comments for git com
this case, how to determine Sun studio?
Thanks!
Samuel
-Original Message-
From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com]
Sent: Wednesday, September 08, 2010 3:39 AM
To: Ma, Ling
Cc: Xu, Samuel; Soeren Sandmann; pixman@lists.freedesktop.org; Liu, Xinyun
Subje
On Friday 03 September 2010 01:39:54 Soeren Sandmann wrote:
> Siarhei Siamashka writes:
> > Apparently software prefetch also disables or interferes with the hardware
> > prefetcher on Intel Atom, hurting performance a lot. More advanced
> > processors can cope with it.
> >
> > But increased pref
On Tuesday 07 September 2010 14:03:52 Ma, Ling wrote:
> > > > Wouldn't just the use of MOVD/MOVSS instructions here also solve
> > > > this problem? Store forwarding does not seem to be used for SIMD
> > > > according to the manual. I haven't benchmarked anything yet
> > > > though
> Your code still can be simplified a lot. I'm just not quite sure whether it would
> be more practical to commit something first and then refactor it with the
> follow-up commits. Or attempt to make a "perfect" patch before committing.
[Ma Ling] Yes, I agree with you, let us commit it first,
Hi Siarhei Siamashka
> Could you elaborate? Preferably with a reference to the relevant section of the
> optimization manual. Because it looks like exactly the store forwarding address
> aliasing issue to me.
[Ma Ling]:Currently it is not described in our optimization manual, soon it
will b
On Friday 03 September 2010 11:53:47 Xu, Samuel wrote:
> >* Siarhei asked whether it would be possible to unify the 32 and 64
> >
> > bit assembly sources. I don't think you commented on that.
>
> I think it is very difficult to unify the 32 and 64 bit assembly sources.
It's not so difficult if you chan
On Friday 03 September 2010 11:53:47 Xu, Samuel wrote:
> >* Store forwarding
> >
> > - We need some comments in the assembly about the store forwarding
> >
> >that Ma Ling described.
>
> How about adding this comment to the asm code:
> "CPU doesn't check each address bit for src and dest, so re
Thanks Soren!
My comments follow.
I appreciate your explicit code guidance on the Sun Studio check; I have no
experience with Sun Studio.
Once the Sun Studio questions are settled, we can send the next patch to address
the remaining issues.
Samuel
> * The 64 bit CPU detection is broken. It doesn't use SSE2 and
Siarhei Siamashka writes:
> Looks like this data has been posted already:
> http://lists.freedesktop.org/archives/pixman/2010-June/000231.html
>
> Checking a few more things with microbenchmarks shows that the prefetch
> distance of just 64 bytes ahead is way too small.
We should probably get
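The prefetch-distance remark quoted above can be illustrated with a small sketch. This is not code from the patch under review: `copy_with_prefetch` and `PREFETCH_DISTANCE` are hypothetical names, the 256-byte default is an arbitrary placeholder rather than a measured value, and `__builtin_prefetch` is a GCC/Clang builtin. The idea is only to make the distance a tunable so that it can be benchmarked per CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tunable: how many bytes ahead of the current read to prefetch.
 * 256 is a placeholder for experiments, not a recommendation. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 256
#endif

#define PREFETCH_PIXELS (PREFETCH_DISTANCE / sizeof (uint32_t))

/* Hypothetical helper: plain 32-bit pixel copy with a software prefetch hint. */
static void
copy_with_prefetch (uint32_t *dst, const uint32_t *src, size_t n_pixels)
{
    size_t i;

    for (i = 0; i < n_pixels; i++)
    {
#if defined(__GNUC__)
        /* Issue one hint per 64 bytes (16 pixels), and only while the
         * prefetched address is still inside the source buffer. */
        if ((i & 15) == 0 && i + PREFETCH_PIXELS < n_pixels)
            __builtin_prefetch (src + i + PREFETCH_PIXELS, 0, 0);
#endif
        dst[i] = src[i];
    }
}
```

Building with `-DPREFETCH_DISTANCE=64` and then with larger values would allow the kind of comparison described in the quoted message; on CPUs where software prefetch hurts, the hint can simply be compiled out.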
"Xu, Samuel" writes:
> Hi, Siarhei Siamashka:
> The attached patch has an updated copyright part (only a copyright change). We
> referred to http://cgit.freedesktop.org/pixman/tree/COPYING.
> Yes, as you assumed, we tested on multiple 32/64 bit boxes w/o
> and w/ SSSE3.
Here are some more comm
On Monday 30 August 2010 23:31:35 Siarhei Siamashka wrote:
> And Intel Atom does not like software prefetch very much. This reminds me of an
> older report:
> http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
>
> I can try to run a full set of cairo-perf-trace benchmarks to get more
On Monday 30 August 2010 16:06:45 Siarhei Siamashka wrote:
> Also I tried to run my simple microbenchmarking program on Intel Atom N450
> netbook, x86_64 system. The results are the following:
>
> --
> All results are presented in millions of pixels per second
> L1 - small Xx1 rectangle (fitting
On Monday 30 August 2010 12:26:27 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Sorry for the typo; a fixed version is attached. Please ignore the previous mail.
>
> New patch, which contains
> 1)Simplified 64 bit detect_cpu_features(), only check SSSE3 bit.
> _MSC_VER
> path switched to __
On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Q:---" What problems do you have without "merge" mechanism?"
> A: Of course there is no correctness issue without "merge".
> Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths... exclude each
> other, although some c
.siamas...@gmail.com]
Sent: Friday, August 27, 2010 10:57 PM
To: Xu, Samuel
Cc: pixman@lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Than
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
> Thanks for quick response!
> For the 64 bit detect_cpu_features(), if we ignore HAVE_GETISAX and _MSC_VER,
> it is OK for us to simplify it as in your example in the next update.
If you can ensure MSVC compatibility
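For readers following the detect_cpu_features() discussion, a minimal sketch of a runtime SSSE3 check is given here. It is not the code from the patch: `have_ssse3` is a hypothetical name, and the sketch assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`. MSVC would need a different path (for example its `__cpuid` intrinsic), which is the compatibility concern raised in this message.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header; not available under MSVC */
#endif

/* Hypothetical helper, not the function from the patch under review. */
static bool
have_ssse3 (void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return false;

    /* CPUID leaf 1: ECX bit 9 reports SSSE3 support. */
    return (ecx & (1u << 9)) != 0;
#else
    /* Non-x86 targets never have SSSE3. */
    return false;
#endif
}
```

A caller would test `have_ssse3 ()` once at start-up and select the SSSE3 fast path only when it returns true, falling back to the SSE2 or C paths otherwise.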
st 27, 2010 2:15 PM
To: Xu, Samuel
Cc: pixman@lists.freedesktop.org
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Hi, Xu.
> --- /dev/null
> +++ b/pixman/pixman-access-ssse3_x86-64.S
..
> +#if (defined(__amd64__) || defined(__x86_64__) ||defined(_M_AMD64))
u, Samuel
Cc: pixman@lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Friday 27 August 2010 05:59:00 Xu, Samuel wrote:
> Hi Siarhei Siamashka,
>
> Here is a new patch, can you review it? Thank you!
> It address follow
On Friday 27 August 2010 05:59:00 Xu, Samuel wrote:
> Hi Siarhei Siamashka,
>
> Here is a new patch, can you review it? Thank you!
> It addresses the following suggestions:
> 1: The SSSE3 code is split into a new file.
Thanks.
> Compared with duplicating all the
> content from the SSE2 file, I added a way to me
ge-
From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com]
Sent: Sunday, August 22, 2010 1:49 AM
To: Liu, Xinyun
Cc: pixman@lists.freedesktop.org; Ma, Ling; Xu, Samuel
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote:
Hi Siar
On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote:
> Hi Siarhei Siamashka,
>
> Here is a new patch, can you review it? Thank you!
Sure, thanks for the updated patch. Some comments follow.
> From 9783651899a2763d7fcca2960fc354bd1f541980 Mon Sep 17 00:00:00 2001
> From: root
A minor nitpick he
On Friday 20 August 2010 19:36:07 Xu, Samuel wrote:
> We measured performance against the original SSE2 intrinsics-enabled
> version (0.19.4) on Atom, and got the following findings using a 480p Flash
> H.264 video playback workload:
> 1) sse2_composite_src_x888_()'s cycle count was reduced by 67%. This func
: 64 bit system, without SSSE3
Thanks!
Samuel
-Original Message-
From: Liu, Xinyun [mailto:xinyun...@gmail.com]
Sent: Friday, August 20, 2010 11:40 PM
To: Siarhei Siamashka; pixman@lists.freedesktop.org
Cc: Ma, Ling; Xu, Samuel
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_
Hi Siarhei Siamashka,
Here is a new patch, can you review it? Thank you!
With this patch, oprofile shows that performance increases
dramatically on Atom.
Samuel and Ling will provide detailed data.
Regards,
Liu, Xinyun
0001-Add-ssse3_composite_src_x888_.patch
Description: Binary data
On Tuesday 17 August 2010 10:52:43 Xu, Samuel wrote:
> We'd like to provide a new patch with the following enhancements soon:
> 1) Add 64 bit asm code specifically for 64 bit, which will co-exist with 32
> bit version
> 2) CPUID dynamic check in pixman-cpu.c and pixman-access.c
> 3) Makefile fixing
> 4
;make check"?
Samuel
-Original Message-
From: Ma, Ling
Sent: Tuesday, August 17, 2010 3:22 PM
To: Siarhei Siamashka; Xu, Samuel
Cc: pixman@lists.freedesktop.org; Liu, Xinyun
Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Hi Siarhei Siamashka
> I did n
On Monday 16 August 2010 11:24:44 Xu, Samuel wrote:
> Thanks for the kind comments! It is very nice that bug#20709 also emphasizes
> a similar performance issue.
Well, having bugs unresolved for such a long time is not so nice. Also see some
more comments below.
> So, let's discuss how to make
in current pixman-arm-neon-asm.S inside
pixman code.
Thanks!
Samuel
-Original Message-
From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com]
Sent: Saturday, August 14, 2010 1:54 AM
To: pixman@lists.freedesktop.org
Cc: Liu, Xinyun; Ma, Ling; Xu, Samuel
Subject: Re: [Pixman] [ssse3]Opti
On Wednesday 11 August 2010 09:00:54 Liu Xinyun wrote:
> Hi,
>
> pixman-access.c: fetch_scanline_x8r8g8b8() is mainly memcpy and 'or'
> operations. With ssse3_memcpy, the performance is increased a little.
>
> Reference: http://bugs.meego.com/show_bug.cgi?id=5012
>
> Quote:
> > After optimization
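As a rough illustration of the "memcpy and 'or'" description in the message above, the scalar operation can be sketched as follows. `fetch_x8r8g8b8_sketch` is a hypothetical stand-in, not pixman's actual fetch_scanline_x8r8g8b8(), and it assumes the 'or' sets the unused top byte of each x8r8g8b8 pixel to 0xff so that the result is opaque a8r8g8b8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the scanline fetch discussed in this thread.
 * Assumption: the unused "x" byte is forced to 0xff (opaque alpha). */
static void
fetch_x8r8g8b8_sketch (uint32_t *dst, const uint32_t *src, size_t width)
{
    size_t i;

    /* Copy each pixel and OR in the alpha byte: a memcpy plus an 'or'. */
    for (i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

Because each pixel is an independent load, OR, and store, the loop maps naturally onto a vectorized memcpy-style routine, which is presumably why an optimized copy helps here.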