Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-12-02 Thread Liu Xinyun
I got a similar performance result about one month ago. The gvim test shows a large performance increase, about 360%. The other tests show neither side effects nor gains. I tested it on a 32-bit userland. I will test it again against the current git code and post the data later. Regards, Xinyun On

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-12-02 Thread Siarhei Siamashka
On Monday 29 November 2010 20:59:52 Siarhei Siamashka wrote: > On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote: > > For MOVD, we simplified the backward copy code since the previous code is too > > long and does not gain obvious performance, > > And this is what I'm worried about. First you propose

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-11-29 Thread Siarhei Siamashka
On Wednesday 17 November 2010 07:47:39 Xu, Samuel wrote: > Hi, Soeren Sandmann and Siarhei Siamashka: > Glad to send out this refreshed patch to address points we discussed > for this SSSE3 patch. Thanks. And sorry for a rather late reply. > In this new patch, we merged 32 bit and 64 bit a

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-11-19 Thread Soeren Sandmann
"Xu, Samuel" writes: > Appreciate comments on this patch I'll be away for the next three weeks, so I won't be able to review anything until then. Soren

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-11-18 Thread Xu, Samuel
, September 09, 2010 12:56 PM To: 'sandm...@daimi.au.dk'; Siarhei Siamashka; Ma, Ling; Liu, Xinyun Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 Hi, Soeren Sandmann and Siarhei Siamashka: It is really a long discussion and thanks for your patience! I believe all of u

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-11-16 Thread Xu, Samuel
, September 09, 2010 12:56 PM To: 'sandm...@daimi.au.dk'; Siarhei Siamashka; Ma, Ling; Liu, Xinyun Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 Hi, Soeren Sandmann and Siarhei Siamashka: It is really a long discussion and thanks for your patience! I believe all of us has s

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-08 Thread Ma, Ling
..@daimi.au.dk'; Xu, Samuel > Cc: Siarhei Siamashka; pixman@lists.freedesktop.org; Liu, Xinyun > Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 > > Hi Soren > > I am also interested in the answer to the question of whether this > > code was generat

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-08 Thread Ma, Ling
Hi Soren > I am also interested in the answer to the question of whether this > code was generated by passing intrinsics through the Intel compiler. I'm the original author of this code; it is based on our original memcpy function, which has been pushed into glibc. Thanks Ling

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-08 Thread Soeren Sandmann
"Xu, Samuel" writes: > Hi, Soeren Sandmann and Siarhei Siamashka: > > As a wrap of current discussion, combining you two's comments, can we assume > this new patch of SSSE3 is ok? > New patch might contains: > 1. Fix 64 bit CPU detection issue for MMX and SSE2 > 2. Add more comments for git com

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Xu, Samuel
this case, how to determine Sun studio? Thanks! Samuel -Original Message- From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com] Sent: Wednesday, September 08, 2010 3:39 AM To: Ma, Ling Cc: Xu, Samuel; Soeren Sandmann; pixman@lists.freedesktop.org; Liu, Xinyun Subje

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Siarhei Siamashka
On Friday 03 September 2010 01:39:54 Soeren Sandmann wrote: > Siarhei Siamashka writes: > > Apparently software prefetch also disables or interferes with the hardware > > prefetcher on Intel Atom, hurting performance a lot. More advanced > > processors can cope with it. > > > > But increased pref

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Siarhei Siamashka
On Tuesday 07 September 2010 14:03:52 Ma, Ling wrote: > > > > Wouldn't just the use of MOVD/MOVSS instructions here also solve > > > > this problem? Store forwarding does not seem to be used for SIMD > > > > according to the manual. I haven't benchmarked anything yet > > > > though

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Ma, Ling
> Your code still can be simplified a lot. I'm just not quite sure whether it > would > be more practical to commit something first and then refactor it with the > follow > up commits. Or attempt to make a "perfect" patch before committing. [Ma Ling] Yes, I agree with you, let us commit it first,

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-07 Thread Ma, Ling
Hi Siarhei Siamashka > Could you elaborate? Preferably with a reference to the relevant section of > the > optimization manual. Because it looks like exactly the store forwarding > address > aliasing issue to me. [Ma Ling]:Currently it is not described in our optimization manual, soon it will b

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-03 Thread Siarhei Siamashka
On Friday 03 September 2010 11:53:47 Xu, Samuel wrote: > >* Siarhei asked whether it would be possible to unify the 32 and 64 > > > > bit assembly sources. I don't think you commented on that. > > I think it is very difficult to unify 32 and 64 bit assemble src. It's not so difficult if you chan

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-03 Thread Siarhei Siamashka
On Friday 03 September 2010 11:53:47 Xu, Samuel wrote: > >* Store forwarding > > > > - We need some comments in the assembly about the store forwarding > > > >that Ma Ling described. > > How about this comments added to asm code: > "CPU doesn't check each address bit for src and dest, so re

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-03 Thread Xu, Samuel
Thanks Soren! My comments follow. I would appreciate explicit code guidance on the Sun Studio check, as I have no experience with Sun Studio. Once the Sun Studio questions are settled, we can send the next patch to address the remaining issues. Samuel > * The 64 bit CPU detection is broken. It doesn't use SSE2 and

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-02 Thread Soeren Sandmann
Siarhei Siamashka writes: > Looks like this data has been posted already: > http://lists.freedesktop.org/archives/pixman/2010-June/000231.html > > Checking a few more things with microbenchmarks shows that the prefetch > distance of just 64 bytes ahead is way too small. We should probably get

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-02 Thread Soeren Sandmann
"Xu, Samuel" writes: > Hi, Siarhei Siamashka: > Attached patch has updated copyright part (only copyright change). we > referred http://cgit.freedesktop.org/pixman/tree/COPYING. > Yes, As you assumed, we tested on multiple 32/64 bit boxes w/o > and w/ SSSE3. Here are some more comm

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-09-02 Thread Siarhei Siamashka
On Monday 30 August 2010 23:31:35 Siarhei Siamashka wrote: > And Intel Atom does not like software prefetch very much. This reminds me an > older report: > http://lists.freedesktop.org/archives/pixman/2010-June/000218.html > > I can try to run a full set of cairo-perf-trace benchmarks to get more

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-30 Thread Siarhei Siamashka
On Monday 30 August 2010 16:06:45 Siarhei Siamashka wrote: > Also I tried to run my simple microbenchmarking program on Intel Atom N450 > netbook, x86_64 system. The results are the following: > > -- > All results are presented in millions of pixels per second > L1 - small Xx1 rectangle (fitting

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-30 Thread Siarhei Siamashka
On Monday 30 August 2010 12:26:27 Xu, Samuel wrote: > Hi, Siarhei Siamashka: > Sorry for a typo, fixed version is attached. Please ignore the previous mail. > > New patch, which contains > 1)Simplified 64 bit detect_cpu_features(), only check SSSE3 bit. > _MSC_VER > path switched to __

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-29 Thread Siarhei Siamashka
On Sunday 29 August 2010 12:25:47 Xu, Samuel wrote: > Hi, Siarhei Siamashka: > Q:---" What problems do you have without "merge" mechanism?" > A: Of course there isn't correctness issue w/o "merge". > Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths...are excluded each > other, although some c

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-29 Thread Xu, Samuel
.siamas...@gmail.com] Sent: Friday, August 27, 2010 10:57 PM To: Xu, Samuel Cc: pixman@lists.freedesktop.org; Ma, Ling; Liu, Xinyun Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 On Friday 27 August 2010 15:00:49 Xu, Samuel wrote: > Hi, Siarhei Siamashka: > Than

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Siarhei Siamashka
On Friday 27 August 2010 15:00:49 Xu, Samuel wrote: > Hi, Siarhei Siamashka: > Thanks for quick response! > For 64 bit detect_cpu_features(), if ignore HAVE_GETISAX and _MSC_VER, > it is ok for us to simplify it as your example in next update. If you can ensure MSVC compatibility

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Xu, Samuel
st 27, 2010 2:15 PM To: Xu, Samuel Cc: pixman@lists.freedesktop.org Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 Hi, Xu. > --- /dev/null > +++ b/pixman/pixman-access-ssse3_x86-64.S .. > +#if (defined(__amd64__) || defined(__x86_64__) ||defined(_M_AMD64))

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Xu, Samuel
u, Samuel Cc: pixman@lists.freedesktop.org; Ma, Ling; Liu, Xinyun Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 On Friday 27 August 2010 05:59:00 Xu, Samuel wrote: > Hi Siarhei Siamashka, > > Here is a new patch, can you review it? Thank you! > It address follow

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Siarhei Siamashka
On Friday 27 August 2010 05:59:00 Xu, Samuel wrote: > Hi Siarhei Siamashka, > > Here is a new patch, can you review it? Thank you! > It addresses the following suggestions: > 1: SSSE3 file is split to a new file. Thanks. > Compared with duplicating every > content from the SSE2 file, I added a way to me

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-27 Thread Makoto Kato
ge- From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com] Sent: Sunday, August 22, 2010 1:49 AM To: Liu, Xinyun Cc: pixman@lists.freedesktop.org; Ma, Ling; Xu, Samuel Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote: Hi Siar

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-21 Thread Siarhei Siamashka
On Friday 20 August 2010 18:39:47 Liu, Xinyun wrote: > Hi Siarhei Siamashka, > > Here is a new patch, can you review it? Thank you! Sure, thanks for the updated patch. Some comments follow. > From 9783651899a2763d7fcca2960fc354bd1f541980 Mon Sep 17 00:00:00 2001 > From: root A minor nitpick he

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-21 Thread Siarhei Siamashka
On Friday 20 August 2010 19:36:07 Xu, Samuel wrote: > We measured performance, and compared with original SSE2 intrinsic enabled > version(0.19.4), on ATOM, and get following findings using 480P flash > H.264 video playing workload: > 1) sse2_composite_src_x888_()'s cycle reduced 67%. This func

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-20 Thread Xu, Samuel
: 64 bit system, without SSSE3 Thanks! Samuel -Original Message- From: Liu, Xinyun [mailto:xinyun...@gmail.com] Sent: Friday, August 20, 2010 11:40 PM To: Siarhei Siamashka; pixman@lists.freedesktop.org Cc: Ma, Ling; Xu, Samuel Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-20 Thread Liu, Xinyun
Hi Siarhei Siamashka, Here is a new patch, can you review it? Thank you! With this patch, oprofile shows that performance increases dramatically on Atom. Samuel and Ling will provide detailed data. Regards, Liu, Xinyun 0001-Add-ssse3_composite_src_x888_.patch Description: Binary data

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-19 Thread Siarhei Siamashka
On Tuesday 17 August 2010 10:52:43 Xu, Samuel wrote: > We'd like to provide a new patch with following enhancement soon: > 1) Add 64 bit asm code specifically for 64 bit, which will co-exist with 32 > bit version > 2) CPUID dynamic check in pixman-cpu.c and pixman-access.c > 3) Makefile fixing > 4
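The "CPUID dynamic check" proposed in the message above could look roughly like the following. This is only a minimal sketch, not the code from the patch or from pixman-cpu.c: the function name have_ssse3_sketch is hypothetical, and it assumes a GCC or Clang toolchain that provides <cpuid.h>. On any other compiler or architecture it simply reports that SSSE3 is unavailable.

```c
#include <stdbool.h>

#if (defined(__i386__) || defined(__x86_64__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

/* Hypothetical helper: returns true when the CPU reports SSSE3 support.
 * SSSE3 is advertised in CPUID leaf 1, register ECX, bit 9. */
static bool have_ssse3_sketch(void)
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return (ecx & (1u << 9)) != 0;
#else
    /* Unknown toolchain or non-x86 target: take the generic path. */
    return false;
#endif
}
```

A caller would evaluate this once at start-up and pick the SSSE3 fast path only when it returns true, falling back to the existing SSE2, MMX, or C paths otherwise.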

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-17 Thread Xu, Samuel
;make check"? Samuel -Original Message- From: Ma, Ling Sent: Tuesday, August 17, 2010 3:22 PM To: Siarhei Siamashka; Xu, Samuel Cc: pixman@lists.freedesktop.org; Liu, Xinyun Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 Hi Siarhei Siamashka > I did n

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-16 Thread Siarhei Siamashka
On Monday 16 August 2010 11:24:44 Xu, Samuel wrote: > Thanks for kindly comments! It is very nice that bug#20709 also emphasize > similar performance issue. Well, having bugs unresolved for such a long time is not so nice. Also see some more comments below. > So, let's discuss how to make

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-16 Thread Xu, Samuel
in current pixman-arm-neon-asm.S inside pixman code. Thanks! Samuel -Original Message- From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com] Sent: Saturday, August 14, 2010 1:54 AM To: pixman@lists.freedesktop.org Cc: Liu, Xinyun; Ma, Ling; Xu, Samuel Subject: Re: [Pixman] [ssse3]Opti

Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-08-13 Thread Siarhei Siamashka
On Wednesday 11 August 2010 09:00:54 Liu Xinyun wrote: > Hi, > > pixman-access.c: fetch_scanline_x8r8g8b8() is mainly memcpy and 'or' > operations. With ssse3_memcpy, the performance increases a little. > > Reference: http://bugs.meego.com/show_bug.cgi?id=5012 > > Quote: > > After optimization
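For readers unfamiliar with the function under discussion, the "memcpy and 'or'" description above can be illustrated with a plain scalar loop. This is a simplified sketch, not pixman's actual code: the name fetch_x8r8g8b8_sketch and its flat pointer-and-width signature are assumptions made for illustration. It shows the operation the SSSE3 patch tries to speed up, which is copying each 32-bit pixel while forcing the unused alpha byte to 0xff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone version of the scanline fetch: copy `width`
 * x8r8g8b8 pixels from src to dst, setting the top (alpha) byte of each
 * pixel to 0xff so the result is an opaque a8r8g8b8 pixel. */
static void fetch_x8r8g8b8_sketch(const uint32_t *src,
                                  uint32_t       *dst,
                                  size_t          width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = src[i] | 0xff000000u;
}
```

Because every pixel gets the same treatment and there is no dependency between pixels, the loop is essentially a memory copy with one extra OR per word, which is why a tuned memcpy was a natural starting point for the optimization.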