Stephen Horner mailed me privately (and asked me to forward to the list):

---8<----
I think I just did what you did, and accidentally emailed you instead of the
list. Damnit! Anyways I looked and just found out that my $MAILDIR/sent is
b0rked, so if you could bounce the email I sent concerning the altivec
optimizations (if you did get it) to the list, I'd be a happy hacker indeed :)

Stephen

ps - do you know any powerpc assembly?
--->8----

On Mon, 2005-06-20 at 23:09 -0700, Stephen Horner wrote:
> > Damn. I sent this privately to Mike because I forgot to change the
> > address. I don't know why some people's email leaves out the list but
> > sorry Mike. For the list:
>
> This would be a good thing for me as well, being that I currently only have a
> PIII Coppermine machine. If you need a box to test the code on, and haven't
> yet found someone to assist you with your code, feel free to give me a holler.
>
> On another note, I'm curious if there is anyone here on the list that has a
> powerpc32 or 64 bit machine and uses Eterm? I'm curious if anyone here uses
> such glorious (^_^) hardware, and could use the altivec version of these
> optimizations. If not, I'll direct my powerpc studies on another project.
>
> Thanks
> Stephen

Interesting that you mention the PPC, as a new comment in my code is:

/* The challenge is now on for the PowerPC gurus to adapt this code to use
 * the PowerPC's Altivec SIMD engine to achieve the same performance
 * increases.  PA-RISC has the MAX extensions, the Alpha has MVI, the SGI
 * has MDMX, and the UltraSPARC has VIS.  Good luck guys! :-)
 *
 * P.S. Sorry MEJ. :-/
 */

I've been tinkering with this for a little while and ported the x86/MMX
engine to run w/ SSE2 (actually all it does is use all 128 bits of the
registers, so it was pretty easy), but there are a couple of instructions
that I used that deal with the upper 64 bits and aren't available in the
SSE1 instruction set. I'm still looking into that.
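To make the "use all 128 bits" idea concrete, here is a minimal sketch in C
with SSE2 intrinsics. It is not the Eterm asm. The function names, the
byte-wise (value * m) >> 8 shading formula, and the 0..256 modifier range are
assumptions chosen for illustration. It processes 16 channel bytes per
iteration, falls back to plain C where SSE2 is not available, and checks the
SIMD path against the C path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference (hypothetical formula): scale each byte by m/256,
 * with m in the range 0..256. */
static void shade_scalar(uint8_t *p, size_t n, unsigned m)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)((p[i] * m) >> 8);
}

/* Same operation, 16 bytes at a time when SSE2 is available. */
static void shade_simd(uint8_t *p, size_t n, unsigned m)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    const __m128i mul = _mm_set1_epi16((short)m);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        /* Widen the 16 bytes to two vectors of eight 16-bit lanes. */
        __m128i lo = _mm_unpacklo_epi8(v, zero);
        __m128i hi = _mm_unpackhi_epi8(v, zero);
        /* 255 * 256 fits in 16 bits, so the low half of the product
         * is the whole product. */
        lo = _mm_srli_epi16(_mm_mullo_epi16(lo, mul), 8);
        hi = _mm_srli_epi16(_mm_mullo_epi16(hi, mul), 8);
        /* Narrow back to bytes and store. */
        _mm_storeu_si128((__m128i *)(p + i), _mm_packus_epi16(lo, hi));
    }
#endif
    /* Leftover bytes (or everything, without SSE2) go through plain C. */
    shade_scalar(p + i, n - i, m);
}

/* Returns 1 if both paths produce identical bytes for every modifier. */
static int shade_paths_agree(void)
{
    uint8_t a[67], b[67];   /* 4 full 16-byte blocks plus a 3-byte tail */
    for (unsigned m = 0; m <= 256; m++) {
        for (size_t i = 0; i < sizeof a; i++)
            a[i] = b[i] = (uint8_t)(i * 37 + m);
        shade_scalar(a, sizeof a, m);
        shade_simd(b, sizeof b, m);
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

Calling shade_paths_agree() from a test driver is the kind of correctness
check that should pass before any timing numbers are trusted.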
This isn't a high priority for me because Eterm's maintainer, Michael
Jennings (Mej), wasn't too excited about yet another code base to maintain.
Nor has he chimed in on any of these discussions. The main reason that I did
the original port was that we x86-64 users didn't have access to any enhanced
asm routines, so it made a big difference for me. But people with x86/SSE can
still use the original MMX stuff written by Willem Jan-Monsuwe and get a good
deal of speed up.

On the other hand, this type of stuff is interesting, so I'm going to do some
of it if for no other reason than to learn. However, before I submit a change
set to Mej I want to have a thorough patch that handles all the cases that
are going to be, or could be, handled:

Arch      Inst-set   Status
---------------------------------------------------
All       C          Done by Mej. The profiling base.
x86       MMX        Done by Willem
x86-64    SSE2       Done by me
x86       SSE2       Done but not submitted
x86       SSE        In progress
x86       3DNow      ???
PPC       Altivec    Offered by Stephen Horner???
Alpha     MVI        ???
SGI       MDMX       ???
U-Sparc   VIS        ???
PA-RISC   MAX        ???

And most importantly, I want to justify the changes with some custom built
profiling code and time the SIMD stuff against the C code. Using the C stuff
as a base should enable meaningful benchmarks across different speed
processors. After all, I don't want to profile the speed of the processors
but the speed _increases_ within the processor families from using their
Single_Instruction_Multiple_Data ops.

Provided I can get some good numbers, and someone can help me come up with a
good way to turn the auto-tools nightmare into a coherent and easily
maintainable processor interrogation system, we can collectively submit the
change set to Mr. Jennings. The auto-tools stuff is a weakness for me, as
John Ellson had to be talked into doing it for the x86-64 SSE2 stuff. And
hopefully we won't irritate Mr. Jennings too much by redoing a newly
committed patch.
;-)

I have a fairly comprehensive test program that has some timing stuff in it
(needs improvement) that I intend to hand out to the people that have
volunteered to test, so that I can ask them to benchmark the various things.
It also verifies correctness of the code. I also have a patch to gdb that
enables inspection of the xmm registers as hexadecimal rather than floats.

And I have found that a few of the instructions in the original code can be
replaced with others that do the same thing. Those need to be benchmarked to
see if they helped or hurt. The biggest place that I have squeezed more
performance from is pre-populating the cache, adding memory fences, and using
non-temporal memory reads and writes. Again, these different code sets are
going to need benchmarking on different processor cores to see which ones are
better.

The new code is really nice in enlightenment 16.8 (cvs) when you have enabled
"update window while dragging". The Eterm background stays synced with the
real background and is _now_ almost flicker free.

If I get all of this stuff worked out then I'm going to dig into evas,
imlib2, and E-17 and look for speedups. I'm not as optimistic about
improvements in there, as raster is a phenomenal programmer and he has been
through that code (it's his code) so much that it is already smokin' hot.
But who knows.

I don't want to get anyone too psyched here, as I'm currently working on
about three big projects and five smaller ones. It could be a while before
something comes out of this. On the other hand, I am keeping a list of people
that have volunteered to test and the archs that they can test. So please
feel free to email me if you want to do some testing.

I'm also compiling a list of processors that contain SIMD instruction sets
and what those sets are. Please send any additions or corrections to the
following list to me.
 * Intel Chips                 SIMD Instruction Set
 * Pentium                     None
 * Pentium Pro                 None
 * Pentium MMX                 MMX
 * Pentium II                  MMX
 * Pentium III                 SSE
 * Pentium IV                  SSE2
 * Pentium IV (Prescott)       SSE3
 * Pentium IV Xeon (EM64T)     SSE3
 *
 * AMD Chips                   SIMD Instruction Set
 * AMD K6                      MMX
 * AMD K6-2                    3DNow
 * AMD K6-III                  Extended 3DNow
 * Athlon                      3DNow
 * Athlon XP                   SSE
 * AMD 64                      SSE2
 * AMD 64 (rev. E)             SSE3
 * AMD 64 X2                   SSE3
 *
 * AMD SIMD Instruction Sets   Includes
 * 3DNow                       MMX
 * Advanced/Extended 3DNow     MMX
 * 3DNow Pro                   SSE

Again, this code isn't really that big of a deal, as the x86/mmx and
x86-64/sse2 stuff is now working for all Intel compatible processors (my main
goal was x86-64). I have seen about 15-20% speed bumps with some adjustments
(over what's in Eterm's CVS), but I need to improve the accuracy of my timing
code before I can say that definitively.

So Stephen, I think there are a number of people that would like to see some
speed bumps on their PPC archs, and I don't know anything about the internals
of those processors. I do know that KWO, E-16.8's maintainer, refuses to look
at asm on the x86 but likes the PPC, so I think he would agree with you about
that being a glorious processor. Not to mention the distributions like
Yellow-Dog that are based on the arch. Mr. Woelders, KWO, did successfully
test my new x86/sse2 code though, so I know he has a Pentium IV.

From what I have read, the PPC SIMD engine is a bit more elegant than
Intel/AMD's. Not that I consider slashdot a great source (many idiots but
also many engineers), but there is one comment in particular that seems to
sum up the differences quite well:

http://apple.slashdot.org/comments.pl?sid=151831&cid=12742719

Any SIMD coders or testers that want to get on the list for me to send
software to need to email me; I'm not attaching a big chunk-o-code to a list
submission. And don't expect super fast progress here, as I want to make sure
things are as fast as possible this time. (Last time I just wanted them to
work.)
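
For the x86 rows of the processor list above, one possible shape for
"processor interrogation" is a run-time check instead of a configure-time
one. The sketch below is only that: it assumes a GCC-compatible compiler new
enough to provide __builtin_cpu_supports() (it is not in the GCC releases
current as of this writing), and the enum and function names are made up for
illustration. On any other compiler or arch it simply reports the plain C
path.

```c
#include <stdio.h>

/* Hypothetical ranking of the x86 SIMD levels from the list above. */
enum simd_level { SIMD_NONE = 0, SIMD_MMX, SIMD_SSE, SIMD_SSE2, SIMD_SSE3 };

/* Report the best level the running CPU offers.  Falls back to SIMD_NONE
 * (the plain C code path) when the builtin is not available. */
static enum simd_level detect_simd(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3")) return SIMD_SSE3;
    if (__builtin_cpu_supports("sse2")) return SIMD_SSE2;
    if (__builtin_cpu_supports("sse"))  return SIMD_SSE;
    if (__builtin_cpu_supports("mmx"))  return SIMD_MMX;
#endif
    return SIMD_NONE;
}

/* Human-readable name for a level, e.g. for a benchmark report. */
static const char *simd_name(enum simd_level level)
{
    switch (level) {
    case SIMD_SSE3: return "SSE3";
    case SIMD_SSE2: return "SSE2";
    case SIMD_SSE:  return "SSE";
    case SIMD_MMX:  return "MMX";
    default:        return "C";
    }
}

/* Print which code path a dispatcher would pick on this machine. */
static void report_simd(void)
{
    printf("best SIMD level: %s\n", simd_name(detect_simd()));
}
```

A dispatcher could call detect_simd() once at startup and pick the matching
shading routine, which would leave the auto-tools side with only the question
of which routines the compiler can build.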
I would appreciate it if MEJ chimed in here so I know if he is interested. I
still intend to test the various performances, but if he isn't interested
then I'm not going to waste time cleaning up the code, ensuring the #ifdef
stuff is right, and hoping for help with the auto-tools.

Also, if anyone has an SSE opcode reference (not SSE2, I have that) I would
appreciate it.

If there is a C coder out there that doesn't want to learn assembly but still
wants to do something cool, then how about adding a slider to shade/lighten
the background to Eterm's menu/button bar? Right now the background -->
brightness selector isn't that granular (it increments by 32). Then perhaps a
drop down that changes the slider from affecting the brightness to contrast
or gamma. Just a thought.

Cheers,
The River Rat
--
Tres

_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel