Stephen Horner mailed me privately (and asked me to forward to list),

---8<----

I think I just did what you did, and accidentally emailed you instead of the
list. Damnit! Anyways I looked and just found out that my $MAILDIR/sent is
b0rked, so if you could bounce the email I sent concerning the altivec
optimizations (if you did get it) to the list, I'd be a happy hacker indeed :)

Stephen

ps - do you know any powerpc assembly? 

--->8----

On Mon, 2005-06-20 at 23:09 -0700, Stephen Horner wrote:
> > Damn.  I sent this privately to Mike because I forgot to change the
> > address.  I don't know why some people's email leaves out the list but
> > sorry Mike.  For the list:

> This would be a good thing for me as well, being that I currently only have a
> PIII Coppermine machine. If you need a box to test the code on, and haven't
> yet found someone to assist you with your code, feel free to give me a holler.
> 
> On another note, I'm curious if there is anyone here on the list that has a
> powerpc32 or 64 bit machine and uses Eterm? I'm curious if anyone here uses
> such glorious (^_^) hardware, and could use the altivec version of these
> optimizations. If not, I'll direct my powerpc studies on another project.
> 
> Thanks
> Stephen 
> 
Interesting that you mention the PPC as a new comment in my code is:

/*  The challenge is now on for the PowerPC gurus to adapt this code to use
 *  the PowerPC's Altivec SIMD engine to achieve the same performance 
 *  increases.  PA-RISC has the MAX extensions, the Alpha has MVI, the SGI
 *  has MDMX, and the UltraSPARC has VIS.  Good luck guys!  :-)
 *
 *  P.S.        Sorry MEJ.  :-/
 */


        I've been tinkering with this for a little while and ported the x86/MMX
engine to run w/ SSE2 (actually all it does is use all 128 bits of the
registers so it was pretty easy) but there are a couple of instructions
that I used that deal with the upper 64 bits that aren't available in
the SSE1 instruction set.  I'm still looking into that.

        This isn't a high priority for me because Eterm's maintainer, Michael
Jennings (Mej), wasn't too excited about yet another code base to
maintain.  Nor has he chimed in on any of these discussions.  The main
reason that I did the original port was that we x86-64 users didn't have
access to any enhanced asm routines so it made a big difference for me.
But people with x86/SSE can still use the original MMX stuff written by
Willem Jan-Monsuwe and get a good deal of speed up.  On the other hand
this type of stuff is interesting so I'm going to do some stuff if for
no other reason than to learn.  However, before I submit a change set to
Mej I want to have a thorough patch that handles all the cases that are
going to be, or could be, handled:

Arch    Inst-set        Status                     Notes
---------------------------------------------------------------------
All     C               Done by Mej                The profiling base.
x86     MMX             Done by Willem
x86-64  SSE2            Done by me
x86     SSE2            Done but not submitted
x86     SSE             In progress
x86     3DNow           ???
PPC     Altivec         Offered by Stephen Horner???
Alpha   MVI             ???
SGI     MDMX            ???
U-Sparc VIS             ???
PA-RISC MAX             ???

        And most importantly I want to justify the changes with some custom
built profiling code and time the SIMD stuff against the C code.  Using
the C stuff for a base should enable meaningful benchmarks across
different speed processors; after all, I don't want to profile the speed
of the processors but the speed _increases_ within the processor
families by using their Single_Instruction_Multiple_Data ops.  Provided
I can get some good numbers and someone can help me come up with a good
way to modify the auto-tools nightmare into a coherent and easily
maintainable processor interrogation system then we can collectively
submit the change set to Mr. Jennings.  The auto-tools stuff is a
weakness for me as John Ellson had to be talked into doing it for the
x86-64 SSE2 stuff.  And hopefully we won't irritate Mr. Jennings too much
by redoing a newly committed patch.  ;-)
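
A minimal sketch of the kind of harness described above: time a plain C
routine as the baseline, time a candidate routine the same way, and report
the ratio rather than the raw times.  Everything named here (shade_fn,
shade_c, time_routine, speedup, the scale value of 200) is a hypothetical
stand-in, not Eterm's actual code, and clock() is used only because it is
standard C; a real harness would likely want a finer-grained timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature shared by the C baseline and any SIMD variant. */
typedef void (*shade_fn)(uint32_t *pixels, size_t count, unsigned scale);

/* Plain C baseline: scale each 8-bit colour channel by scale/256,
 * leaving the top byte untouched.  Illustrative only. */
void shade_c(uint32_t *pixels, size_t count, unsigned scale)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t p = pixels[i];
        uint32_t r = (((p >> 16) & 0xffu) * scale) >> 8;
        uint32_t g = (((p >> 8) & 0xffu) * scale) >> 8;
        uint32_t b = ((p & 0xffu) * scale) >> 8;
        pixels[i] = (p & 0xff000000u) | (r << 16) | (g << 8) | b;
    }
}

/* Run fn over the buffer reps times and return the CPU seconds used. */
double time_routine(shade_fn fn, uint32_t *pixels, size_t count, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(pixels, count, 200);   /* 200 is an arbitrary test scale */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Speed increase of a candidate relative to the C baseline.
 * Returns 0.0 when the candidate time is too small to measure. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    return candidate_seconds > 0.0 ? baseline_seconds / candidate_seconds
                                   : 0.0;
}
```

Calling time_routine() once with shade_c and once with a SIMD routine of
the same signature, then passing both results to speedup(), gives a number
that can be compared across processors of different clock speeds.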

        I have a fairly comprehensive test program that has some timing stuff
in it (needs improvement) that I intend to hand out to the people that
have volunteered to test so that I can ask them to benchmark the various
things.  It also verifies correctness of the code.  I also have a patch
to gdb that enables inspection of the xmm registers as hexadecimal
rather than floats.  And I have found that a few of the instructions in
the original code can be replaced with others that do the same thing but
with different calls.  Those need to be benchmarked to see if they
helped or hurt.  The biggest place that I have squeezed more performance
from is by pre-populating the cache, adding memory fences, and using
non-temporal memory reads and writes.  Again, these different code sets
are going to need benchmarking on different processor cores to see which
ones are better.
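
For anyone unfamiliar with the cache tricks mentioned above, here is a
rough sketch using the SSE2 compiler intrinsics rather than hand-written
asm.  It is not the Eterm code: stream_copy and the prefetch distance of 4
blocks are assumptions made for illustration.  It prefetches ahead with the
non-temporal hint, writes with a non-temporal store, and issues a store
fence at the end.  On builds without SSE2 it falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#endif

/* Copy `blocks` 16-byte blocks from src to dst.
 * Both pointers must be 16-byte aligned. */
void stream_copy(void *dst, const void *src, size_t blocks)
{
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < blocks; i++) {
        /* Hint that data a few blocks ahead will be read once
         * (prefetchnta).  The distance of 4 is a guess to be tuned. */
        if (i + 4 < blocks)
            _mm_prefetch((const char *)(s + i + 4), _MM_HINT_NTA);
        __m128i v = _mm_load_si128(s + i);   /* aligned 128-bit load   */
        _mm_stream_si128(d + i, v);          /* non-temporal store     */
    }
    _mm_sfence();  /* make the streamed stores globally visible */
#else
    memcpy(dst, src, blocks * 16);
#endif
}
```

Whether this beats ordinary stores depends on the buffer size and the
processor core, which is exactly why each variant needs benchmarking.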

        The new code is really nice in enlightenment 16.8 (cvs) when you have
enabled update window while dragging.  The Eterm background stays synced
with the real background and is _now_ almost flicker free.  If I get all
of this stuff worked out then I'm going to dig into evas, imlib2, and
E-17 and look for speedups.  I'm not as optimistic about improvements in
there as raster is a phenomenal programmer and he has been through that
code (it's his code) so much that it is already smokin' hot.  But who
knows.  I don't want to get anyone too psyched here as I'm currently
working on about three big projects and five smaller ones.  It could be
a while before something comes out of this.  On the other hand, I am
keeping a list of people that have volunteered to test and the archs
that they can test.  So please feel free to email me if you want to do
some testing.  I'm also compiling a list of processors that contain SIMD
instruction sets and what those sets are.  Please send any additions or
corrections to the following list to me.


 *  Intel Chips         SIMD Instruction Set
 *  Pentium                     None
 *  Pentium Pro                 None
 *  Pentium MMX                 MMX
 *  Pentium II                  MMX
 *  Pentium III                 SSE
 *  Pentium IV                  SSE2
 *  Pentium IV (Prescott)       SSE3
 *  Pentium IV Xeon (EM64T)     SSE3
 *
 *  AMD Chips           SIMD Instruction Set
 *  AMD K6                      MMX
 *  AMD K6-2                    3D Now
 *  AMD K6-III                  Extended 3D Now
 *  Athlon                      3D Now
 *  Athlon XP                   SSE
 *  AMD 64                      SSE2
 *  AMD 64 (rev. E)             SSE3
 *  AMD 64 X2                   SSE3
 *
 *  AMD SIMD Instruction Sets Includes
 *  3D Now                      MMX
 *  Advanced/Extended 3D Now    MMX
 *  3D Now Pro                  SSE


        Again, this code isn't really that big of a deal as the x86/mmx and
x86-64/sse2 stuff is now working for all Intel-compatible processors (my
main goal was x86-64).  I have seen about 15-20% speed bumps with some
adjustments (over what's in Eterm's CVS) but I need to improve the
accuracy of my timing code before I can say that definitively.  So
Stephen, I think there are a number of people that would like to see
some speed bumps on their PPC archs and I don't know anything about the
internals of those processors.  I do know that KWO, E-16.8's maintainer,
refuses to look at asm on the x86 but likes the PPC so I think he would
agree with you about that being a glorious processor.  Not to mention
the distributions like Yellow-Dog that are based on the arch.  Mr.
Woelders, KWO, did successfully test my new x86/sse2 code though so I
know he has a Pentium IV.  From what I have read the PPC SIMD engine is
a bit more elegant than Intel/AMD's.  Not that I consider slashdot a
great source (many idiots but also many engineers) but there is one
comment in particular that seems to sum up the differences quite well:

http://apple.slashdot.org/comments.pl?sid=151831&cid=12742719

        Any SIMD coders or testers that want to get on the list for me to send
software to need to email me; I'm not attaching a big chunk-o-code to a
list submission.  And don't expect super fast progress here as I want to
make sure things are as fast as possible this time.  (Last time I just
wanted them to work).  I would appreciate it if MEJ chimed in here so I
know if he is interested.  I still intend to test the various
performances but if he isn't interested then I'm not going to waste time
cleaning up the code, ensuring the #ifdef stuff is right and hoping for
help with the auto-tools.  Also, if anyone has an SSE opcode reference
(not SSE2, I have that) I would appreciate it.

        If there is a C coder out there that doesn't want to learn assembly but
still wants to do something cool then how about adding a slider to
shade/lighten the background to Eterm's menu/button bar?  Right now the
background --> brightness selector isn't that granular (increments by
32).  Then perhaps a drop down that changes the slider from affecting
the brightness to contrast or gamma.  Just a thought.

Cheers,
The River Rat
-- 
Tres


