>-----Original Message-----
 >From: orthochronous [mailto:[email protected]]
 >Sent: Monday, August 16, 2010 12:23 AM
 >To: [email protected]; Discussions about the BitC language
 >Subject: Re: [bitc-dev] Bitc and Simd
 >
 >On Sun, Aug 15, 2010 at 10:00 AM, Ben Kloosterman <[email protected]>
 >wrote:
 >> Yes, this is the biggest issue for SIMD. That said, even if you remove
 >> 70-90% of intrinsic lines it's a big win: it will keep a solution
 >> simpler, and many programs will run without them and possibly on other
 >> platforms.
 >
 >That does depend what your goal is. If you're trying to stuff SIMD
 >into the compiler then this might be feasible (although I've never
 >benchmarked the kinds of things you're proposing so I don't know how
 >much of a win it is in practice.) 


On x86 you are already using it, since most libs now do it in asm. It
improves medium-sized memcpy (and all memset) speed substantially. (For
smaller copies, under 64-128 bytes, alignment handling means it is not used
in a general-purpose memcpy.) For large copies (> cache size), non-temporal
instructions combined with 128-bit registers give a 40% gain compared to a
memcpy using SSE2, or more if the code benefits from the cache not being
used up by the copy. The question is how much the improvement in other
functions, such as bitscan or aggregating loops into 128-bit operations,
will improve the whole program.
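
To make the large-copy case concrete, here is a rough sketch of what
"non-temporal instructions and 128-bit registers" looks like with SSE2
intrinsics in C. The name stream_copy is made up for illustration and is
not from any library. Real libc implementations are hand-tuned asm with
size thresholds and prefetching that this leaves out, so treat it as an
illustration of the idea only, not as something that reproduces the
numbers above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
#endif

/* Hypothetical helper (not a library function): copy n bytes from src to
 * dst, using non-temporal 128-bit stores for the bulk so that the copied
 * data does not displace the caller's working set from the cache. */
static void *stream_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy the
     * unaligned head with plain memcpy first. */
    size_t head = (size_t)(-(uintptr_t)d & 15u);
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;

    /* Bulk: unaligned 128-bit loads, non-temporal 128-bit stores. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16; s += 16; n -= 16;
    }
    /* Order the streamed stores before any later stores. */
    _mm_sfence();
#endif
    /* Tail bytes, or the whole buffer when SSE2 is not available. */
    memcpy(d, s, n);
    return dst;
}
```

The head/tail handling is also why this kind of path does not pay off for
small copies: below a few cache lines the alignment fix-up dominates.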


 > If you want to write
 >numerical/multimedia algorithms then the best place to put effort is
 >in to making using intrinsics work as smoothly with the rest of the
 >language. 

Agree.

> (You can almost point to which numerical/multimedia
 >algorithm a given SSE instruction comes from; ARM is a bit more of an
 >attempt at an orthogonal instruction set, but you can still see the
 >motivating cases.) FWIW, it's very, very rare for me to see something
 >like memcpy show up on a time profile, although routines like
 >correlation, FFT, etc, show up all the time.

That's because of the work you do and the fact that memcpy uses SIMD :-)
Algorithms run for seconds, but OS routines, the GUI, the network stack,
web servers and kernel calls run very often, and if you profile a full run
of a program (and not just the critical part) memcpy does show up in most.
If you're working on kernels or drivers, memcpy becomes a bigger cost, and
BitC should cover both of these groups who use C, i.e. systems programming
and high-performance algorithms.

Ben 


_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
