Re: [bitc-dev] Bitc and Simd

orthochronous Sat, 14 Aug 2010 10:51:39 -0700

On Sat, Aug 14, 2010 at 5:06 PM, Ben Kloosterman <[email protected]> wrote:
> Eg
>
> for ( xmm i = 0 ; i <  loopCount ; i = i + 1)
>       RunLoopVariableDependentSIMDAlgorithm(i) ;
>
> Or this
>
> //pointers/data must be 16 byte aligned
> int blockMemCopy(void *destination, void *source, int32 size)
> {
>
>   xmm *dest = (xmm*)&destination;
>   xmm *sour = (xmm*)&source;
>   int c;
>
>   for(c=0;c< (size <<2) ;c++)
>      *dest++ = *sour++;
>
>    return c>>2 ;
> }


Just a quick comment: on ARM chips the NEON unit is deliberately run 5
cycles behind the main scalar pipeline. As such, you are strongly advised
against using SIMD instructions unless you're actually using the full
SIMD capabilities (ideally using the main pipeline just to do control
flow), since otherwise you incur notable penalties moving data between
the unit and the main pipeline. Additionally, the NEON unit on ARM uses
only the L2 cache, so the L1 cache has to be explicitly made coherent
with L2 before any of that data is accessed from the main part of the
CPU:

http://forums.arm.com/lofiversion/index.php?t12665.html

This is a reasonable design for multimedia, where most of the time the
scalar and SIMD data-sets don't overlap. (I'm interested in ARM as
well as Intel because both of these chips turn up in smartphones,
tablets and netbooks.)
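
To make the "main pipeline only for control flow" advice concrete, here is
a minimal sketch in C using the standard NEON intrinsics from arm_neon.h.
The function name sum_floats is hypothetical and not from this thread. The
partial sums stay in a NEON register for the whole loop, and the result
crosses back to the scalar side only once, after the loop. On compilers
without NEON support the guarded section is skipped and the plain scalar
loop does all the work, so treat it as an illustration rather than tuned
code.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the thread): sums n floats. */
float sum_floats(const float *data, size_t n)
{
    float total = 0.0f;
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four partial sums live in one NEON register for the whole loop;
       the scalar side only advances the index and tests the bound. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_f32(acc, vld1q_f32(data + i));

    /* Reduce the four lanes, then do the single NEON-to-scalar
       transfer, outside the loop. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    total = vget_lane_f32(half, 0);
#endif

    /* Scalar tail, or the whole array when NEON is unavailable. */
    for (; i < n; i++)
        total += data[i];

    return total;
}
```

The pattern to avoid is the opposite one: pulling a lane out of the vector
register on every iteration to drive a scalar decision, which pays the
transfer penalty each time round the loop.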

Regards,
David Steven Tweed

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
