>
> Take for instance the PSHUFB instruction, which allows a very fast 
> [16]byte lookup in SSSE3 capable machines. This is helpful in various ways, 
> but if it isn't available, it will have to commit the XMM register to 
> memory and do 16 lookups, which is at least an order of magnitude slower 
> than using the SIMD. Similarly, RSQRT (low precision reciprocal of the 
> square root) instruction allows a "shortcut", but if it isn't available on 
> your architecture, it will likely be very expensive.
>

This kind of thing is what led me to wonder whether using a lot of separate 
templates might be better.

Then you wouldn’t be relying on the compiler's ability to optimise 
through all this complexity, but just using a record of what 
people have found optimal on a case-by-case basis. 

Since this would be used on the most critical code, any slight advantage 
becomes worthwhile, particularly if all you have to do is 'generate' each 
time you change the critical part of the code.
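
To make that concrete, here is a rough sketch of what the portable variant of the PSHUFB case might look like, assuming the usual PSHUFB semantics (low four bits of each index byte select a table entry, a set top bit yields zero). The names shuffleBytes and lookup are made up for illustration; the idea is only that a generated per-architecture file could swap lookup for an assembly-backed version.

```go
package main

import "fmt"

// shuffleBytes is a scalar stand-in for PSHUFB on one 16-byte register:
// each index byte selects table[idx&0x0F], or yields 0 if its top bit is set.
// This is the "16 lookups" fallback, not an optimised implementation.
func shuffleBytes(table, idx [16]byte) [16]byte {
	var out [16]byte
	for i, x := range idx {
		if x&0x80 == 0 {
			out[i] = table[x&0x0F]
		}
	}
	return out
}

// lookup is the hypothetical hook: a generated, architecture-specific file
// could reassign it in an init function when a faster variant is available.
var lookup = shuffleBytes

func main() {
	var table, idx [16]byte
	for i := range table {
		table[i] = byte(i * 2)
		idx[i] = byte(15 - i) // reverse the table
	}
	idx[0] = 0x80 // top bit set: this lane becomes zero
	fmt.Println(lookup(table, idx))
}
```

Callers only ever see lookup, so which template ends up behind it is decided per architecture rather than left to the compiler.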

