> Take for instance the PSHUFB instruction, which allows a very fast
> [16]byte lookup in SSSE3 capable machines. This is helpful in various ways,
> but if it isn't available, it will have to commit the XMM register to
> memory and do 16 lookups, which is at least an order of magnitude slower
> than using the SIMD. Similarly, RSQRT (low precision reciprocal of the
> square root) instruction allows a "shortcut", but if it isn't available on
> your architecture, it will likely be very expensive.
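To make the fallback cost concrete, here is a rough sketch in plain Go of the 16 separate lookups that stand in for one PSHUFB. The function name `shuffleBytes` is made up for illustration. It follows the instruction's documented behaviour (low four bits of each index select a table byte, a set high bit zeroes the lane), but it is not taken from any particular library.

```go
package main

import "fmt"

// shuffleBytes is a hypothetical portable stand-in for PSHUFB:
// each index byte selects one of the 16 table bytes, and an index
// with its high bit set produces zero in that lane.
func shuffleBytes(table, idx [16]byte) [16]byte {
	var out [16]byte
	for i, ix := range idx {
		if ix&0x80 != 0 {
			continue // high bit set: this lane stays zero
		}
		out[i] = table[ix&0x0f] // only the low 4 bits are used
	}
	return out
}

func main() {
	var table, idx [16]byte
	for i := range table {
		table[i] = byte('a' + i)
		idx[i] = byte(15 - i) // reverse the table
	}
	idx[0] = 0x80 // force the first lane to zero
	fmt.Printf("%q\n", shuffleBytes(table, idx))
}
```

The loop does 16 loads and 16 stores through memory where the SIMD version does the whole thing in a register, which is where the gap described above comes from.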
This kind of thing is what led me to wonder whether using a lot of separate templates might be better. You would no longer be relying on the compiler's ability to optimise through all this complexity; you would simply be using a record of what people have found optimal, case by case. Since this would be applied to the most critical code, any slight advantage becomes worthwhile, particularly if all you have to do is run 'generate' each time you change the critical part of the code.
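Roughly what I have in mind, as a sketch only: a list of recorded variants, fastest first, with a portable one at the end, and a picker that takes the first variant the machine supports. All the names here (`variant`, `pick`, the "ssse3" label) are invented for illustration, and the "ssse3" entry points at the same Go loop as the portable one; a real build would point it at a generated or hand-written routine instead.

```go
package main

import "fmt"

// kernel is the signature shared by every variant of the hot routine.
type kernel func(table, idx [16]byte) [16]byte

// variant pairs one recorded implementation with the CPU feature it needs.
// The feature labels are hypothetical; "" means always usable.
type variant struct {
	name    string
	feature string
	fn      kernel
}

// lookupLoop is the portable version: 16 individual table lookups.
func lookupLoop(table, idx [16]byte) [16]byte {
	var out [16]byte
	for i, ix := range idx {
		if ix&0x80 == 0 {
			out[i] = table[ix&0x0f]
		}
	}
	return out
}

// variants is ordered fastest first. In the scheme discussed here this
// list would be emitted by a generator rather than written by hand.
var variants = []variant{
	{name: "ssse3", feature: "ssse3", fn: lookupLoop}, // stand-in only
	{name: "portable", feature: "", fn: lookupLoop},
}

// pick returns the first variant whose feature is available.
func pick(have map[string]bool) variant {
	for _, v := range variants {
		if v.feature == "" || have[v.feature] {
			return v
		}
	}
	panic("no portable variant registered")
}

func main() {
	v := pick(map[string]bool{"ssse3": true})
	fmt.Println("selected:", v.name)
}
```

The selection cost is paid once at start-up, and the list itself is the "record of what people have found optimal" that the generator would maintain.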