The packed integer shift can help a lot, although I doubt that a 5x speedup can be achieved from that alone. I don't know whether SSE5 also has a 128-bit AND instruction (or a similar packed AND), but if it does, combining it with the new packed shift would go a long way.
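To illustrate why a per-byte shift and a wide AND pair up so well for AES: MixColumns needs each state byte multiplied by 2 in GF(2^8), which is a one-bit left shift of every byte followed by a conditional XOR with 0x1b. The sketch below is only an illustration in portable C on a 64-bit word (eight bytes at a time). It does not use any SSE5 instruction, and the function name `xtime8` is made up for this example. Without a per-byte shift, the top bits have to be masked off with an AND so they do not spill into the neighbouring byte.

```c
#include <stdint.h>

/* Hypothetical helper: multiply each of the 8 packed bytes in x by 2
 * in GF(2^8) with the AES reduction polynomial (0x11b). */
uint64_t xtime8(uint64_t x)
{
    /* Top bit of every byte: these bytes need the reduction step. */
    uint64_t hi = x & 0x8080808080808080ULL;

    /* Clear the top bits first so the shift cannot carry into the
     * neighbouring byte. A true packed byte shift would make this
     * AND unnecessary. */
    uint64_t shifted = (x & 0x7f7f7f7f7f7f7f7fULL) << 1;

    /* Turn each flagged byte (0x80) into 0x01, then into 0x1b.
     * 0x1b fits in a byte, so the multiply cannot cross byte lanes. */
    uint64_t reduce = (hi >> 7) * 0x1bULL;

    return shifted ^ reduce;
}
```

With a 128-bit register the same three steps would cover the whole AES state at once, but that alone covers only part of a round, which is why I would not expect 5x from it.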
Ferenc

On 08/26/08 17:01, Darren J Moffat wrote:
> An interesting thread to follow on cryptography@ ...
>
> -------- Original Message --------
> Subject: 5x speedup for AES using SSE5?
> Date: Sat, 23 Aug 2008 14:00:44 +0100
> From: Paul Crowley <paul at ciphergoth.org>
> To: cryptography at metzdowd.com
>
> http://www.ddj.com/hpc-high-performance-computing/201803067
>
> In the above Dr Dobb's article from a little over a year ago, AMD Senior
> Fellow Leendert vanDoorn states "the Advanced Encryption Standard (AES)
> algorithm gets a factor of 5 performance improvement by using the new
> SSE5 extension". However, glancing through the SSE5 specification, I
> can't see at all how such a dramatic speedup might be achieved. Does
> anyone know any more, or can anyone see more than I can in the spec?
>
> http://developer.amd.com/cpu/SSE5/Pages/default.aspx