On 02/06/2011 12:08 AM, Niels Möller wrote:

> The unoptimized GF(2^128) multiply function really is awfully slow. On
> x86_64, gmac takes 830 cycles/byte! We can compare to the sha functions,
> where sha1, sha256 and sha512 take respectively 8, 18 and 12
> cycles/byte, so the current code is two orders of magnitude slower than
> hmac-sha1.
>
> It remains to see how much table space and/or assembly hacking is needed
> to get reasonable performance.
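
For context, an unoptimized GF(2^128) multiply is typically a bit-at-a-time shift-and-xor loop: 128 iterations per 16-byte block, each touching the whole block. The following is a minimal sketch of that approach, assuming the bit ordering and reduction polynomial of GCM as specified in NIST SP 800-38D. The names `block128`, `shift_right_one` and `gf128_mul` are hypothetical and are not taken from Nettle's source.

```c
#include <stdint.h>
#include <string.h>

/* A 128-bit field element, stored as 16 bytes. In GCM's convention the
   most significant bit of b[0] is the coefficient of x^0.
   (Hypothetical type, for illustration only.) */
typedef struct { uint8_t b[16]; } block128;

/* Shift the whole 128-bit block right by one bit, i.e. multiply the
   element by x, before reduction. */
static void shift_right_one(block128 *v)
{
  unsigned carry = 0;
  for (int i = 0; i < 16; i++) {
    unsigned next = v->b[i] & 1;
    v->b[i] = (uint8_t)((v->b[i] >> 1) | (carry << 7));
    carry = next;
  }
}

/* Bit-serial multiply in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1.
   Not constant time and not optimized; it only illustrates why the
   naive approach costs so many cycles per byte. */
static block128 gf128_mul(const block128 *x, const block128 *y)
{
  block128 z;
  block128 v = *y;
  memset(&z, 0, sizeof z);

  for (int i = 0; i < 128; i++) {
    /* If coefficient i of x is set, add (xor) the current multiple of y. */
    if ((x->b[i / 8] >> (7 - i % 8)) & 1)
      for (int j = 0; j < 16; j++)
        z.b[j] ^= v.b[j];

    /* v = v * x, reducing when the x^127 coefficient overflows. */
    unsigned overflow = v.b[15] & 1;
    shift_right_one(&v);
    if (overflow)
      v.b[0] ^= 0xe1;
  }
  return z;
}
```

Table-driven variants process several bits of one operand per step at the cost of precomputed multiples of the other, which is the table-space trade-off mentioned above.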
There is a special instruction for that on new Intel and AMD CPUs...

http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/
http://en.wikipedia.org/wiki/CLMUL_instruction_set

Unfortunately I don't have anything close to those CPUs...

regards,
Nikos
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
