Nikos Mavrogiannopoulos <[email protected]> writes:

> On 02/06/2011 12:08 AM, Niels Möller wrote:
>
>> It remains to see how much table space and/or assembly hacking is needed
>> to get reasonable performance.
>
> There is a special instruction for that on new intel and AMD CPUs...
> http://software.intel.com/en-us/articles/intel-carry-less-multiplication-instruction-and-its-usage-for-computing-the-gcm-mode/
> http://en.wikipedia.org/wiki/CLMUL_instruction_set

Interesting. I haven't played with any such special instructions (even
if it ought to make a bit of difference also for aes).

Anyway, I've been hacking a bit on the C-implementation over the day,
and the galois hashing (gmac) is now 18 times(!) faster. Summary of
changes:

  Original unoptimized code:
  
           Algorithm        mode Mbyte/s cycles/byte cycles/block
                gmac        auth    1.49      829.83     13277.27
  
  Optimized rshift, rewritten to use word-sized operations:
  
           Algorithm        mode Mbyte/s cycles/byte cycles/block
                gmac        auth    4.62      268.14      4290.23
  
  Optimized gf_mul, rewritten to use separate byte and bit loops:
  
           Algorithm        mode Mbyte/s cycles/byte cycles/block
                gmac        auth    5.46      227.18      3634.90
  
  Moved reduction into shift function:
  
           Algorithm        mode Mbyte/s cycles/byte cycles/block
                gmac        auth    6.79      182.69      2923.02
  
  Introduced 4-bit tables:
  
           Algorithm        mode Mbyte/s cycles/byte cycles/block
                gmac        auth   27.14       45.68       730.82

Remaining tricks:

* Try 8-bit tables (which increases storage need a lot, the modest 4-bit
  tables need only 256 additional bytes per key, for gf_mul, and a
  32-byte constant table for gf_shift. Extending to 8 bits makes both
  tables 16 times larger).

* See if it makes sense to write any assembler for the hashing function.

* Various smaller microoptimizations, like a public memxor-variant for
  when areas are known to be word-aligned. Or inline that xor:ing.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Reply via email to