"Yuriy M. Kaminskiy" <[email protected]> writes:

I've had another look, trying to understand how it differs. 

> Does not use pre-rotated tables (as in AES_SMALL), so reduces d-cache
> footprint from 4.25K to 1K (enc)/1.25K (dec);
> completely unrolled, so increases i-cache footprint
> from 948b to 4416b (enc)/4032b (dec)

Not sure unrolling is that beneficial; Nettle's implementation does two
rounds at a time (since, just like in your patch, source and destination
registers alternate when doing a round), and that's so many instructions
that loop overhead should be pretty small.
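
To illustrate the two-rounds-per-iteration structure, here is a minimal
C sketch. Everything in it is hypothetical: toy_round is not AES, it
only has the same shape (reads one set of words, writes another), and
the function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for one cipher round: reads src, writes dst.
   This is NOT AES; it only mimics "separate source and destination". */
static void toy_round(uint32_t dst[4], const uint32_t src[4], uint32_t key)
{
  for (int i = 0; i < 4; i++)
    dst[i] = ((src[i] << 1) | (src[i] >> 31)) ^ src[(i + 1) & 3] ^ key;
}

/* One round per iteration: the result must be copied back each time. */
static void rounds_one_at_a_time(uint32_t state[4], const uint32_t *keys,
                                 int nrounds)
{
  uint32_t tmp[4];
  for (int r = 0; r < nrounds; r++)
    {
      toy_round(tmp, state, keys[r]);
      for (int i = 0; i < 4; i++)
        state[i] = tmp[i];
    }
}

/* Two rounds per iteration: state -> other, then other -> state, so
   the roles alternate and no copy is needed. nrounds must be even. */
static void rounds_two_at_a_time(uint32_t state[4], const uint32_t *keys,
                                 int nrounds)
{
  uint32_t other[4];
  for (int r = 0; r < nrounds; r += 2)
    {
      toy_round(other, state, keys[r]);
      toy_round(state, other, keys[r + 1]);
    }
}

/* Returns 1 if both variants agree on a fixed test input. */
static int rounds_agree(void)
{
  const uint32_t keys[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
  uint32_t a[4] = { 0x01234567, 0x89abcdef, 0xdeadbeef, 0x0badf00d };
  uint32_t b[4] = { 0x01234567, 0x89abcdef, 0xdeadbeef, 0x0badf00d };
  rounds_one_at_a_time(a, keys, 10);
  rounds_two_at_a_time(b, keys, 10);
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

With two rounds in the loop body, the loop counter update and branch
are paid once per pair of rounds, which is why full unrolling may gain
little on top of that.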

> As it completely replaces current implementation, I just attached new
> files (will post final version as a patch).

As you say, it doesn't use prerotated tables, but instead adds a
", ror #x" rotation to the second operand of the relevant eor
instructions.
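
A rough C sketch of the two variants, to make the trade-off concrete.
The table contents are arbitrary filler, not the real AES table, and
the rotation direction and amounts are assumptions chosen for the
example; all names are hypothetical.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Hypothetical tables: 256 words = 1 KiB for the base table, and
   4 x 256 words = 4 KiB for the prerotated set. */
static uint32_t base_table[256];
static uint32_t prerotated[4][256];

static void init_tables(void)
{
  uint32_t v = 0x9e3779b9u;
  for (int i = 0; i < 256; i++)
    {
      /* Arbitrary filler values, NOT the AES table. */
      v = v * 1664525u + 1013904223u;
      base_table[i] = v;
      for (unsigned k = 0; k < 4; k++)
        prerotated[k][i] = rotl32(v, 8 * k);
    }
}

/* Variant 1: one lookup per byte in its own prerotated table. */
static uint32_t column_prerotated(uint8_t b0, uint8_t b1, uint8_t b2,
                                  uint8_t b3, uint32_t key)
{
  return prerotated[0][b0] ^ prerotated[1][b1]
    ^ prerotated[2][b2] ^ prerotated[3][b3] ^ key;
}

/* Variant 2: a single table, with the rotation applied while xoring.
   On ARM the rotation can be folded into the eor as a shifted
   operand; a left rotate by n corresponds to ror #(32 - n). */
static uint32_t column_rotating(uint8_t b0, uint8_t b1, uint8_t b2,
                                uint8_t b3, uint32_t key)
{
  return base_table[b0] ^ rotl32(base_table[b1], 8)
    ^ rotl32(base_table[b2], 16) ^ rotl32(base_table[b3], 24) ^ key;
}
```

Both functions compute the same word; the second trades the three
extra tables for rotations.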

Load and store of the cleartext and ciphertext bytes is different (and I
have some difficulty following it).

Masking to get table indices is the same as in nettle's
arm/aes-encrypt-internal.asm, while nettle's v6 code uses the uxtb
instruction, which saves one register (which the code doesn't take much
advantage of, though).
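
For comparison, the two ways of getting a table offset from byte k of a
word can be sketched in C as below. The mask constant 0x3fc and the
rotation counts are my assumptions about how such code is typically
written, and the function names are hypothetical.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Mask style: rotate byte k into bits 2..9 and mask, which gives the
   byte offset of a 32-bit table entry directly. The mask constant
   occupies a register for the whole loop. */
static uint32_t offset_masked(uint32_t w, unsigned k)
{
  const uint32_t mask = 0x3fc;   /* assumed constant: 0xff << 2 */
  return ror32(w, (8 * k + 30) & 31) & mask;
}

/* uxtb style: extract byte k (like uxtb with an optional ror #8k),
   then scale by 4, as a scaled-index load would. No mask register
   is needed. */
static uint32_t offset_uxtb(uint32_t w, unsigned k)
{
  uint32_t index = (uint8_t) ror32(w, 8 * k);
  return index << 2;
}
```

The two functions return the same offset for every word and k in 0..3;
the difference is only in which registers stay occupied.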

The code in your patch has more careful instruction scheduling, e.g.,
interleaving addition of roundkeys with the sbox table lookups. Nettle's
code is written with only a single temporary register used for
everything, which makes it impossible to interleave independent parts of
the mangling, while your patch seems to alternate between three
different temporaries.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
