Nikos Mavrogiannopoulos <n...@redhat.com> writes:

> On Mon, 2018-02-19 at 15:27 +0100, Niels Möller wrote:
>> I think I'd prefer allocating a uint64_t array (largest type used in
>> nettle context structs), and leave to the compiler to figure out what
>> alignment is needed and how to get it.
>
> That way you get 8-byte alignment which is ok, but if you use it for
> aesni key state for example, it results to slower operations.

My concern now is not performance, but correctness on platforms that
require proper data alignment.

But to nevertheless comment about performance, for the benefit of x86_64
simd instructions, I've been thinking about specifying 16-byte alignment
for things like union nettle_block16 and aes subkeys, but I'm not aware
of any nice and portable way to do that. And if we do it only for some
of the possible compilers on a platform, we get an inconsistent ABI.

And I doubt it makes much difference for performance, the bottleneck for
AES isn't the reading of subkeys (at least not since recent aesni
changes which loads the subkeys into registers outside of the block
loop). memxor is the only function in Nettle which is likely to be
limited by memory accesses, and it uses its own logic to do aligned
accesses.

Speaking of memxor, I wonder if one could take advantage of alignment
info propagated by recent versions of gcc, and do something like

#define COMPILE_TIME_ZERO(x) (__builtin_constant_p((x) == 0) && (x) == 0))

inline void memxor (void *src, void *dst, size_t n) 
{
   if (COMPILE_TIME_ZERO((intptr_t) src & 7)
       && COMPILE_TIME_ZERO((intptr_t) dst & 7)
       && COMPILE_TIME_ZERO(n & 7))
     memxor_words(...);
   else 
     ...
}

to skip some runtime alignment checks when alignment can be derived at
compile time.

> Said that, variable arrays and alloca() are ok when the input doesn't
> come externally but I'm not sure if we can enforce that in nettle. What
> about moving to malloc() unconditionally?

The intention is to use TMP_ALLOC only for allocations with small
bounds, like digest or block sizes, other small buffers (e.g., 512 bytes
used in ctr.c and cbc.c), and in the pss code, hash context sizes.

You fixed a few potentially larger allocations to use TMP_GMP_ALLOC some
time ago, iirc.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs

Reply via email to