I think I will leave the curve25519 and eddsa code for now, even though
there are several important optimizations left to do (see the just
updated http://www.lysator.liu.se/~nisse/nettle/plan.html).

I think it's about time to do fat binaries. To make progress, I think
it's best to start with something simple, relying on
__attribute__((constructor)) and/or __attribute__((ifunc(...))).
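
To make the constructor approach concrete, here is a minimal sketch, not
actual nettle code. All names (sketch_memxor, sketch_memxor_impl,
sketch_fat_init, ...) are made up for illustration, and the "sse2"
variant is plain C standing in for an assembly routine. It assumes gcc
or a compatible compiler; elsewhere it simply stays on the generic
variant. With ifunc, the function pointer and wrapper would be replaced
by a resolver function that returns the selected implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef void *memxor_func(void *dst, const void *src, size_t n);

/* Portable variant, always available. */
static void *
sketch_memxor_generic(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i;
  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

/* Stand-in for an sse2 assembly routine (hypothetical; plain C here). */
static void *
sketch_memxor_sse2(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i;
  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

/* Statically initialized, so calls made before the constructor has
   run still work. */
static memxor_func *sketch_memxor_impl = sketch_memxor_generic;

static int
sketch_have_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  /* Needed when the cpu builtins are used from a constructor. */
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse2") != 0;
#else
  return 0;
#endif
}

/* Runs at load time (with gcc-compatible compilers) and selects the
   variant once. */
#if defined(__GNUC__)
__attribute__((constructor))
#endif
static void
sketch_fat_init(void)
{
  if (sketch_have_sse2())
    sketch_memxor_impl = sketch_memxor_sse2;
}

/* Public entry point: one indirect call per invocation. */
void *
sketch_memxor(void *dst, const void *src, size_t n)
{
  assert(sketch_memxor_impl != NULL);
  return sketch_memxor_impl(dst, src, n);
}
```

The cost of this scheme is one indirect call per invocation; ifunc moves
the selection into the dynamic linker instead, but is less portable.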

For the case of memxor (where on x86_64, the fat binary mechanism needs
to select between sse2 and non-sse2 code), I'm also considering some
reorganization:

 * Use smaller assembly routines doing one case each, and let the main
   entry point always be C code which can sort out the different cases
   and handle bytes at the beginning and end of the buffer.

 * Fix the cases where the current code reads a few bytes
   outside of input buffers (but luckily without crossing word
   boundaries, iirc).

 * Add some internal entry points, for cases where alignment is known by
   the caller.

I think some additional overhead is acceptable for the cases of small
badly aligned buffers, if one can gain cleaner or more efficient
handling of the other cases.
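
A rough sketch of what the reorganized memxor could look like, assuming
hypothetical names throughout (sketch_memxor_entry, sketch_memxor_words,
sketch_memxor_bytes). The word routine is C standing in for a small
assembly routine that handles exactly one case, both pointers
word-aligned. The C entry point deals with leading and trailing bytes
and never touches memory outside the buffers. When source and
destination are misaligned relative to each other, this sketch just
falls back to the byte loop; a real implementation would more likely
use a shifting word loop there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned long word_t;

#define WORD_ALIGNED(p) (((uintptr_t) (p)) % sizeof(word_t) == 0)

/* Internal entry point for callers that know both pointers are
   word-aligned; the count is in words. memcpy is used for the word
   loads and stores to stay within C aliasing rules. */
static void
sketch_memxor_words(unsigned char *dst, const unsigned char *src,
                    size_t nwords)
{
  size_t i;
  for (i = 0; i < nwords; i++)
    {
      word_t d, s;
      memcpy(&d, dst + i * sizeof(word_t), sizeof(word_t));
      memcpy(&s, src + i * sizeof(word_t), sizeof(word_t));
      d ^= s;
      memcpy(dst + i * sizeof(word_t), &d, sizeof(word_t));
    }
}

/* Byte loop for the edges and for badly aligned input. */
static void
sketch_memxor_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Main entry point, always C: sorts out the cases. */
void *
sketch_memxor_entry(void *dst_in, const void *src_in, size_t n)
{
  unsigned char *dst = dst_in;
  const unsigned char *src = src_in;
  size_t head, nwords;

  /* Leading bytes, until dst is word-aligned (or the data runs out). */
  head = (sizeof(word_t) - (uintptr_t) dst % sizeof(word_t))
    % sizeof(word_t);
  if (head > n)
    head = n;
  sketch_memxor_bytes(dst, src, head);
  dst += head; src += head; n -= head;

  /* Bulk part, only when src ended up aligned as well. */
  if (WORD_ALIGNED(src))
    {
      nwords = n / sizeof(word_t);
      sketch_memxor_words(dst, src, nwords);
      dst += nwords * sizeof(word_t);
      src += nwords * sizeof(word_t);
      n -= nwords * sizeof(word_t);
    }

  /* Trailing bytes, or everything left if src is misaligned. */
  sketch_memxor_bytes(dst, src, n);
  return dst_in;
}
```

The extra branching costs a little for short, badly aligned buffers,
while the aligned bulk case reduces to a single call into the word
routine.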

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
