Maamoun TK <[email protected]> writes:

> Great work. The implementation looks fine, I like the idea of using -16
> instead of 16 for rotating because vspltisw is limited to (-16 to 15)
> and vrlw picks the low-order 5 bits which is the same for both -16 and
> 16.

I picked up that trick from Torbjörn Granlund's code.
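
For readers following along, here is a minimal C sketch of why the trick
works. It is only a scalar model of one 32-bit lane, not the actual
assembly; the names rotl32_lane, fits_vspltisw and check_rotate_trick are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of vrlw: only the low-order 5 bits
   of the count are used, so the count is reduced modulo 32. */
uint32_t
rotl32_lane (uint32_t x, int32_t count)
{
  unsigned n = (uint32_t) count & 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* The vspltisw immediate is a 5-bit signed value, so it must lie in
   the range -16..15. */
int
fits_vspltisw (int32_t imm)
{
  return imm >= -16 && imm <= 15;
}

/* Hypothetical self-check: 16 does not fit as an immediate, -16 does,
   and both give the same rotation. */
void
check_rotate_trick (void)
{
  uint32_t x = 0x12345678u;
  assert (!fits_vspltisw (16));
  assert (fits_vspltisw (-16));
  assert (rotl32_lane (x, -16) == rotl32_lane (x, 16));
}
```

Since (uint32_t) -16 is 0xfffffff0, masking with 31 leaves 16, which is
the same count a literal 16 would have given.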

> BTW this implementation should work as is in big-endian mode without any
> hassle, because lxvw4x/stxvw4x are endianness-aware when loading/storing
> word values.

I've pushed it to a branch ppc-chacha-core. But it fails on big-endian
powerpc64, see https://gitlab.com/gnutls/nettle/-/jobs/758348866.

And it looks like the error message from the first failing chacha test
is truncated, which makes me suspect an error in the function prologue or
in register usage, leaving some invalid state when the function returns.

Compared to your assembly code, I don't set FUNC_ALIGN. Is that a
problem?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.