Maamoun TK <maamoun...@googlemail.com> writes:

> Yes, this is exactly how I do it. Four messages arranged vertically in YMM
> registers.

Could you add comments explaining the register layout in a bit more
detail? From this, I take it you use 5 message registers, each one
holding 26 bits from each of 4 messages (in which order?), with half of
each ymm register unused, due to the way vpmuludq works?
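To pin the question down, here is a minimal C model of the layout as described. It is a sketch, not the actual assembly. The names ymm_model, block_to_limbs and load_4_blocks are hypothetical, and putting block j in 64-bit lane j is an assumption, since that order is exactly what is being asked about. The model relies on one known fact: vpmuludq multiplies only the low 32 bits of each 64-bit lane.

```c
#include <assert.h>
#include <stdint.h>

#define LIMB_MASK 0x3ffffffu /* 2^26 - 1 */

/* Hypothetical model of one YMM register: four 64-bit lanes. vpmuludq uses
 * only the low 32 bits of each lane, so a 26-bit limb lives in the low half
 * and the high 32 bits of every lane stay unused. */
typedef struct {
  uint64_t lane[4];
} ymm_model;

/* Load 8 bytes as a little-endian 64-bit value, independent of host order. */
static uint64_t load_le64(const uint8_t *p)
{
  uint64_t x = 0;
  for (int i = 7; i >= 0; i--)
    x = (x << 8) | p[i];
  return x;
}

/* Split one 16-byte Poly1305 block, plus the 2^128 pad bit, into five
 * 26-bit limbs: limb[i] holds bits 26*i .. 26*i + 25. */
static void block_to_limbs(const uint8_t *block, uint32_t limb[5])
{
  uint64_t lo = load_le64(block);
  uint64_t hi = load_le64(block + 8);

  limb[0] = (uint32_t)(lo & LIMB_MASK);
  limb[1] = (uint32_t)((lo >> 26) & LIMB_MASK);
  limb[2] = (uint32_t)(((lo >> 52) | (hi << 12)) & LIMB_MASK);
  limb[3] = (uint32_t)((hi >> 14) & LIMB_MASK);
  limb[4] = (uint32_t)(hi >> 40) | (1u << 24); /* bit 128 is the pad bit */
}

/* "Vertical" layout: register m[i] holds limb i of all four blocks,
 * with block j in lane j (assumed order). */
static void load_4_blocks(const uint8_t *data, ymm_model m[5])
{
  for (int j = 0; j < 4; j++) {
    uint32_t limb[5];
    block_to_limbs(data + 16 * j, limb);
    for (int i = 0; i < 5; i++) {
      assert(limb[i] <= LIMB_MASK);
      m[i].lane[j] = limb[i]; /* upper 32 bits of the lane left zero */
    }
  }
}
```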

> Right, I exploited every possible way to keep the overhead inside the inner
> loop as minimal as possible.

Similarly, comments on the register layout for the key powers would also
help. I imagine each YMM register holds 26 bits from one of 4 powers,
but then you need more than 5 registers if you need pieces both with and
without the premultiply by 5? And also the layout of the registers used
for accumulation.
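Why the premultiplied pieces come in can be shown with a scalar, per-lane sketch. It assumes the usual radix-2^26 schoolbook multiply with s[i] = 5*r[i], as 32-bit Poly1305 code commonly does. Whether the AVX2 code does exactly this is not confirmed, and key_power, make_key_power, mul32 and mul_lane are hypothetical names. With one power per lane, the nine values r0..r4 and s1..s4 would take nine YMM registers. The d[] accumulators stand in for 64-bit lanes summed with vpaddq.

```c
#include <assert.h>
#include <stdint.h>

#define LIMB_MASK 0x3ffffffu /* 2^26 - 1 */

/* Hypothetical per-lane model of one key power (r, r^2, r^3 or r^4).
 * Since 2^130 = 5 (mod 2^130 - 5), a product term at limb position >= 5
 * wraps to position - 5 with an extra factor 5; s[i] = 5 * r[i] keeps that
 * multiply out of the inner loop, at the cost of 9 limbs instead of 5. */
typedef struct {
  uint32_t r[5]; /* 26-bit limbs of the power */
  uint32_t s[5]; /* s[i] = 5 * r[i] for i = 1..4; s[0] unused */
} key_power;

static key_power make_key_power(const uint32_t r[5])
{
  key_power k;
  for (int i = 0; i < 5; i++) {
    assert(r[i] <= LIMB_MASK);
    k.r[i] = r[i];
    k.s[i] = i ? 5 * r[i] : 0; /* < 2^29, still a valid 32-bit input */
  }
  return k;
}

/* 32x32 -> 64 bit product, what vpmuludq computes in each lane. */
static uint64_t mul32(uint32_t a, uint32_t b) { return (uint64_t)a * b; }

/* out = h * r mod 2^130 - 5, partially reduced to ~26-bit limbs. */
static void mul_lane(const uint32_t h[5], const key_power *k, uint32_t out[5])
{
  const uint32_t *r = k->r, *s = k->s;
  uint64_t d[5], c;

  d[0] = mul32(h[0], r[0]) + mul32(h[1], s[4]) + mul32(h[2], s[3])
       + mul32(h[3], s[2]) + mul32(h[4], s[1]);
  d[1] = mul32(h[0], r[1]) + mul32(h[1], r[0]) + mul32(h[2], s[4])
       + mul32(h[3], s[3]) + mul32(h[4], s[2]);
  d[2] = mul32(h[0], r[2]) + mul32(h[1], r[1]) + mul32(h[2], r[0])
       + mul32(h[3], s[4]) + mul32(h[4], s[3]);
  d[3] = mul32(h[0], r[3]) + mul32(h[1], r[2]) + mul32(h[2], r[1])
       + mul32(h[3], r[0]) + mul32(h[4], s[4]);
  d[4] = mul32(h[0], r[4]) + mul32(h[1], r[3]) + mul32(h[2], r[2])
       + mul32(h[3], r[1]) + mul32(h[4], r[0]);

  /* Carry chain; the carry out of d[4] wraps into d[0] times 5. */
  c = d[0] >> 26; d[0] &= LIMB_MASK; d[1] += c;
  c = d[1] >> 26; d[1] &= LIMB_MASK; d[2] += c;
  c = d[2] >> 26; d[2] &= LIMB_MASK; d[3] += c;
  c = d[3] >> 26; d[3] &= LIMB_MASK; d[4] += c;
  c = d[4] >> 26; d[4] &= LIMB_MASK; d[0] += 5 * c;
  c = d[0] >> 26; d[0] &= LIMB_MASK; d[1] += c;

  for (int i = 0; i < 5; i++)
    out[i] = (uint32_t)d[i];
}
```

In this model, multiplying h = 2^26 by r = 2^104 leaves 5 in limb 0 and zeros elsewhere. That is the 2^130 = 5 wrap-around the s[] values exist for.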

Sorry I'm a bit slow reviewing.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se