Maamoun TK <maamoun...@googlemail.com> writes:

> Yes, this is exactly how I do it. Four messages arranged vertically in YMM
> registers.
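For readers following the thread, here is a minimal sketch of one plausible "vertical" layout, written in portable C rather than AVX2 assembly. It assumes Poly1305's radix-2^26 split of each 16-byte block (plus the 2^128 pad bit) into five 26-bit limbs. It also assumes that lane k of register j holds limb j of message k. The actual lane order in the patch is exactly what the review asks about, so treat that order as hypothetical. The same goes for the names `ymm_t`, `block_to_limbs` and `load_four_blocks`.

```c
#include <stdint.h>

/* Hypothetical model of one YMM register: four 64-bit lanes. */
typedef struct { uint64_t lane[4]; } ymm_t;

/* Read 4 bytes as a little-endian 32-bit word. */
static uint32_t load_le32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Split one 16-byte block, plus the 2^128 pad bit, into five 26-bit limbs
   (limb j holds bits 26*j .. 26*j+25). */
static void block_to_limbs(const uint8_t *block, uint32_t limb[5]) {
  uint32_t t0 = load_le32(block), t1 = load_le32(block + 4);
  uint32_t t2 = load_le32(block + 8), t3 = load_le32(block + 12);
  limb[0] = t0 & 0x3ffffff;
  limb[1] = ((t0 >> 26) | (t1 << 6)) & 0x3ffffff;
  limb[2] = ((t1 >> 20) | (t2 << 12)) & 0x3ffffff;
  limb[3] = ((t2 >> 14) | (t3 << 18)) & 0x3ffffff;
  limb[4] = (t3 >> 8) | (1u << 24);   /* bit 128 is bit 24 of limb 4 */
}

/* Assumed vertical layout: m[j].lane[k] = limb j of message k, where
   message k is the 16 bytes at data + 16*k. The upper 32 bits of each
   lane stay zero. */
static void load_four_blocks(const uint8_t *data, ymm_t m[5]) {
  for (int k = 0; k < 4; k++) {
    uint32_t limb[5];
    block_to_limbs(data + 16 * k, limb);
    for (int j = 0; j < 5; j++)
      m[j].lane[k] = limb[j];
  }
}
```

Every limb fits in 26 bits, so the upper half of each 64-bit lane is always zero. Under this assumption, that is the unused half of the registers: vpmuludq reads only the low 32 bits of each lane.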
Could you add comments explaining the register layout in a bit more
detail? From this, I take it you use 5 message registers, each one
holding 26 bits from each of 4 messages (in which order?), with half of
the bits in each YMM register unused, due to the way vpmuludq works?

> Right, I exploited every possible way to keep the overhead inside the inner
> loop as minimal as possible.

Similarly, comments on the register layout for the key powers would
also help. I imagine each YMM register holds 26 bits from one of 4
powers, but then don't you need more than 5 registers if you need the
pieces both with and without the premultiply by 5?

And also the layout of the registers used for accumulation.

Sorry I'm a bit slow at reviewing.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
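This sketch shows why the premultiply by 5 comes up. With p = 2^130 - 5, a product term h_i * r_j with i + j >= 5 has weight 2^130 * 2^(26(i+j-5)), which is congruent to 5 * 2^(26(i+j-5)). Such a term therefore folds into limb i+j-5, multiplied by 5. The portable C below emulates vpmuludq lane by lane. It keeps s_j = 5 * r_j for j = 1..4 alongside r_j, so this sketch uses nine key registers per set of four powers rather than five. The following are assumptions for illustration, not the layout in the patch: one key power per lane, and the names `mul_epu32`, `premultiply`, `mul_mod_p` and `carry`.

```c
#include <stdint.h>

/* Hypothetical model of one YMM register: four 64-bit lanes. */
typedef struct { uint64_t lane[4]; } ymm_t;

#define MASK26 0x3ffffffULL

/* Emulates vpmuludq: per 64-bit lane, multiply the low 32 bits of both
   operands and keep the full 64-bit product. */
static ymm_t mul_epu32(ymm_t a, ymm_t b) {
  ymm_t r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (uint64_t)(uint32_t)a.lane[i] * (uint32_t)b.lane[i];
  return r;
}

/* Lane-wise 64-bit add, as vpaddq does. */
static ymm_t add64(ymm_t a, ymm_t b) {
  ymm_t r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

/* Assumed key layout: r[j] holds limb j of four key powers, one per lane.
   s[j] = 5 * r[j] for j = 1..4; s[0] is never used, so it is zeroed. */
static void premultiply(const ymm_t r[5], ymm_t s[5]) {
  s[0] = (ymm_t){{0, 0, 0, 0}};
  for (int j = 1; j < 5; j++)
    for (int i = 0; i < 4; i++)
      s[j].lane[i] = 5 * r[j].lane[i];
}

/* d = h * r with terms of weight >= 2^130 folded in via s (times 5).
   Limbs are not carried yet; each d[k] stays below 2^64. */
static void mul_mod_p(const ymm_t h[5], const ymm_t r[5],
                      const ymm_t s[5], ymm_t d[5]) {
  for (int k = 0; k < 5; k++) {
    ymm_t acc = {{0, 0, 0, 0}};
    for (int i = 0; i < 5; i++) {
      int j = k - i;
      /* j < 0: the partner limb is r[j + 5] and i + (j + 5) >= 5 wraps */
      acc = add64(acc, j >= 0 ? mul_epu32(h[i], r[j])
                              : mul_epu32(h[i], s[j + 5]));
    }
    d[k] = acc;
  }
}

/* Carry each lane back to 26-bit limbs; the carry out of limb 4 wraps
   into limb 0 times 5. Limb 1 may end slightly above 2^26. */
static void carry(ymm_t d[5]) {
  for (int i = 0; i < 4; i++) {
    uint64_t c;
    for (int j = 0; j < 4; j++) {
      c = d[j].lane[i] >> 26;
      d[j].lane[i] &= MASK26;
      d[j + 1].lane[i] += c;
    }
    c = d[4].lane[i] >> 26;
    d[4].lane[i] &= MASK26;
    d[0].lane[i] += 5 * c;
    c = d[0].lane[i] >> 26;
    d[0].lane[i] &= MASK26;
    d[1].lane[i] += c;
  }
}
```

Keeping the s_j copies costs extra registers, but it removes a multiply by 5 from the inner loop. Whether the patch keeps both sets resident, or how it lays them out, is the question raised in the review.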