Niels Möller <[email protected]> writes:

> I've got the code to work, and I've written an x86_64 assembly
> implementation using sse2 instructions. Code on the
> ghash-sidechannel-silent branch. On my laptop, I seem to get these
> numbers:
>
> Old C implementation: 350 MB/s
> Old asm implementation: 388 MB/s
>
> New C implementation: 116 MB/s
> New asm implementation: 196 MB/s
>
> pclmul implementation: 4047 MB/s

[...]

> I can see some possible improvements; one could use the sign bit
> instead, replacing the first three instructions by two: movaps X, M0;
> psrlq $63, M0. Or one could do 4 bits (e.g., sign bits 127, 95, 63, 31)
> instead of just 2, with only two more pshufd to create the additional
> masks. Together, I think that would be a loop of 17 instructions for
> doing 4 bits.

After these improvements, the new asm code runs at 246 MB/s, 228 cycles
for each loop iteration, which processes 4 message bits.
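
For readers unfamiliar with the masking idea discussed in the quoted
text, here is a minimal portable C sketch. It is not the Nettle code and
not the SSE2 loop; it assumes the textbook bit-at-a-time GCM multiply
(NIST SP 800-38D, Algorithm 1) and only illustrates how a message bit
can be expanded into an all-ones or all-zeros mask, so that no branch or
table index depends on secret data. The names gf128 and
gf128_mul_masked are hypothetical.

```c
#include <stdint.h>

/* A 128-bit field element in GCM bit order: the most significant bit of
   hi is the coefficient of x^0, the least significant bit of lo is the
   coefficient of x^127. */
typedef struct { uint64_t hi, lo; } gf128;

/* Hypothetical helper, not a Nettle function: returns x * y in
   GF(2^128) modulo x^128 + x^7 + x^2 + x + 1. */
static gf128 gf128_mul_masked(gf128 x, gf128 y)
{
  gf128 z = { 0, 0 };
  gf128 v = y;
  int i;

  for (i = 0; i < 128; i++) {
    /* Word selection depends on the loop counter only, not on data. */
    uint64_t word = (i < 64) ? x.hi : x.lo;
    uint64_t bit = (word >> (63 - (i & 63))) & 1;

    /* Expand the bit into a mask: 0 -> 0, 1 -> all ones. */
    uint64_t m = (uint64_t) 0 - bit;

    /* Conditionally add v into z without branching on the bit. */
    z.hi ^= v.hi & m;
    z.lo ^= v.lo & m;

    /* Multiply v by x: shift right by one bit and, if the x^127
       coefficient was set, fold in the reduction constant 0xe1 << 120,
       again selected by a mask. */
    uint64_t r = (uint64_t) 0 - (v.lo & 1);
    v.lo = (v.lo >> 1) | (v.hi << 63);
    v.hi = (v.hi >> 1) ^ (r & 0xe100000000000000ULL);
  }
  return z;
}
```

This handles one bit per iteration, whereas the asm loop described above
handles 4. A C compiler is also free to turn the masking back into
branches, so a sketch like this shows the idea but does not by itself
guarantee constant-time machine code.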

I've pushed the changes to the master-updates branch. I haven't measured
the slowdown on other machines, but I think it makes sense to fix this
side-channel leakage, even though it makes ghash significantly slower.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.