Niels Möller <[email protected]> writes:

> I've got the code to work, and I've written an x86_64 assembly
> implementation using SSE2 instructions. Code on the
> ghash-sidechannel-silent branch. On my laptop, I seem to get these
> numbers:
>
>   Old C implementation:    350 MB/s
>   Old asm implementation:  388 MB/s
>
>   New C implementation:    116 MB/s
>   New asm implementation:  196 MB/s
>
>   pclmul implementation:  4047 MB/s
[...]

> I can see some possible improvements; one could use the sign bit
> instead, replacing the first three instructions by two: movaps X, M0;
> psrlq $63, M0. Or one could do 4 bits (e.g., sign bits 127, 95, 63, 31)
> instead of just 2, with only two more pshufd to create the additional
> masks. Together, I think that would be a loop of 17 instructions for
> doing 4 bits.

After these improvements, the new asm code runs at 246 MB/s, 228 cycles
for each loop iteration, which processes 4 message bits. I've pushed the
changes to the master-updates branch.

I haven't measured the slowdown on other machines, but I think it makes
sense to fix this side-channel leakage, even though it makes ghash
significantly slower.

Regards,
/Niels

--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
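
For readers who don't want to dig through the assembly, here is a minimal
portable C sketch of the general mask idea: each message bit is expanded
to an all-ones or all-zeros mask, which then selects whether a value is
XORed into the accumulator. This is not the code on the branch. The names
block128 and gf128_mul_masked are made up for this illustration, and it
handles one bit per iteration with plain 64-bit words rather than several
bits at a time in SSE2 registers. It assumes the usual GCM bit order,
where the first bit of a block is the most significant bit of its first
byte.

```c
#include <stdint.h>

/* Hypothetical type for this sketch: a 128-bit GHASH block. hi holds
   bytes 0..7 and lo holds bytes 8..15, both read big-endian, so bit 63
   of hi is the first bit of the block. */
typedef struct { uint64_t hi, lo; } block128;

/* Reduction constant R = 11100001 || 0^120 from the GCM specification. */
#define GHASH_R_HI UINT64_C(0xE100000000000000)

/* Hypothetical helper: multiply x by y in GF(2^128), one bit of x per
   iteration. Every data-dependent choice is an AND with an all-ones or
   all-zeros mask, so the source has no branches or table lookups that
   depend on the operands. */
static block128
gf128_mul_masked (block128 x, block128 y)
{
  block128 z = { 0, 0 };
  block128 v = y;
  int i;

  for (i = 0; i < 128; i++)
    {
      /* The loop counter is public, so selecting a word by it is fine. */
      uint64_t word = (i < 64) ? x.hi : x.lo;
      uint64_t bit = (word >> (63 - (i & 63))) & 1;
      uint64_t mask = (uint64_t) 0 - bit;          /* 0 or all ones */
      uint64_t carry = (uint64_t) 0 - (v.lo & 1);  /* 0 or all ones */

      /* Conditionally accumulate v, without branching on the bit. */
      z.hi ^= v.hi & mask;
      z.lo ^= v.lo & mask;

      /* Multiply v by the field element x: shift right by one bit and
         fold the bit that was shifted out back in via R. */
      v.lo = (v.lo >> 1) | (v.hi << 63);
      v.hi = (v.hi >> 1) ^ (carry & GHASH_R_HI);
    }
  return z;
}
```

Multiplying by the block { 0x8000000000000000, 0 }, which is the field's
identity element in this bit order, should return the other operand
unchanged. Whether a compiler keeps the masking branch-free in the
generated code is a separate question, and is one reason to write this
kind of loop in assembly.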
