On 22-10-2014 23:22, Trevor Perrin wrote:
> Robert Granger and Michael Scott report a fast E-521 implementation:
>
> http://eprint.iacr.org/2014/852
>
> Based on Haswell numbers, its efficiency seems similar to Goldilocks:
>
> https://docs.google.com/a/trevp.net/spreadsheet/ccc?key=0Aiexaz_YjIpddFJuWlNZaDBvVTRFSjVYZDdjakxoRkE&usp=sharing#gid=0
>
>
> DJB also timed it on Sandy Bridge, though his numbers are worse than
> I'd expect; not sure why:
>
> http://www.ietf.org/mail-archive/web/cfrg/current/msg05349.html

I have now had the chance to try to reproduce these timings on both
microarchitectures. The paper states that the code is rather fragile with
respect to compilers and their switches; I can certainly corroborate that.
I got the best results using either
'g++-4.7 -O3 -fwrapv -fomit-frame-pointer -march=native' or
'g++-4.8 -O2 -fwrapv -fomit-frame-pointer -march=native'. Both clang and icc
were significantly slower regardless of which compiler optimizations were
enabled.

The Haswell cycle counts mentioned in the paper do not take Turbo Boost into
account and are therefore lower than the real numbers. Taking into account
that a Core i7 4770 was used (3.4 GHz nominal, up to 3.9 GHz with Turbo
Boost), the Haswell cycle count should be ~893000. I have been able to get
this down slightly, to ~884000.
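
As a rough sketch of the Turbo Boost correction, assuming the timer ticks at
the nominal 3.4 GHz while the core runs at the full 3.9 GHz for the whole
measurement. The function names and the simple linear scaling are my own
assumptions for illustration, not something taken from the paper:

```python
# Sketch only: assumes the core stays at the maximum turbo clock for the
# entire measurement, which real hardware does not guarantee.
NOMINAL_GHZ = 3.4  # Core i7 4770 nominal clock
TURBO_GHZ = 3.9    # Core i7 4770 maximum Turbo Boost clock


def to_core_cycles(timer_ticks, nominal_ghz=NOMINAL_GHZ, turbo_ghz=TURBO_GHZ):
    """Scale ticks counted at the nominal rate to actual core cycles."""
    return timer_ticks * turbo_ghz / nominal_ghz


def to_timer_ticks(core_cycles, nominal_ghz=NOMINAL_GHZ, turbo_ghz=TURBO_GHZ):
    """Inverse of to_core_cycles: core cycles back to nominal-rate ticks."""
    return core_cycles * nominal_ghz / turbo_ghz


if __name__ == "__main__":
    # Under this assumption, counts taken at the nominal rate understate
    # the real cycle count by the factor turbo / nominal.
    print("scale factor:", TURBO_GHZ / NOMINAL_GHZ)
    # Nominal-rate ticks that would correspond to ~893000 core cycles.
    print("nominal-rate ticks:", round(to_timer_ticks(893000)))
```

If the core spends only part of the run at 3.9 GHz, the real factor lies
somewhere between 1 and 3.9/3.4, so this gives an upper bound on the
correction.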

On Sandy Bridge, I get somewhat better timings than those reported by DJB:
~1030000 cycles. According to your spreadsheet, this makes the score of
E-521 better on Sandy Bridge than on Haswell (2.29 vs 2.07).



_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves
