Re: [curves] Sandy2x

Tung Chou Wed, 30 Sep 2015 06:18:24 -0700

Hi Trevor,

Sandy2x takes 156076 Haswell cycles for X25519 shared-secret
computation, which is very close to its Ivy Bridge cycle count. Note,
however, that the non-vectorized implementation from the Ed25519
paper performs much better on Haswell than on Ivy Bridge:
161648 cycles versus 182708 cycles.

Armando Faz-Hernández and Julio López have a Latincrypt paper
this year about an X25519 implementation targeting Haswell.
They claim 1565xx Haswell cycles for shared-secret computation.
They use a 4-way vectorized multiplier to perform 2 field
multiplications/squarings at the same time. I think a better
approach would be to find 4 independent multiplications/squarings
in the formula and vectorize across them, but I haven't tried.
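
To make the "vectorize across 4 independent multiplications" idea
concrete, here is a minimal Python model, not code from Sandy2x or from
the Latincrypt paper. It assumes a uniform radix-2^26, 10-limb
representation (chosen only for simplicity), and the names vpmuludq4,
vadd4, and mul4 are hypothetical. Limb i of four different field
elements is packed into one 4-lane vector, so each modelled 32x32->64
lane multiply advances four field multiplications at once. Carry
handling and reduction are not vectorized here; the final reduction
uses Python integers purely to check the result.

```python
P = 2**255 - 19
LIMBS, RADIX = 10, 26        # assumption: uniform radix 2^26, 10 limbs
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1

def to_limbs(x):
    """Split an integer below 2^260 into LIMBS limbs of RADIX bits."""
    return [(x >> (RADIX * i)) & (2**RADIX - 1) for i in range(LIMBS)]

def from_limbs(t):
    """Recombine limbs (possibly wider than RADIX bits) into an integer."""
    return sum(v << (RADIX * i) for i, v in enumerate(t))

def vpmuludq4(a, b):
    """Model of a 4-lane 32x32->64 multiply: low 32 bits of each lane."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]

def vadd4(a, b):
    """Model of a 4-lane 64-bit add (wraps modulo 2^64)."""
    return [(x + y) & MASK64 for x, y in zip(a, b)]

def mul4(xs, ys):
    """Multiply four pairs of field elements, one pair per lane."""
    xl = [to_limbs(x) for x in xs]
    yl = [to_limbs(y) for y in ys]
    # Transpose: vector i holds limb i of all four operands.
    xv = [[xl[lane][i] for lane in range(4)] for i in range(LIMBS)]
    yv = [[yl[lane][i] for lane in range(4)] for i in range(LIMBS)]
    # Schoolbook product; each sum of at most 10 products stays below 2^56.
    t = [[0] * 4 for _ in range(2 * LIMBS - 1)]
    for i in range(LIMBS):
        for j in range(LIMBS):
            t[i + j] = vadd4(t[i + j], vpmuludq4(xv[i], yv[j]))
    # Un-transpose and reduce with big integers (checking only).
    return [from_limbs([t[k][lane] for k in range(2 * LIMBS - 1)]) % P
            for lane in range(4)]
```

Calling mul4([a, b, c, d], [e, f, g, h]) returns the four products
a*e, b*f, c*g, d*h modulo 2^255 - 19. The lanes never interact, which
is why the approach needs 4 multiplications with no data dependencies
between them.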

Best regards,
Tung Chou

On Tue, Sep 29, 2015 at 10:29 PM, Trevor Perrin <[email protected]> wrote:

> Tung Chou's "Sandy2x" code for 25519 on Sandy Bridge and Ivy Bridge is
> around 10-20% faster than other implementations:
>
> https://eprint.iacr.org/2015/943
>
> Speedup is attributed to using the 2-way 32x32->64 vectorized
> multiplier (vpmuludq) instead of the 64x64->128 serialized multiplier.
>
> The paper doesn't say whether this strategy also pays off on Haswell
> (which seems to be lagging in 25519 performance?):
>
>
> https://docs.google.com/spreadsheets/d/1SO3NGX-EgIZ1slw9uExb5FoeFy5TVkuA2lEutP6roYI/edit#gid=0
>
>
> Trevor
> _______________________________________________
> Curves mailing list
> [email protected]
> https://moderncrypto.org/mailman/listinfo/curves
>

