Re: [curves] Balancing reduced-radix and full-radix performance for extra-strength primes

Mike Hamburg Mon, 19 Jan 2015 22:10:03 -0800


On 01/19/2015 08:26 PM, Trevor Perrin wrote:

So his argument that we should consider full-radix when choosingperformance-oriented primes seems reasonable to me. Seems like weshould try to get more analysis and timings for full-radix Ted37919,Curve41417, Goldilocks, E-521, etc.

I think it's reasonable.  Here's some analysis for those primes.

The point of the Goldilocks prime is that Karatsuba is a win at thatsize, and Karatsuba is a pain on full-radix. So even though Ed448 issuitably aligned for a full-radix implementation on 32 bits, that wouldtake 14^2 = 196 muls vs 16^2 * 3/4 = 192 for Karatsuba. The extra addsfor Karatsuba cost less than the carry handling for full-radix unlessyou have some sort of accelerator, and I don't think UMAAL is enough ofan accelerator to make up that difference. 64-bit is similar except thecoefficients aren't aligned. So it's almost always a win to useKaratsuba and reduced-radix.

E-521 takes 9 limbs on 64-bit in either full- or reduced-radix, sothere's no point in doing full-radix. On 32-bit, it's 1 - 17^2/18^2 ~11% fewer muls to do full-radix, but again you lose the ability to useKaratsuba or Chung-Hasan or the Granger-Scott IBDWT, so even ifcarry-handling is free that's not a win unless you want the simplestimplementation. For further comparison, MS would rather pick an8-or-16-limb prime (on full-radix) and get a 21% reduction in muls, butthey still pay a carry-handling penalty and lose Karatsuba/CH/IBDWT.

This analysis doesn't support Longa's point, since it shows cases wherereduced radix is very fast, even faster than full-radix on acomparable-sized prime on a machine where that's fast. I think thepoint is more valid for 2^389-21, since it's just over a power of 2^32.

I don't know about Curve41417 and Ted37919. It would probably be closefor Ted37919, since there the cost reduction is 1 - 6^2/7^2 ~ 27% andKaratsuba doesn't apply (at least on 64-bit).

On Mon, Jan 19, 2015 at 5:03 PM, D. J. Bernstein <[email protected]> wrote:
There's a long history of hard-to-catch errors in full-radix software.
https://cryptojedi.org/papers/#verify25519 is the first step towards
confidently getting out of this mess---and it's significantly _easier_
for reduced-radix software than for full-radix software.

That's interesting and not intuitive to me.  I'd think reduced-radix
was trickier to implement and analyze.  Huh...

The carry chains in full-radix software can be a pain to get right. Inmy experience, it's not so bad when you have some headroom (2^255-19 and2^379-19). But if the prime is very close to a power of the word size(2^256-189 or the NIST Solinas primes), then carries can propagatereally far. This costs performance, and it's tempting to come up with aclever, hard-to-analyze trick to limit the propagation.


-- Mike
_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves

Re: [curves] Balancing reduced-radix and full-radix performance for extra-strength primes

Reply via email to