On 01/19/2015 08:26 PM, Trevor Perrin wrote:
So his argument that we should consider full-radix when choosing performance-oriented primes seems reasonable to me. Seems like we should try to get more analysis and timings for full-radix Ted37919, Curve41417, Goldilocks, E-521, etc.
I think it's reasonable.  Here's some analysis for those primes.

The point of the Goldilocks prime is that Karatsuba is a win at that size, and Karatsuba is a pain on full-radix. So even though Ed448 is suitably aligned for a full-radix implementation on 32 bits, that would take 14^2 = 196 muls vs 16^2 * 3/4 = 192 for Karatsuba. The extra adds for Karatsuba cost less than the carry handling for full-radix unless you have some sort of accelerator, and I don't think UMAAL is enough of an accelerator to make up that difference. 64-bit is similar except the coefficients aren't aligned. So it's almost always a win to use Karatsuba and reduced-radix.

E-521 takes 9 limbs on 64-bit in either full- or reduced-radix, so there's no point in doing full-radix. On 32-bit, it's 1 - 17^2/18^2 ~ 11% fewer muls to do full-radix, but again you lose the ability to use Karatsuba or Chung-Hasan or the Granger-Scott IBDWT, so even if carry-handling is free that's not a win unless you want the simplest implementation. For further comparison, MS would rather pick an 8-or-16-limb prime (on full-radix) and get a 21% reduction in muls, but they still pay a carry-handling penalty and lose Karatsuba/CH/IBDWT.

This analysis doesn't support Longa's point, since it shows cases where reduced radix is very fast, even faster than full-radix on a comparable-sized prime on a machine where that's fast. I think the point is more valid for 2^389-21, since it's just over a power of 2^32.

I don't know about Curve41417 and Ted37919. It would probably be close for Ted37919, since there the cost reduction is 1 - 6^2/7^2 ~ 27% and Karatsuba doesn't apply (at least on 64-bit).
On Mon, Jan 19, 2015 at 5:03 PM, D. J. Bernstein <[email protected]> wrote:
There's a long history of hard-to-catch errors in full-radix software.
https://cryptojedi.org/papers/#verify25519 is the first step towards
confidently getting out of this mess---and it's significantly _easier_
for reduced-radix software than for full-radix software.
That's interesting and not intuitive to me.  I'd think reduced-radix
was trickier to implement and analyze.  Huh...
The carry chains in full-radix software can be a pain to get right. In my experience, it's not so bad when you have some headroom (2^255-19 and 2^379-19). But if the prime is very close to a power of the word size (2^256-189 or the NIST Solinas primes), then carries can propagate really far. This costs performance, and it's tempting to come up with a clever, hard-to-analyze trick to limit the propagation.

-- Mike
_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves

Reply via email to