On Jan 30, 2014, at 10:45 PM, Robert Ransom <[email protected]> wrote:

> On 1/30/14, Diego Aranha <[email protected]> wrote:
> 
>> Your comments are very relevant, but let me justify some of our design
>> choices. We picked field sizes similar to NIST curves, trying to provide
>> something closer to drop-in replacements.
> 
> A true drop-in replacement for one of the NSA curves would be a
> small-parameter Edwards curve over the same field, satisfying the
> ‘SafeCurves’ criteria, with a=1 and non-square d, such that:
> 
> * the isogenous Montgomery form (*not* the isomorphic one; see the
> ‘Isogeny patterns among Edwards curves’ thread) is isomorphic to a
> short-Weierstrass curve with a=-3; and
> 
> * if the field order is congruent to 3 mod 4, the isogenous Edwards
> curve a'=-1, d'=d-1 has non-square d/a (so square d).
> 
> Existing implementations of the NSA curves could simply swap out their
> short-Weierstrass b to start using the new curves; over time,
> implementations could be updated to use faster, safer formulas on the
> new curves.

We could start with x^2 + y^2 = 1 - 14666 x^2 y^2 mod 2^192-2^64-1.  The 
isogenous curve — y^2 = x^3 + 58666*x^2 + x — is isomorphic to y^2 = x^3 - 3*x 
+ 6047900113480193987160910265022055632294672911518856488260.

>> Additionally, we considered not
>> only vector or hardware implementations, but also the fast integer
>> multipliers already available to software implementations in many
>> platforms. Of course, these could require specialized assembly-language
>> multipliers for optimal performance. You can find some brief notes below.
> 
> You seem to be assuming that (a) the implementor can use (an
> equivalent of) the Intel ADC instruction, and (b) ADC is fast.  (a)
> does not hold for C implementations; according to the Ed25519 paper,
> (b) does not hold for whichever of Intel's AMD64 chips was newest at
> that time.

You can access ADC by using __uint128_t on clang and gcc (and possibly on other 
platforms).  It's still faster in assembly or intrinsics, mostly due to the 
register allocator barfing on EC numerics code, but it still pretty much works 
in C.

ADC is also passably fast on most processors, though it works better on AMD 
than Intel I've heard.  At 256 bits, it's not necessarily worth having extra 
limbs to reduce the number of ADC instructions.  At 384 bits, it may be worth 
going to extra limbs, and also Karatsuba may be profitable.  By 448 bits, you 
almost definitely want both reduced-radix and Karatsuba.  I think 2^521-1 
probably wants 9x58-bit limbs and 3-way Karatsuba, but I haven't tuned my 
implementation yet.

Diego, have you implemented arithmetic mod the primes in your paper?  Do you 
know whether they're fast or not, and with what implementations, and maybe even 
on what platforms, or are you speculating?

Cheers,
-- Mike

_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves

Reply via email to