> On Oct 27, 2014, at 6:24 PM, Trevor Perrin <[email protected]> wrote: > > On Mon, Oct 27, 2014 at 4:42 PM, Michael Hamburg <[email protected]> wrote: >> I just checked in a (still experimental!) mod to the Goldilocks code which >> enables p521. Compile with: > [...] >> ecdh: 803kcy > > Cool, added: > > https://docs.google.com/a/trevp.net/spreadsheet/ccc?key=0Aiexaz_YjIpddFJuWlNZaDBvVTRFSjVYZDdjakxoRkE&usp=sharing#gid=0 > > > I know you ruled out Ed480-Ridinghood for performance on 32-bit > systems (I guess 8x60-bit limbs works on 64-bit, but 16x30 isn't > enough headroom on 32-bit?) > > How do you expect E-521 to perform on 32-bit? (18x29?) > > Trevor
Yeah, so 16x30 won’t work on 32-bit without carry propagation, because there are less than 4 bits of headroom (2^64 / 2^60, but the input isn’t perfectly reduced), but the multiplication and reduction produces coefficients as large as 38. > ((x^16-1)/(x-1)).numerator()^2 % (x^16 - x^8 - 1) 24*x^15 + 26*x^14 + 28*x^13 + 30*x^12 + 32*x^11 + 34*x^10 + 36*x^9 + 38*x^8 + 16*x^7 + 17*x^6 + 18*x^5 + 19*x^4 + 20*x^3 + 21*x^2 + 22*x + 23 It might be worth trying anyway to see how slow it would be. Another option is 20x24, which is going to be ~50% slower (20^2 *3/4 instead of 16^2 *3/4 multiplies) but doesn’t have carry handling problems. For 2^521-1 as 18x29, the biggest coefficient in that reduction is 35, and there are 6 bits of headroom. So it works, but it requires a reduction before every multiply. I’m not sure what the best multiply algorithm would be between Karatsuba, 3-way Chung Hasan (like on 64-bit) or the new Granger-Scott formula (which I should try on 64-bit, maybe it’s tuneable). It probably would work pretty well in NEON, but I don’t think it would be quite as efficient as Goldilocks. Another option would be 20x26, and deal with the extra bit somehow. Sorry I don’t have better numbers for you right now. Cheers, — Mike _______________________________________________ Curves mailing list [email protected] https://moderncrypto.org/mailman/listinfo/curves
