Hi all, Rambus has agreed to release the code! It’s on Sourceforge (for export control reasons) at
https://sourceforge.net/p/ed448goldilocks/code/ci/master/tree/ The code is still experimental, and I haven’t picked a hash yet (suggestions?). So you can do keygen and ECDH, except that the output of ECDH isn’t hashed. The base operations for sign and verify are there too, but I haven’t implemented those operations yet. I should be able to update the repository without future export control issues. Cheers, — Mike On Feb 12, 2014, at 1:50 PM, Michael Hamburg <[email protected]> wrote: > To be clear, the requirements from our legal department are: > * I need to ask them before releasing code, and > * I need to comply with US export control law, which (as Legal interprets it) > requires preventing IPs associated with a few specific countries from > downloading the software. > > I don’t believe that GitHub can filter by country (please tell me if that’s > changed), but SourceForge or some other service might be able to. > > Anyway, I haven’t gotten the ball rolling yet, so it might take a while to > get permission, especially since the software isn’t done yet. But I’ll see > what I can do. > > Cheers, > — Mike > > On Feb 12, 2014, at 12:26 PM, Michael Hamburg <[email protected]> wrote: > >> Hi Trevor, >> >> Not a github initially, because of Rambus legal and export control and all >> that. I’ll see if I can set up something more private and get back to you. >> >> Cheers, >> — Mike >> >> On Feb 12, 2014, at 11:22 AM, Trevor Perrin <[email protected]> wrote: >> >>> Could we expect a github? I'd love to see this! >>> >>> Trevor >>> >>> >>> On Tue, Feb 11, 2014 at 12:31 AM, Mike Hamburg <[email protected]> wrote: >>>> Hello curves, >>>> >>>> I've been working on implementation for the new curves, and I'd like to >>>> report status and some formulas and issues I found. >>>> >>>> I'm aiming for a fairly generic C/intrinsics implementation which should >>>> support any curves with minimal extra effort, but I'm starting with >>>> Ed448-Goldilocks because it's mine. I have Haswell and Sandy Bridge test >>>> machines. I also have a vectorless Cortex A9, but it doesn't work yet >>>> because I'm using 64x64->128-bit multiply intrinsics. Here's what I've >>>> found so far. >>>> >>>> If you have any suggestions on the formulas or algorithms, I'd definitely >>>> appreciate it. >>>> >>>> Field arithmetic: >>>> * Karatsuba is beneficial for Ed448. >>>> * Radix 2^56 in a 64-bit limb, 8 limbs. >>>> * M ~ 153cy on Sandy Bridge, 125cy on Haswell >>>> * square ~ 0.75M >>>> * small fixed mul ~ 0.25M >>>> * add/sub (unreduced) ~ 0.04M, a little cheaper on Haswell because of AVX >>>> >>>> I'm using the 1/sqrt(x) point encoding for now, basically because I already >>>> have code for that from an earlier project. I'm not yet counting the time >>>> to serialize and deserialize field elements, which is maybe 100 cyles at >>>> most (counting the full reduce / checking that input is fully reduced). >>>> I'm >>>> not yet counting hashing or RNG times. >>>> >>>> My earlier email about 1/sqrt(x) was slightly off: it encodes even points >>>> on >>>> the curve, but odd points on the twist. >>>> >>>> I haven't tried blind+EGCD for inverses or Legendre symbol checks. It >>>> might >>>> well be a win. One inverse square root is 56k Sandy cycles (I don't >>>> remember the Haswell number). >>>> >>>> Full Montgomery ladder: >>>> * Decompress. >>>> * Constant-time ladder by 448-bit scalar. The scalar should be even for >>>> security. It actually could be 447 bits. >>>> * Recompress. Reject points on the twist. This is basically free, but >>>> important because they can't be encoded with the 1/sqrt(x) encoding. >>>> >>>> This takes about 571kcy on Haswell, and 688kcy on Sandy, corrected for >>>> TurboBoost. >>>> >>>> I'm using the formula from the thread on efficient laddering with the >>>> isomorphic curve, but twisted. Let (xd,zd) be the point to de doubled, and >>>> (xa,za) be the point to be added. >>>> A = (xd+zd) >>>> B = (xd-zd) >>>> DA = (xa-za)*A >>>> BC = (xa+za)*B >>>> >>>> oxa = (DA+BC)^2 >>>> oza = (DA-BC)^2 * xbase >>>> >>>> AA = A^2 >>>> BB = B^2 >>>> AAod = AA*(1-d) >>>> E = AA-BB >>>> >>>> oxd = AAod*BB >>>> ozd = E*(AAod-E) >>>> >>>> return (oxd,ozd,oxa,oza) >>>> >>>> Except I'm actually using zbase instead of xbase, because of the 1/sqrt(x) >>>> format. >>>> >>>> Twisted Edwards (a=-1) windowed algorithm: >>>> * Assumes that cofactor is canceled somehow. >>>> * Recode scalar in signed form, because it's easy and I'm lazy. >>>> * Compute 8 odd multiples of P. >>>> * Constant-time add/sub chain with a 4-bit window, 448 bits. Could be 446 >>>> bits, except that 446 isn't divisible by 4. >>>> * No compress or decompress. >>>> >>>> This takes slightly less time than the Montgomery ladder, some 530kcy on >>>> Haswell and 636 kcy on Sandy. A 5-bit window makes things maybe 1-2% >>>> faster, but uses extra complexity and memory so I didn't think it was >>>> worthwhile. >>>> >>>> I'm using readdition coordinates: >>>> "Projective half-niels" for the tables, ((y-x)/2 : (y+x)/2 : dxy : 1) * z. >>>> "Lazy extended coordinates" for the accumulator, (x : y : z : t : u) where >>>> xy = tuz. >>>> >>>> I might replace the lazy extended coordinates with Hisil et al's lookahead >>>> extended-or-not coordinates, which use less memory but require more care. >>>> >>>> Full constant-time scalarmul using twisted Edwards: >>>> * Decompress points, rejecting those on the twist. >>>> * Isogenize to the twisted curve, canceling the cofactor. >>>> * Above windowed algorithm. >>>> * Isogenize back to the main curve, effectively multiplying by 4. >>>> * Recompress. >>>> >>>> This takes slightly longer than the Montgomery ladder: something like >>>> 633kcy >>>> on Haswell and 750kcy on Sandy. So Edwards or twisted Edwards is best for >>>> points you've already got in projective form, and Montgomery is best for >>>> ECDH. Unsurprising. >>>> >>>> The total executable code size to test and bench the arithmetic and curve >>>> routines is currently around 41k under clang -O4 -fPIC. That'll get bigger >>>> once there are precomputed tables. >>>> >>>> Formulas: >>>> I'm making use of the "inverse square root trick": >>>> >>>> def trick(a,b,i): >>>> # assumes p==3 mod 4; similar trick exists for 1 mod 4 >>>> # returns sqrt(+-a/b), 1/i, is_square(a/b) >>>> # assumes a,b,i are nonzero >>>> ai = a*i >>>> abi = b*ai >>>> s = 1/sqrt(+- abi*i) # using a powering ladder >>>> output sqrt(+-a/b) = s*ai >>>> s2abi = s^2*abi >>>> issquare = s2abi * i # = Legendre symbol >>>> if you care about the result of 1/i when a/b is nonsquare: >>>> output 1/i = s2abi*issquare >>>> else: >>>> output 1/i = s2abi >>>> >>>> You can tweak the trick to change the Legendre symbol of the output >>>> according to some other variable as well; this depends on the residue of p >>>> mod 8. >>>> >>>> The formula I'm using for point compression with Montgomery form is: >>>> >>>> Let P1 + P2 = P3 and (u1,v1) = P1 etc. Then >>>> 4*v1*v2*u3 = (u1*u2-1)^2 - u3^2*(u1-u2)^2 >>>> >>>> To compute the numerator of the RHS, do: >>>> sa = (z2*z1 - x2*x1) * z3 >>>> sb = (x2*z1 - z2*x1) * x3 >>>> numerator = (sa + sb) * (sa - sb) >>>> This is good enough to get the Legendre symbol. It shouldn't be too hard >>>> to >>>> convert this into a formula with some other sign bit using the inverse >>>> square root trick. >>>> >>>> This is on an untwisted (B=1) curve, but the same "ought" to be true of >>>> 4*B*v1*v2*u3 on a twisted one. >>>> >>>> To serialize an Edwards point, we have to deal with the fact that the >>>> isomorphic curve you'd get from Wikipedia is twisted, because it sets B = >>>> 4/(1-d) which isn't square, at least when p==3 mod 4. So I'm negating x to >>>> get to the curve: >>>> 4y^2/(d-1) = x^3 + 2(d+1)/(d-1) * x^2 + x >>>> where you can then scale y by sqrt(4/(d-1)) to get the standard curve. >>>> >>>> To deserialize an Edwards point, compute >>>> denominator = (u+1)^2 * (d-1) + 4u >>>> x = 2 sqrt(u/denominator) >>>> y = (1+u)/(1-u) >>>> using the inverse square root trick. This lands you on E_(1,d), because it >>>> scales the x-coordinate to get rid of the twisting that the obvious >>>> decompression would give you. >>>> >>>> You have to check if u=0 or u=1. The latter isn't on the curve, but you >>>> have to make sure it doesn't slip past the check due to the zero divide. >>>> The former works in the 1/sqrt(x) encoding without any checks. >>>> >>>> To do: >>>> I'm planning to use WNAF for variable time scalar mul, WNAF for signature >>>> verification, and a precomputed signed comb for key generation and Schnorr >>>> signing. >>>> >>>> I'm experimenting with the best way to implement Elligator. I currently >>>> only have the map to the curve done, and I might change the signs. My >>>> implementation maps directly to affine using the inverse square root trick. >>>> I'll report the formula once I'm done messing around with it. >>>> >>>> And of course there's API packaging, testing on ARM, etc. >>>> >>>> Cheers, >>>> -- Mike >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Curves mailing list >>>> [email protected] >>>> https://moderncrypto.org/mailman/listinfo/curves >>>> >> >> _______________________________________________ >> Curves mailing list >> [email protected] >> https://moderncrypto.org/mailman/listinfo/curves >
_______________________________________________ Curves mailing list [email protected] https://moderncrypto.org/mailman/listinfo/curves
