Have a look at the references in http://www.google.com/patents/US9148282
Particularly Siguna Mueller's Lucas-function based method.. Not sure about the cost.. Mike On Thu, Jan 21, 2016 at 11:30 PM, Mike Hamburg <[email protected]> wrote: > Hello Curves, > > I was porting an embedded EC library to work with other curves, and I was > wondering if anyone knew a compact, fast, deterministic algorithm for > square roots mod NIST p224 (or other annoying primes)? I’m not worried > about constant-time operation, since I probably have to blind anyway to > reduce other side channels, but a deterministic algorithm is preferable > because of possible real-time constraints. > > OpenSSL’s implementation takes something like 4700 multiplications on > average, which is more than a point*scalar multiply. > > Dan’s paper [1] gives a variant of the Tonelli-Shanks algorithm which is > fast and deterministic, requiring only 364 multiplies. But it’s very > heavy, with 1024 precomputed elements. Reducing the table size makes the > algorithm considerably slower. > > I’ve tried to combine Dan’s Tonelli-Shanks variant with Sutherland’s > algorithm, below. A variant with 4 precomputed points, a 64-element > discrete log table (not as big as it sounds) and 10 or so field elements in > memory takes 900 multiplies worst-case (~760 average-case). > > Cipolla’s algorithm is nondeterministic. > > Is Pocklington the way to go? It’s technically nondeterministic as well, > but if I read [1] correctly the failure probability is 2^-96. A > straightforward implementation probably takes more than 900 multiplies, but > hopefully it’s simpler and uses less memory, right? > > Is there some other algorithm I’m missing? > > Thanks, > — Mike > > [1] http://cr.yp.to/papers/sqroot.pdf > > #################################################### > > p = 2^224 - 2^96 + 1 > F = GF(p) > qnr = F(11)^(2^128-1) > > # 3-element precomputed table with roots of unity > precmp = dict(( > (3*2^i, qnr^2^(96-3*2^i)) > for i in [2,3,4] > )) > > # 64-element discrete log table > tbits = 6 > tlogger = qnr^2^(96-tbits) > dlog_table = dict(( > (tlogger^i,2^tbits-i) for i in xrange(2^tbits) > )) > > def isqrt(x): > # inverse square root of x, assuming x is a QR > # First, DJB’s addition chain > s = x^3 > s = s^2 * x # 2^3 - 1 > u = s^2^3 * s # 2^6 - 1 > s = u^2^6 * u # 2^12 - 1 > t = s^2^12 * s # 2^24 - 1 > s = t^2^24 * t # 2^48 - 1 > s = s^2^48 * s # 2^96 - 1 > s = s^2^24 * t # 2^120 - 1 > s = s^2^6 * u # 2^126 - 1 > s = s^2 * x > # here s == x^(2^127-1) > > ret = [s,s^2*x] # working powers of return value > rou = [qnr,qnr^2] # working roots of unity > stack = [95] > > def leaf(n): > dlog = dlog_table[ret.pop()]>>(tbits-n) > for i in xrange(len(rou)): > for j in xrange(n): > if dlog & (1<<j): ret[i] = ret[i] * rou[i] > rou[i] = rou[i]^2 > > # max stack depth is 4, so 5+5 field elements in ret+rou > while len(stack): > n = stack.pop() > ret.append(ret[-1]^2^floor(n/2)) > > if n <= 2*tbits: > leaf(ceil(n/2)) > rou.pop() > leaf(floor(n/2)) > else: > rou.append(precmp[ceil(n/2)]) > stack.append(floor(n/2)) > stack.append(ceil(n/2)) > > return ret[0] > > _______________________________________________ > Curves mailing list > [email protected] > https://moderncrypto.org/mailman/listinfo/curves > >
_______________________________________________ Curves mailing list [email protected] https://moderncrypto.org/mailman/listinfo/curves
