Thank you, Mike, for your reply!

>In addition to FourQ and Curve19119, other fast-ish options include 2^252-2^232-1 (but maybe not on tiny micros); >NIST’s 2^192-2^64-1 (but again maybe not on the M0?); and the Goldi-like 2^216-2^108-1. But I agree that you
>should stick with Curve25519 if you can afford it.

I've looked a bit closer at FourQ in the last days. I came to the conclusion that the speedup factor of ~2 reported by Liu, Longa, Pereira, Reparaz and Seo [LLPRS] in https://eprint.iacr.org/2017/434.pdf are realistic for the AVR and MSP430. I moreover expect that this might hold also for the M0 target (not reported by LLPRS).

It must, however, be clear that one buys this with significantly larger RAM requirements (that is most precious, particularly in the very small targets) and I believe that it will also be much harder to protect FourQ against side channel attacks, especially regarding fault injection, without loosing the speed advantage.

Elligator 2 works for any elliptic curve (over a field of odd characteristic 
IIRC) with at least one point of
order 2.  So you can use it on FourQ.

Thank you for pointing this out. I think that this makes FourQ for these targets also an interresting choice for PAKE.

I don't believe, however that the claimed huge speedup factor for the M4 from LLPRS actually holds. The Curve25519 figures taken for the comparision with FourQ for the M4 (~1.4 million cycles cor X25519) are in my perception much less optimized than the LLPRS implementation for their "complex" composed field.

I expect that actual speed advantage of FourQ on the M4 might in fact only be in the range of ~25%. For the M4, the UMAAL instruction gives you carry accumulation essentially for free and I know that constant-time field arithmetics on the M4 for the highly regular fe25519 field is actually a factor of ~1.5 faster for Curve25519 than the values reported for FourQ by LLPRS. I think that this will eat up most of the advantage from the endomorphisms. Actually this might hold also for other "larger" targets bigger than the M4, where the regular 255 bit field could be implemented ay faster than the more complex "complex" field of FourQ?
In addition to FourQ and Curve19119, other fast-ish options include

NIST’s 2^192-2^64-1 (but again maybe not on the M0?);

The problem in my opinion is to implement the many conditional additions for the solinas prime in constant time. At least, I did try it and I did not find a way to implement that efficiently.
and the Goldi-like 2^216-2^108-1 or 2^252-2^232-1 (but maybe not on tiny 
micros);
Here I expect that the fact that the field is not really much smaller than for 2^255 - 1 will be the reason that prevents significant speedups in comparison to Curve25519.

I realized at some point that “SPAKE2-EE” is actually an existing protocol 
called “PAK”, which is why I didn’t ever write it up.

On a constrained device, a SPEKE variant (like PACE) is a faster and simpler 
choice.  Also I thought the SPEKE patent expired, so why not use that?  SPAKE2 
(-EE and otherwise) has a stronger security proof, supports augmentation, and 
is potentially faster with precomputation.
Actually I did mix up EKE and SPEKE regarding the patent that has expired this spring. We had chosen PACE predominantly because our first products came on the market way earlier than the expiration date. I think also that it is a good idea to make both parties contribute entropy for generating a session-specific ephemeral generator, such as done in PACE.

When using such an ephemeral generator, one will need to mix in the entropy from both parties together with the password somehow. I think that the method used by PACE for doing this is not so bad. We do want predominantly protect the password. Note that regarding side channels it might be beneficial to actually use the password only in a symmetric encryption primitive (such as in PACE). I expect that it will be much easier to protect a symmetric cypher against side channel leakage (e.g. by masking) than e.g. a SHA variant function used for mixing the entropies.
I don’t know if there is an augmented version of SPEKE.
Yes it is possible to augment SPEKE. We have ourselves developed an augmented version of PACE for our bluetooth product line. It's not yet published but we are working on a paper on that topic.
It is possible to compute Elligator straight to affine with only one 
exponentiation, using the inverse square root trick.

You want x = -A/(1+ur^2), or -A-that, so you need to simultaneously divide by 
1+ur^2 and compute the Legendre symbol of the putative y^2 = x(x^2+Ax+1).  If 
you compute that projectively to get y^2 = a/b, then L(a/b) = L(ab), so you 
don’t need to do the division first.  Let c=ab, d=1+ur^2.  Then you can compute

s := (cd^2) ^ ((p-3)/2)

Then (s^2 * cd^2) = (cd^2)^(p-2) = 1/cd^2, so that 1/d = (s^2 * cd^2) * cd 
unless c=0 or d=0.  But if c=0 then the point is (0,0) or infinity, and if d=0 
then it’s infinity, so in either case you should return 0 anyway.  At the same 
time, (s * cd^2) = (cd^2)^((p-1)/2) = L(cd^2) = L(c) unless d=0.  So that gives 
you the inverse and Legendre symbol at the same time.  A similar trick with 
(cd^2)^((p-5)/8) can give you the square root, but you don’t actually need that 
here if you’re using an x-only ladder.
Thank you for pointing this out. I did not yet find time to analyze the details, but this will indeed give an additional speedup for any SPEKE and PACE implementation running on an curve for that Elligator2 is suitable.

Thank you again, Mike, for your valuable feedback.

Yours,

Björn.
_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves

Reply via email to