Thanks Trevor!

> On Jan 19, 2015, at 1:47 PM, Trevor Perrin <[email protected]> wrote:
> 
> At Real World Crypto, Patrick Longa discussed the question of choosing
> efficient field primes for extra-strength curves.  He suggested such
> primes should be chosen to strike a balance between efficiency in
> full-radix and reduced-radix implementations (i.e. "saturated" and
> "unsaturated" arithmetic):
> 
> http://www.realworldcrypto.com/rwc2015/program-2/RWC-2015-Longa.pdf?attredirects=0
> 
> 2^379 - 19 was given as an example.
> 
> Most of the discussion of extra-strength primes I've seen assumes
> reduced-radix implementations, so this is an interesting twist.  His
> argument for considering full-radix was:
> 
> 1) Full-radix is more efficient on several platforms ("AMD, "Intel
> Atom, Intel Quark, ARM w/o Neon, microcontrollers"), whereas
> reduced-radix is most advantageous on more advanced processors ("Intel
> desktop/server", "ARM with NEON”).

This makes sense.  Some points

* Intel’s desktop/server parts have a really fast MUL and slowish ADC compared 
to everything else, so they favor reduced-radix computations which have more 
MULs and fewer ADCs.

* Anything running on a vector unit (NEON) won’t have access to ADC at all, so 
it will suck on full-radix.

* Medium-to-high-end ARM parts (M4 or higher) have an instruction UMAAL to 
accelerate full-radix arithmetic, though it still isn’t as fast as NEON.  They 
also have an instruction UMLAL which accelerates reduced-radix arithmetic.

* Low-end ARMs have UMLAL but not UMAAL.  Some might not even have that.  Parts 
with UMLAL and not UMAAL will favor reduced-radix.

* Other platforms may be in-between, if they don’t have acceleration for 
either.  A possible exception is RISC architectures with no carry flag and no 
magic accum register (eg RISC-V), which tend to perform terribly on full-radix 
code.

* Superscalar cores can multi-issue ADDs for the reduced-radix add/sub routine, 
but can’t multi-issue ADC.

* The advantage of full-radix decreases as the number of limbs increases, 
because it’s usually n^2 vs (n+1)^2.  That’s why for Curve25519 favors 
full-radix.

So the message that different processors favor different organizations is 
correct, and it sort of correlates with CPU size, but not perfectly.

> 2) Full-radix may be safer and easier to implement, since
> reduced-radix requires "Bound analysis" to prevent inadvertent word
> spilling, thus is "error prone, errors are more difficult to catch”.

Maybe.  With full-radix it can be annoying to get the carries right, though 
it’s easier if the size is not exactly a multiple of 64 (eg 2^379 instead of 
2^384) because then there’s headroom.

> I'm not qualified to assess (1).  But if it's true that full-radix
> implementations are faster on more primitive platforms that seems
> significant, since optimizing for these is probably more important
> than achieving maximum speed on advanced processors.

I agree.

> Anyways, I'm curious what other people think of this approach, and of
> 2^379 - 19 in particular.

Seems fine to me.  The prime 2^398 - 21 did bug me a little bit for this reason.

> The slides give performance numbers for a "Ted37919" curve based on
> this prime on Sandy Bridge, which I added to the spreadsheet here:

> https://docs.google.com/a/trevp.net/spreadsheet/ccc?key=0Aiexaz_YjIpddFJuWlNZaDBvVTRFSjVYZDdjakxoRkE&usp=sharing#gid=0
>  
> <https://docs.google.com/a/trevp.net/spreadsheet/ccc?key=0Aiexaz_YjIpddFJuWlNZaDBvVTRFSjVYZDdjakxoRkE&usp=sharing#gid=0>


NB those numbers do not include point decompression.

Cheers,
— Mike
_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves

Reply via email to