(Collision resistance in a separate message, as that requires a longer
explanation.)

> Is this on Sandy Bridge or something else?  What do your benches look like?

I've only run it on my (very ancient) Nehalem laptop so far; the output
of build/bench is attached. (I'll try to get some Sandy Bridge numbers
on AWS at some point.)

(More than fast enough for me, btw.)
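
(For reading the attached output: the MB/s figure on the "SHAKE256 blk"
line looks like it is simply bytes per block divided by time per block.
Below is a minimal sketch of that conversion. The 128-byte block size is
an assumption inferred from the printed numbers, not read from the bench
source, and mb_per_s is a hypothetical helper name.)

```python
# Assumption: 128 bytes per timed block, inferred from the printed figures.
BLOCK_BYTES = 128

def mb_per_s(ns_per_block: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Throughput in decimal MB/s when one block takes ns_per_block nanoseconds."""
    bytes_per_second = block_bytes / ns_per_block * 1e9
    return bytes_per_second / 1e6

# Per-block timings taken from the two attached runs.
for ns in (832.3, 964.5):
    print(f"{ns} ns/block -> {mb_per_s(ns):.2f} MB/s")
```

Under that assumption the two timings reproduce the 153.79 and 132.71
MB/s figures printed in the attachment.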

> Is it fairly expensive?  I didn’t see a significant difference in benchmarks, 
> between doing that step and leaving it out.

No, you're right! I thought it was, but it appears to be within noise.

> Keccak is good stuff, but I think SHA-512 is currently more conservative and 
> respected, and it’s definitely more widely deployed.  Blake2 just hasn’t had 
> enough time in the spotlight.  Of course, in your fork you can do whatever 
> you want :-)

(My problem with SHA2-512: essentially the same strategies that work
against SHA-1 are believed to work against it too, just at a much
higher computational cost. That cost is too high for academic
cryptographers but well within potential adversaries' budgets. As a
result, I think Keccak has seen more (public) study by this point than
SHA2 has. But de gustibus. Agreed on BLAKE2; it's only a good choice
if performance is really critical.)

> Hardware Keccak may be somewhat difficult to come by.  You can’t just make an 
> instruction which does one round of it on a vector register, because the 
> state is 1600 bits.  Even for SHA, only two shipping processors that I know 
> of (Apple A7 and VIA’s chips) have instructions for SHA2, and that’s only 
> SHA256.  Having a separate one-per-chip accelerator core is a pain because of 
> context switches and such.

For Intel, we're not likely to see hardware Keccak until after
AVX-512; the state will need to span multiple registers, but this
isn't problematic. (Intel is shipping SHA-1 and SHA-2 extensions
soonish, btw.)
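
(A back-of-the-envelope check on "multiple registers", as a minimal
sketch: the 1600-bit state is 25 lanes of 64 bits, so the register count
is just a ceiling division by the vector width. The helper name
registers_needed is hypothetical, and this says nothing about how a real
instruction would lay out the lanes.)

```python
import math

KECCAK_STATE_BITS = 1600  # Keccak-f[1600]: 25 lanes of 64 bits each
LANE_BITS = 64

def registers_needed(vector_bits: int) -> int:
    """Number of vector registers of the given width needed to hold the full state."""
    return math.ceil(KECCAK_STATE_BITS / vector_bits)

# Common x86 vector widths, in bits.
for name, bits in (("128-bit", 128), ("256-bit", 256), ("512-bit", 512)):
    lanes_per_register = bits // LANE_BITS
    print(f"{name}: {registers_needed(bits)} registers, "
          f"{lanes_per_register} lanes per register")
```

With 512-bit registers the state fits in four of them, with room to
spare in the last one.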

(Alternatively, one can have an instruction that operates on memory
and occupies hardware registers (rather than the virtual, named ones)
to ship data to execution units; as I understand it, this is
essentially what's done at present for some instructions that expand
into microcode loops.)

Nehalem 2.66 GHz

mul:          76.3ns
sqr:          51.5ns
mul dep:      71.6ns
mulw:         20.6ns
rand448:     191.3ns
SHAKE256 1blk: 777.6ns
SHAKE256 blk:  832.3ns (153.79 MB/s)
isr auto:     24.5µs
elligator:    25.5µs
decompress:   25.4µs
compress:     25.5µs
barrett red: 282.5ns
barrett mac: 1162.3ns
exti+niels:  530.4ns
exti+pniels: 600.6ns
exti dbl:    452.4ns
i->a isog:   444.6ns
a->i isog:   455.7ns
monty step:  587.1ns
full ladder: 302.1µs
edwards smz: 288.6µs
edwards svl: 263.8µs
edwards smc: 309.2µs
edwards vtm: 251.4µs
wnaf6 pre:   106.3µs
edwards vt6: 224.8µs
wnaf4 pre:    47.0µs
edwards vt4: 234.5µs
wnaf5 pre:    62.1µs
edwards vt5: 225.7µs
vt vf combo: 289.0µs
edwards sm:  342.4µs
pre(5,5,18): 348.2µs
pre(3,5,30): 290.1µs
pre(5,3,30): 249.6µs
pre(15,3,10):324.1µs
pre(8,4,14): 321.1µs
com(5,5,18):  73.3µs
com(3,5,30):  77.7µs
com(8,4,14):  76.7µs
com(5,3,30): 102.0µs
com(15,3,10): 92.4µs

Goldilocks:
keygen:       99.2µs
ecdh:        310.3µs
sign:        104.1µs
verify:      341.3µs
precompute:  374.7µs
verify pre:  137.7µs
ecdh pre:    102.0µs



mul:          74.5ns
sqr:          50.0ns
mul dep:      69.4ns
mulw:         20.4ns
rand448:     214.0ns
SHAKE256 1blk: 843.5ns
SHAKE256 blk:  964.5ns (132.71 MB/s)
isr auto:     24.3µs
elligator:    25.6µs
decompress:   24.9µs
compress:     24.4µs
barrett red: 284.0ns
barrett mac: 1207.6ns
exti+niels:  521.0ns
exti+pniels: 584.9ns
exti dbl:    445.4ns
i->a isog:   450.5ns
a->i isog:   449.3ns
monty step:  599.0ns
full ladder: 295.5µs
edwards smz: 268.4µs
edwards svl: 255.5µs
edwards smc: 293.4µs
edwards vtm: 241.9µs
wnaf6 pre:   103.5µs
edwards vt6: 217.8µs
wnaf4 pre:    42.5µs
edwards vt4: 223.8µs
wnaf5 pre:    64.0µs
edwards vt5: 216.9µs
vt vf combo: 288.4µs
edwards sm:  338.9µs
pre(5,5,18): 314.0µs
pre(3,5,30): 281.9µs
pre(5,3,30): 247.1µs
pre(15,3,10):316.9µs
pre(8,4,14): 317.5µs
com(5,5,18):  72.1µs
com(3,5,30):  77.1µs
com(8,4,14):  77.1µs
com(5,3,30): 101.6µs
com(15,3,10): 92.0µs

Goldilocks:
keygen:       97.7µs
ecdh:        306.1µs
sign:        102.6µs
verify:      328.5µs
precompute:  362.7µs
verify pre:  135.1µs
ecdh pre:     99.9µs

Testing...
_______________________________________________
Curves mailing list
[email protected]
https://moderncrypto.org/mailman/listinfo/curves
