== Discussion of Candidates I've implemented and tested the following algorithms, all of which are 256-bit (in alphabetical order):
* BLAKE2b (libb2)
* BLAKE2bp (libb2)
* KangarooTwelve (imported from the Keccak Code Package)
* SHA-256 (OpenSSL)
* SHA-512/256 (OpenSSL)
* SHA3-256 (OpenSSL)
* SHAKE128 (OpenSSL)
I also rejected some other candidates. I couldn't find any reference or
implementation of SHA256×16, so I didn't implement it. I didn't
consider SHAKE256 because it is nearly identical to SHA3-256 in almost
all characteristics (including performance).
I imported the optimized 64-bit implementation of KangarooTwelve. The
AVX2 implementation was not considered for licensing reasons (it's
partially generated from external code, which falls foul of the GPL's
"preferred form for modifications" rule).
=== BLAKE2b and BLAKE2bp
These are the non-parallelized and parallelized 64-bit variants of
BLAKE2.
Benefits:
* Both algorithms provide 256-bit preimage resistance.
Downsides:
* Some people are uncomfortable that the security margin has been
decreased from the original SHA-3 submission, although it is still
considered secure.
* BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
multithreading) by default. It was no longer possible to run the
testsuite with -j3 on my laptop in this configuration.
=== Keccak-based Algorithms
SHA3-256 is the 256-bit Keccak algorithm with 24 rounds, processing 136
bytes at a time. SHAKE128 is an extendable output function with 24
rounds, processing 168 bytes at a time. KangarooTwelve is an extendable
output function with 12 rounds, processing 136 bytes at a time.
Benefits:
* SHA3-256 provides 256-bit preimage resistance.
* SHA3-256 has been heavily studied and is believed to have a large
security margin.
I noted the following downsides:
* There's a lack of a availability of KangarooTwelve in other
implementations. It may be the least available option in terms of
implementations.
* Some people are uncomfortable that the security margin of
KangarooTwelve has been decreased, although it is still considered
secure.
* SHAKE128 and KangarooTwelve provide only 128-bit preimage resistance.
=== SHA-256 and SHA-512/256
These are the 32-bit and 64-bit SHA-2 algorithms that are 256 bits in
size.
I noted the following benefits:
* Both algorithms are well known and heavily analyzed.
* Both algorithms provide 256-bit preimage resistance.
== Implementation Support
|===
| Implementation | OpenSSL | libb2 | NSS | ACC | gcrypt | Nettle| CL |
| SHA-1 | 🗸 | | 🗸 | 🗸 | 🗸 | 🗸 | {1} |
| BLAKE2b | f | 🗸 | | | 🗸 | | {2} |
| BLAKE2bp | | 🗸 | | | | | |
| KangarooTwelve | | | | | | | |
| SHA-256 | 🗸 | | 🗸 | 🗸 | 🗸 | 🗸 | {1} |
| SHA-512/256 | 🗸 | | | | | 🗸 | {3} |
| SHA3-256 | 🗸 | | | | 🗸 | 🗸 | {4} |
| SHAKE128 | 🗸 | | | | 🗸 | | {5} |
|===
f: future version (expected 1.2.0)
ACC: Apple Common Crypto
CL: Command-line
:1: OpenSSL, coreutils, Perl Digest::SHA.
:2: OpenSSL, coreutils.
:3: OpenSSL
:4: OpenSSL, Perl Digest::SHA3.
:5: Perl Digest::SHA3.
=== Performance Analysis
The test system used below is my personal laptop, a 2016 Lenovo ThinkPad
X1 Carbon with an Intel i7-6600U CPU (2.60 GHz) running Debian unstable.
I implemented a test tool helper to compute speed much like OpenSSL
does. Below is a comparison of speeds. The columns indicate the speed
in KiB/s for chunks of the given size. The runs are representative of
multiple similar runs.
256 and 1024 bytes were chosen to represent common tree and commit
object sizes and the 8 KiB is an approximate average blob size.
Algorithms are sorted by performance on the 1 KiB column.
|===
| Implementation | 256 B | 1 KiB | 8 KiB | 16 KiB |
| SHA-1 (OpenSSL) | 513963 | 685966 | 748993 | 754270 |
| BLAKE2b (libb2) | 488123 | 552839 | 576246 | 579292 |
| SHA-512/256 (OpenSSL) | 181177 | 349002 | 499113 | 495169 |
| BLAKE2bp (libb2) | 139891 | 344786 | 488390 | 522575 |
| SHA-256 (OpenSSL) | 264276 | 333560 | 357830 | 355761 |
| KangarooTwelve | 239305 | 307300 | 355257 | 364261 |
| SHAKE128 (OpenSSL) | 154775 | 253344 | 337811 | 346732 |
| SHA3-256 (OpenSSL) | 128597 | 185381 | 198931 | 207365 |
| BLAKE2bp (libb2; threaded) | 12223 | 49306 | 132833 | 179616 |
|===
SUPERCOP (a crypto benchmarking tool;
https://bench.cr.yp.to/results-hash.html) has also benchmarked these
algorithms. Note that BLAKE2bp is not listed, KangarooTwelve is k12,
SHA-512/256 is equivalent to sha512, SHA3-256 is keccakc512, and SHAKE128 is
keccakc256.
Information is for kizomba, a Kaby Lake system. Counts are in cycles
per byte (smaller is better; sorted by 1536 B column):
|===
| Algorithm | 576 B | 1536 B | 4096 B | long |
| BLAKE2b | 3.51 | 3.10 | 3.08 | 3.07 |
| SHA-1 | 4.36 | 3.81 | 3.59 | 3.49 |
| KangarooTwelve | 4.99 | 4.57 | 4.13 | 3.86 |
| SHA-512/256 | 6.39 | 5.76 | 5.31 | 5.05 |
| SHAKE128 | 8.23 | 7.67 | 7.17 | 6.97 |
| SHA-256 | 8.90 | 8.08 | 7.77 | 7.59 |
| SHA3-256 | 10.26 | 9.15 | 8.84 | 8.57 |
|===
Numbers for genji262, an AMD Ryzen System, which has SHA acceleration:
|===
| Algorithm | 576 B | 1536 B | 4096 B | long |
| SHA-1 | 1.87 | 1.69 | 1.60 | 1.54 |
| SHA-256 | 1.95 | 1.72 | 1.68 | 1.64 |
| BLAKE2b | 2.94 | 2.59 | 2.59 | 2.59 |
| KangarooTwelve | 4.09 | 3.65 | 3.35 | 3.17 |
| SHA-512/256 | 5.54 | 5.08 | 4.71 | 4.48 |
| SHAKE128 | 6.95 | 6.23 | 5.71 | 5.49 |
| SHA3-256 | 8.29 | 7.35 | 7.04 | 6.81 |
|===
Note that no mid- to high-end Intel processors provide acceleration.
AMD Ryzen and some ARM64 processors do.
== Summary
The algorithms with the greatest implementation availability are
SHA-256, SHA3-256, BLAKE2b, and SHAKE128.
In terms of command-line availability, BLAKE2b, SHA-256, SHA-512/256,
and SHA3-256 should be available in the near future on a reasonably
small Debian, Ubuntu, or Fedora install.
As far as security, the most conservative choices appear to be SHA-256,
SHA-512/256, and SHA3-256.
The performance winners are BLAKE2b unaccelerated and SHA-256 accelerated.
--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
signature.asc
Description: PGP signature

