Looking at the current BLAKE2 implementation in openssl, only the EVP interface is available, and that is neither wrapped by gnulib nor used by coreutils. Also, although an SSE implementation exists, it is not available in openssl. Therefore we'll probably have to skip openssl usage in this iteration.
Some testing of the C reference implementation on a 4 core i3-2310M CPU @ 2.10GHz system:

  $ truncate -s 1G file.test

  $ time ./b2sum -a blake2b file.test
  real 0m3.279s
  $ time ./b2sum -a blake2bp file.test
  real 0m1.867s
  $ taskset -a 01 time ./b2sum -a blake2bp file.test
  real 0m3.745s

  $ time ./b2sum-sse file.test
  real 0m2.335s
  $ time ./b2sum-sse -a blake2bp file.test
  real 0m1.578s
  $ taskset -a 01 time ./b2sum-sse -a blake2bp file.test
  real 0m2.785s

  $ time ./sha1sum-sse file.test
  real 0m2.751s

Notes:

  Parallel versions give different checksums to non parallel,
  which is as implemented but a bit surprising.

  Parallel versions are only a little slower than non parallel
  in the single CPU case, so it's a bit surprising that these
  are not the default.

  OMP_NUM_THREADS=1 not honored by the parallel versions
  even though OMP is used.

Rather than having to worry about runtime selection of sse and non sse (as is already done in openssl sha* implementations), I'll only include the non sse implementations.

I might also drop the blake2s and blake2sp implementations as 32 bit can get equivalent but somewhat slower functionality with `b2sum -l 256 -a blake2b`.

I can handle the OPENMP compiler options with AC_OPENMP so will try to support that.

thanks,
Pádraig.
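
As a footnote to the `-l 256` point above: "equivalent" there means the same digest length, not the same digest value. A minimal sketch using Python's standard hashlib rather than b2sum itself, assuming `-l 256` corresponds to a 32-byte BLAKE2b digest; the input data is arbitrary and purely illustrative:

```python
import hashlib

data = b"example input"  # arbitrary illustrative data, not the benchmark file

# BLAKE2b configured for a 256-bit (32-byte) digest; assumed to match `-l 256`.
b2b_256 = hashlib.blake2b(data, digest_size=32).digest()

# BLAKE2s at its default (and maximum) digest size of 32 bytes.
b2s_256 = hashlib.blake2s(data).digest()

# Default 512-bit BLAKE2b digest, to compare against the shortened one.
b2b_512 = hashlib.blake2b(data).digest()

print(len(b2b_256), len(b2s_256))   # both digests are 32 bytes long
print(b2b_256 == b2s_256)           # different algorithms, so the values differ
print(b2b_256 == b2b_512[:32])      # digest length is a hash parameter, not a truncation
```

So dropping blake2s would keep a 256-bit option available, but checksums produced that way would not be interchangeable with ones from a blake2s tool.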