On 08/06/15 22:17, Taylor R Campbell wrote:
> Date: Mon, 08 Jun 2015 21:24:30 +0100
> From: Padraig Brady
>
> On 08/06/15 21:08, Taylor R Campbell wrote:
> > Zooko asked me to send the following timings of portable BLAKE2 C code
> > versus the hand-optimized assembly for MD5 and portable C for SHA-256
> > that one finds in OpenSSL 1.0.1k, computed on a 1.2 GHz Freescale
> > i.MX6 CPU (on a different file, from /dev/urandom, of the same size as
> > Zooko reported timings for, 1073741824 bytes):
>
> Questions...
>
> You probably shouldn't read too much into this crude measurement.
>
> Here is a much more precise performance comparison, closer to what you
> will find in SUPERCOP (<http://bench.cr.yp.to/>, which is where you
> should look for high-quality performance comparisons of crypto
> algorithms):
>
> http://mumble.net/~campbell/tmp/blake2.imx6
>
> The first number on each line is the size of the message in bytes.
> The remaining numbers are nanoseconds per byte, measured by
> clock_gettime(CLOCK_MONOTONIC) before and after computing the hash,
> averaged over 16 trials. The +1 means the input buffer was unaligned.
>
> The BLAKE2 code, and timing code, for those data are at
>
> http://mumble.net/~campbell/hg/blake2
>
> with the MD5 and SHA-256 timing code adapted slightly to use OpenSSL's
> API instead of the BSD libc API for MD5 and SHA-256.
>
> (Yes, that code should use the ARM cycle counter instead of
> clock_gettime(CLOCK_MONOTONIC). Patches welcome!)
>
> The rest of this message is about the less precise measurements of the
> code at <https://blake2.net/> previously under discussion here.
>
> Does the file fit in cache?
>
> Yes. The machine has 4 GB of RAM.
>
> A file about a quarter the size would be enough for this test I think.
>
> Yes. I used 1073741824 bytes because that is what zooko had used.
>
> The md5sum, sha256sum, and sha512sum below were from coreutils
> ./configured --with-openssl=yes ?
>
> On second thought, I'm not sure: md5sum and sha256sum are not linked
> against libcrypto, so perhaps not. It was from the Debian jessie
> coreutils 8.23-4 package for armhf.

Right, it's not enabled by default there.

> On the other hand, I get about the same timings from `openssl md5' and
> `openssl sha256', so perhaps md5sum and sha256sum were just statically
> linked against OpenSSL.

Probably the ARM-specific code (if any) is just not significantly
different from straight C. Note that on x86_64 the biggest difference
(40%) was with sha1sum anyway.

> > $ time md5sum randfile.0
> > 7af160fa500c6ad20be1c8119c9141f8 randfile.0
> >
> > real 0m9.132s
> > user 0m6.600s
> > sys 0m2.530s
>
> I presume this was precached?
>
> Yes. I warmed the cache by running each program twice first.
>
> > $ time b2sum randfile.0
> > ea2c77e755d0f5c84e9fff444cd6ce83a566b134d43e4fe37ed53886e0ca5c7e6141968498d5d765c4190e4b567c437337e8e57ef5ba9306cc11db29a4b9e987 randfile.0
> >
> > real 0m48.012s
> > user 0m46.070s
> > sys 0m1.900s
>
> I presume the above was for sha512sum?
>
> This was BLAKE2b, i.e. the 512-bit BLAKE2 hash function, which is the
> default for b2sum. I copied zooko's invocations verbatim.
>
> > $ time b2sum -a blake2sp randfile.0
> > 2886c0adfd613381d02f18a8ed18527c98d88b115a974e61e030fb914118bd0d randfile.0
> >
> > real 0m9.880s
> > user 0m23.610s
> > sys 0m3.260s
>
> So this b2sum implementation is multithreaded
> and has about the same total computational cost as sha256sum?
>
> It appears to be multithreaded with OpenMP. I'm using more or less
> the same BLAKE2 code that zooko reported from <https://blake2.net/>,
> specifically blake2_code_20150529.zip.

thanks,
Pádraig.
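For readers who want to reproduce the "nanoseconds per byte" numbers, here is a minimal C sketch of the methodology described in the thread: read clock_gettime(CLOCK_MONOTONIC) before and after each hash call, average over 16 trials, and offset the buffer by one byte for the "+1" unaligned case. It is not the code at http://mumble.net/~campbell/hg/blake2. The names hash_fn, toy_hash, elapsed_ns and ns_per_byte are hypothetical, and toy_hash is a non-cryptographic stand-in for a real BLAKE2b, MD5 or SHA-256 call.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: hash `len` bytes at `msg` into a 64-byte `out`. */
typedef void (*hash_fn)(const unsigned char *msg, size_t len, unsigned char *out);

/* Stand-in "hash" (NOT cryptographic): XOR-folds the message into 64 bytes. */
static void toy_hash(const unsigned char *msg, size_t len, unsigned char *out)
{
    memset(out, 0, 64);
    for (size_t i = 0; i < len; i++)
        out[i % 64] ^= msg[i];
}

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) * 1e9 +
           (double)(t1->tv_nsec - t0->tv_nsec);
}

/*
 * Average nanoseconds per byte of `hash` on a `len`-byte message over
 * `trials` runs.  If `misalign` is nonzero, the message starts one byte
 * past the malloc'd pointer (the "+1" unaligned case).
 * Returns a negative value if allocation fails.
 */
static double ns_per_byte(hash_fn hash, size_t len, int misalign, int trials)
{
    assert(len > 0 && trials > 0);
    unsigned char *buf = malloc(len + 1);
    unsigned char out[64];
    double total = 0.0;

    if (buf == NULL)
        return -1.0;
    memset(buf, 0xa5, len + 1);
    const unsigned char *msg = buf + (misalign ? 1 : 0);

    for (int i = 0; i < trials; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        hash(msg, len, out);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += elapsed_ns(&t0, &t1);
    }
    free(buf);
    return total / trials / (double)len;
}
```

For example, ns_per_byte(toy_hash, 4096, 1, 16) times the unaligned 4096-byte case over 16 trials. Switching to a cycle counter, as suggested in the thread, would only change how elapsed_ns measures each interval.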

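The `time` figures quoted in this thread can also be turned into throughput numbers. The short C sketch below uses only the numbers from the thread: the file is 1073741824 bytes (1024 MiB), and each run's real, user and sys times are copied from the output above. It divides the file size by wall-clock time and compares CPU time (user + sys) with wall time. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* File size used in the thread: 2^30 bytes = 1024 MiB. */
#define FILE_BYTES 1073741824.0

struct timing {
    const char *cmd;
    double real_s, user_s, sys_s;
};

/* Figures copied from the `time` output quoted in the thread. */
static const struct timing timings[] = {
    { "md5sum",             9.132,  6.600, 2.530 },
    { "b2sum (blake2b)",   48.012, 46.070, 1.900 },
    { "b2sum -a blake2sp",  9.880, 23.610, 3.260 },
};

/* Wall-clock throughput in MiB/s. */
static double mib_per_s(const struct timing *t)
{
    assert(t->real_s > 0.0);
    return FILE_BYTES / (1024.0 * 1024.0) / t->real_s;
}

/* CPU time (user + sys) per wall-clock second: roughly how many cores were busy. */
static double cpu_per_wall(const struct timing *t)
{
    assert(t->real_s > 0.0);
    return (t->user_s + t->sys_s) / t->real_s;
}

static void print_table(void)
{
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
        printf("%-20s %7.2f MiB/s  %4.2f CPU-s per wall-s\n",
               timings[i].cmd, mib_per_s(&timings[i]),
               cpu_per_wall(&timings[i]));
}
```

This works out to about 112 MiB/s for md5sum, about 21 MiB/s for b2sum with BLAKE2b, and about 104 MiB/s for BLAKE2sp. For BLAKE2sp, user + sys is roughly 2.7 times the wall-clock time, consistent with it running on several cores via OpenMP. The other two runs stay close to 1.0.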