Python's native hashing module (hashlib) shows similar results:

- about the same time when passed the 8MB blob in one go (probably
  expected, as both use OpenSSL)
- substantial overhead when looping over small chunks (up to 100 times
  slower)
- except that it's about 6 times faster per single-byte update.
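For reference, here is a minimal sketch of the kind of timing loop that
would produce the measurements below. The use of os.urandom and
time.time is my assumption; the thread doesn't show the actual script.

    import hashlib
    import os
    import time

    # Assumed setup: an 8MB blob, hashed in chunks of increasing size.
    data = os.urandom(8 * 1024 * 1024)
    print("n:", len(data))

    for size in (1, 2, 8, 32, 64, 128, 256):
        h = hashlib.sha256()
        start = time.time()
        for i in range(0, len(data), size):
            h.update(data[i:i + size])  # one Python-to-C call per chunk
        print("b%s: %s" % ("" if size == 1 else size, time.time() - start))

    # Baseline: the whole blob in a single update() call.
    h = hashlib.sha256()
    start = time.time()
    h.update(data)
    print(len(data), "bs:", time.time() - start)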
n: 8388608
b: 1.958238124847412
b2: 1.0818939208984375
b8: 0.2987058162689209
b32: 0.10640311241149902
b64: 0.06242084503173828
b128: 0.04123806953430176
b256: 0.03258681297302246
8388608 bs: 0.02389383316040039

Guess hashlib uses some better optimization of the C calls (?).

This is my last update on this observation. The conclusion is "so be
it": using bigger chunks for hashing gives (much) better performance.

-Frank.

On Tue, Jul 12, 2016 at 10:49 AM, Frank Siebenlist
<frank.siebenl...@gmail.com> wrote:
> After I sent my message yesterday evening, I was also wondering about
> that 512-bit (64-byte) block size of sha256, and whether it would add
> to the observed slowness.
> The following output shows time as a function of byte-chunk size
> (1, 2, 8, 32, 64, 128, 256 bytes):
>
> b: 12.111763954162598
> b2: 5.806451082229614
> b8: 1.4664850234985352
> b32: 0.37551307678222656
> b64: 0.20229697227478027
> b128: 0.11141395568847656
> b256: 0.06758689880371094
> 8388608 bs: 0.020879030227661133
>
> Time seems to go down linearly with the increase in chunk size, and
> there is no perceived "speed boost" when we cross the 64-byte
> threshold.
> Time seems to be related only linearly to the number of Python-to-C
> calls.
>
> Again, I can understand that the overhead is proportional to the
> number of Python-to-C calls, but it's the factor of 500 (2-3 orders
> of magnitude) that (unpleasantly) surprised me. It requires one to
> optimize the byte-string size passed to update() when you have many
> bytes to hash. For example, if you read from a file or socket, don't
> update() one byte at a time while you read from the stream, but fill
> up a (big) buffer first and pass that buffer (see the sketch after
> this thread).
>
> -Frank.
>
> PS. I haven't looked at the sha256 C code, but I can imagine that when
> you pass update() one byte at a time, it will fill up some 64-byte
> buffer, and once that buffer is full, it will churn/hash that block.
> Adding a byte to the buffer is all fast, low-level C code, while the
> churning would use significantly more CPU cycles... it's hard to
> fathom that you would see much slower performance when you pass a
> single byte at a time in C...
>
>
> On Tue, Jul 12, 2016 at 8:07 AM, lvh <_...@lvh.io> wrote:
>> Hi,
>>
>>> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist <frank.siebenl...@gmail.com>
>>> wrote:
>>
>> <snipsnip>
>>
>>> I understand that there may be a few more object creations and casts
>>> involved in the looping, but 500 times slower… that was an unexpected
>>> surprise.
>>
>> As expected. You both get massively increased C call overhead and the
>> worst case, because you don't get to hit a block until every
>> 512/8 == 64 updates. Alas, openssl speed doesn't distinguish between
>> the same message size delivered in different chunk sizes, but you can
>> at least clearly see the performance multiplier for larger messages.
>>
>> lvh
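As a sketch of the buffered-read pattern recommended in the quoted
message above (the helper name sha256_file and the 64KB default buffer
size are illustrative assumptions, not code from the thread):

    import hashlib

    def sha256_file(path, bufsize=64 * 1024):
        """Hash a file by filling a big buffer first, then passing the
        whole buffer to update() in a single Python-to-C call."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                buf = f.read(bufsize)
                if not buf:
                    break
                h.update(buf)
        return h.hexdigest()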