> The perf by chunk is a consequence of how SHA256 works.
I politely disagree... Having chunks smaller or larger than SHA256's 64 byte block size doesn't seem to affect the timing results in any noticeable way. If you do not fill-up SHA256's block-size buffer with update(), it simply returns, and there is only the overhead of the function call. Unless I misunderstand the inner workings... Regards, Frank. On Thu, Jul 14, 2016 at 6:57 AM, lvh <_...@lvh.io> wrote: > Hi Frank, > >> On Jul 14, 2016, at 12:23 AM, Frank Siebenlist <frank.siebenl...@gmail.com> >> wrote: >> >> Python's native hashing module (hashlib), shows similar results: >> - about the same time when passed the 8MB blob in one go >> (probably expected as both use openssl) >> - substantial overhead when looping over small chunks (up to 100 times) >> - except that it's about 6 times faster per single byte.. > > The perf by chunk is a consequence of how SHA256 works. The higher perf for > many calls is a consequence of extension modules vs cffi. > > lvh > >> n: 8388608 >> b: 1.958238124847412 >> b2: 1.0818939208984375 >> b8: 0.2987058162689209 >> b32: 0.10640311241149902 >> b64: 0.06242084503173828 >> b128: 0.04123806953430176 >> b256: 0.03258681297302246 >> 8388608 bs: 0.02389383316040039 >> >> Guess hashlib used some better optimization on the C-calls (?). >> >> This is my last update on this observation. >> Conclusion is "so be it", and using bigger chunks for hashing gives >> (much) better performance. >> >> -Frank. >> >> On Tue, Jul 12, 2016 at 10:49 AM, Frank Siebenlist >> <frank.siebenl...@gmail.com> wrote: >>> After I sent my message yesterday evening, I was also wondering about >>> that 512bit (64byte) block-size of sha256, and if that would add to >>> the observed slowness. >>> The following output shows time as a function of byte-chunk size >>> (1,2,8,32,64,128,256 bytes) >>> >>> b: 12.111763954162598 >>> b2: 5.806451082229614 >>> b8: 1.4664850234985352 >>> b32: 0.37551307678222656 >>> b64: 0.20229697227478027 >>> b128: 0.11141395568847656 >>> b256: 0.06758689880371094 >>> 8388608 bs: 0.020879030227661133 >>> >>> Time seems to go down linearly with increase of chunk size, and there >>> is no perceived "speed boost" when we go through the 64byte >>> thresh-hold. >>> Time seems to be only linearly related to the number of python-to-C calls. >>> >>> And again, I can understand that the overhead is proportional to the >>> number of python-to-C calls, but it's just the factor of 500 (2-3 >>> order of magnitude) that (unpleasantly) surprised me. It requires one >>> to optimize on byte-string size to pass in the update(), when you have >>> many bytes to hash. For example, if you read from a file or socket, >>> don't update() 1 byte at the time while you read from the stream, but >>> fill-up a (big) buffer first and pass that buffer. >>> >>> -Frank. >>> >>> PS. I haven't looked at the sha256 C-code, but I can imagine that when >>> you pass the update() one byte at the time, it will fill-up some >>> 64byte-buffer, and if that buffer is filled, it will churn/hash that >>> block. The adding a byte to the buffer is all low-level fast code in >>> C, while the churning would use significantly more CPU cycles... hard >>> to phantom that you would see much slower performance when you pass a >>> single byte at the time in C... >>> >>> >>> On Tue, Jul 12, 2016 at 8:07 AM, lvh <_...@lvh.io> wrote: >>>> Hi, >>>> >>>>> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist >>>>> <frank.siebenl...@gmail.com> wrote: >>>> >>>> <snipsnip> >>>> >>>>> I understand that there may be a few more object-creations and casts >>>>> involved in the looping, but 500 times slower… that was un unexpected >>>>> surprise. >>>> >>>> As expected. You both get massively increased C call overhead and the >>>> worst case because you don’t get to hit a block until every 512/8 == 64 >>>> updates. Alas, openssl speed doesn’t distinguish between the same message >>>> sizes but in different chunk sizes, but you can at least clearly see the >>>> performance multiplier for larger messages. >>>> >>>> lvh >>>> _______________________________________________ >>>> Cryptography-dev mailing list >>>> Cryptography-dev@python.org >>>> https://mail.python.org/mailman/listinfo/cryptography-dev >> _______________________________________________ >> Cryptography-dev mailing list >> Cryptography-dev@python.org >> https://mail.python.org/mailman/listinfo/cryptography-dev > > > _______________________________________________ > Cryptography-dev mailing list > Cryptography-dev@python.org > https://mail.python.org/mailman/listinfo/cryptography-dev > _______________________________________________ Cryptography-dev mailing list Cryptography-dev@python.org https://mail.python.org/mailman/listinfo/cryptography-dev