Python's native hashing module (hashlib) shows similar results:
- about the same time when passed the 8MB blob in one go
(probably expected, as both use OpenSSL)
- substantial overhead when looping over small chunks (up to 100 times slower)
- except that it's about 6 times faster for single-byte updates...

n: 8388608
b:  1.958238124847412
b2:  1.0818939208984375
b8:  0.2987058162689209
b32:  0.10640311241149902
b64:  0.06242084503173828
b128:  0.04123806953430176
b256:  0.03258681297302246
8388608  bs:  0.02389383316040039
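
For reference, the timings above came from a loop along these lines (a
sketch, not the exact script; the setup and labels are reconstructed):

```python
import hashlib
import time

n = 8 * 1024 * 1024        # the 8MB blob
blob = b"\x00" * n

# time update() calls for increasing chunk sizes
for chunk in (1, 2, 8, 32, 64, 128, 256):
    h = hashlib.sha256()
    start = time.time()
    for i in range(0, n, chunk):
        h.update(blob[i:i + chunk])
    label = "b" if chunk == 1 else "b%d" % chunk
    print(label + ": ", time.time() - start)

# one update() call over the whole blob
h = hashlib.sha256()
start = time.time()
h.update(blob)
print(n, " bs: ", time.time() - start)
```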

I guess hashlib uses some better optimization on the C calls (?).

This is my last update on this observation.
The conclusion is "so be it": using bigger chunks for hashing gives
(much) better performance.
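
For example, when hashing a file, read into a sizeable buffer instead of
calling update() per byte (a sketch; the 64KB buffer size is an arbitrary
choice):

```python
import hashlib

def sha256_file(path, bufsize=64 * 1024):
    """Hash a file by feeding update() big buffers, not single bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            buf = f.read(bufsize)
            if not buf:   # EOF
                break
            h.update(buf)
    return h.hexdigest()
```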

-Frank.

On Tue, Jul 12, 2016 at 10:49 AM, Frank Siebenlist
<frank.siebenl...@gmail.com> wrote:
> After I sent my message yesterday evening, I was also wondering about
> that 512bit (64byte) block-size of sha256, and if that would add to
> the observed slowness.
> The following output shows time as a function of byte-chunk size
> (1,2,8,32,64,128,256 bytes)
>
> b:  12.111763954162598
> b2:  5.806451082229614
> b8:  1.4664850234985352
> b32:  0.37551307678222656
> b64:  0.20229697227478027
> b128:  0.11141395568847656
> b256:  0.06758689880371094
> 8388608  bs:  0.020879030227661133
>
> Time seems to go down linearly as the chunk size increases, and there
> is no perceived "speed boost" when we cross the 64-byte threshold.
> Time seems to be linearly related only to the number of Python-to-C calls.
>
> And again, I can understand that the overhead is proportional to the
> number of Python-to-C calls, but it's just the factor of 500 (2-3
> orders of magnitude) that (unpleasantly) surprised me. It requires one
> to optimize the byte-string size passed to update() when you have
> many bytes to hash. For example, if you read from a file or socket,
> don't update() 1 byte at a time while you read from the stream, but
> fill up a (big) buffer first and pass that buffer.
>
> -Frank.
>
> PS. I haven't looked at the sha256 C-code, but I can imagine that when
> you pass update() one byte at a time, it will fill up some
> 64-byte buffer, and once that buffer is full, it will churn/hash that
> block. Adding a byte to the buffer is all low-level fast code in
> C, while the churning would use significantly more CPU cycles... hard
> to fathom that you would see much slower performance when you pass a
> single byte at a time in C...
>
>
> On Tue, Jul 12, 2016 at 8:07 AM, lvh <_...@lvh.io> wrote:
>> Hi,
>>
>>> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist <frank.siebenl...@gmail.com> 
>>> wrote:
>>
>> <snipsnip>
>>
>>> I understand that there may be a few more object-creations and casts
>>> involved in the looping, but 500 times slower… that was an unexpected
>>> surprise.
>>
>> As expected. You both get massively increased C call overhead and the worst
>> case, because you don’t get to hit a block until every 512/8 == 64 updates.
>> Alas, openssl speed doesn’t distinguish between the same message size fed
>> in different chunk sizes, but you can at least clearly see the performance
>> multiplier for larger messages.
>>
>> lvh
>> _______________________________________________
>> Cryptography-dev mailing list
>> Cryptography-dev@python.org
>> https://mail.python.org/mailman/listinfo/cryptography-dev
