After I sent my message yesterday evening, I was also wondering about
the 512-bit (64-byte) block size of SHA-256, and whether that would add
to the observed slowness.
The following output shows time as a function of the byte-chunk size
passed to update() (1, 2, 8, 32, 64, 128 and 256 bytes; the last line is
a single call with the full 8388608-byte buffer):

b:  12.111763954162598
b2:  5.806451082229614
b8:  1.4664850234985352
b32:  0.37551307678222656
b64:  0.20229697227478027
b128:  0.11141395568847656
b256:  0.06758689880371094
8388608  bs:  0.020879030227661133
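
Roughly, the timing loop looks like this (a simplified sketch using
hashlib for illustration; the real script, the 8 MiB all-zero message
and the exact chunk-size list may differ in details):

import hashlib
import time

DATA = b"\x00" * 8388608              # 8 MiB test message (assumed)
CHUNK_SIZES = (1, 2, 8, 32, 64, 128, 256)

for size in CHUNK_SIZES:
    h = hashlib.sha256()
    start = time.time()
    for i in range(0, len(DATA), size):
        h.update(DATA[i:i + size])    # one Python-to-C call per chunk
    print("b%d: " % size, time.time() - start)

# single update() call with the whole buffer, for comparison
h = hashlib.sha256()
start = time.time()
h.update(DATA)
print(len(DATA), " bs: ", time.time() - start)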

Time seems to go down linearly as the chunk size increases, and there
is no perceptible "speed boost" when we cross the 64-byte threshold.
Time seems to be related only to the number of Python-to-C calls,
roughly linearly.

And again, I can understand that the overhead is proportional to the
number of Python-to-C calls, but it's the factor of 500 (2-3 orders of
magnitude) that (unpleasantly) surprised me. It means one has to
optimize the size of the byte strings passed to update() when there are
many bytes to hash. For example, if you read from a file or socket,
don't call update() one byte at a time as you read from the stream, but
fill up a (big) buffer first and pass that buffer, as in the sketch
below.
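
Something along these lines (a simplified sketch; the 64 KiB buffer
size is just an example):

import hashlib

def sha256_of_file(path, bufsize=65536):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)   # read a large chunk, not single bytes
            if not chunk:
                break
            h.update(chunk)           # one update() per big chunk keeps call overhead low
    return h.hexdigest()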

-Frank.

PS. I haven't looked at the SHA-256 C code, but I can imagine that when
you pass update() one byte at a time, it fills up some 64-byte buffer,
and once that buffer is full, it churns/hashes that block. Adding a
byte to the buffer is all low-level, fast C code, while the churning
uses significantly more CPU cycles... so it's hard to fathom that you
would see much slower performance when passing a single byte at a time
in C.
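
As a pure toy model of that idea (a Python pseudo-implementation of the
buffering, not the actual OpenSSL code; all names are made up):

BLOCK_SIZE = 64  # SHA-256 block size in bytes

class ToySha256Buffer(object):
    """Toy model of the buffering that update() might do internally."""

    def __init__(self):
        self._buf = bytearray()

    def update(self, data):
        self._buf += data                       # cheap append
        while len(self._buf) >= BLOCK_SIZE:
            block = bytes(self._buf[:BLOCK_SIZE])
            del self._buf[:BLOCK_SIZE]
            self._compress(block)               # expensive churn, once per 64 bytes

    def _compress(self, block):
        # stand-in for the real SHA-256 compression function
        pass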


On Tue, Jul 12, 2016 at 8:07 AM, lvh <_...@lvh.io> wrote:
> Hi,
>
>> On Jul 11, 2016, at 10:42 PM, Frank Siebenlist <frank.siebenl...@gmail.com> 
>> wrote:
>
> <snipsnip>
>
>> I understand that there may be a few more object-creations and casts 
>> involved in the looping, but 500 times slower… that was an unexpected 
>> surprise.
>
> As expected. You both get massively increased C call overhead and the worst 
> case because you don’t get to hit a block until every 512/8 == 64 updates. 
> Alas, openssl speed doesn’t distinguish between the same message sizes but in 
> different chunk sizes, but you can at least clearly see the performance 
> multiplier for larger messages.
>
> lvh
_______________________________________________
Cryptography-dev mailing list
Cryptography-dev@python.org
https://mail.python.org/mailman/listinfo/cryptography-dev
