> > It seems that the implementation fails just like described in the blog
> > post, as soon as ChaCha() is called with a length which is not a multiple
> > of 64, all further uses of the method produce incorrect results.
>
> Thanks for pointing this out - I've just fixed this in -current. The
> underlying ChaCha implementation (from djb) is written for single-shot use
> (as exposed via CRYPTO_chacha_20(), which is also used for
> ChaCha20Poly1305).
> I obviously overlooked this when I added the ChaCha() and EVP interfaces.
>
> Regress tests already existed, however they did not trigger this specific
> issue. They've now been extended to cover the ChaCha interface (which was
> already tested via the EVP regress) with partial/single-byte writes.
>

Thank you for your quick fixes. I tested your changes with the other tests
provided here too, and it all appears to be good now.
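
For anyone wanting to reproduce this kind of check, here is a minimal
sketch of it: encrypt a buffer in one call, encrypt it again in odd-sized
pieces, and compare. It is self-contained and does not use the LibreSSL
API; the sketch_* names and the buffered-keystream context are my own
assumptions for illustration, not the code that was committed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) \
    a += b; d ^= a; d = ROTL32(d, 16); \
    c += d; b ^= c; b = ROTL32(b, 12); \
    a += b; d ^= a; d = ROTL32(d, 8);  \
    c += d; b ^= c; b = ROTL32(b, 7)

/* Hypothetical streaming context: cipher state plus leftover keystream. */
struct sketch_ctx {
    uint32_t state[16];
    uint8_t  ks[64];  /* keystream of the current block */
    size_t   used;    /* bytes of ks already consumed; 64 means none left */
};

static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void sketch_init(struct sketch_ctx *c, const uint8_t key[32],
                        const uint8_t nonce[8])
{
    static const char sigma[] = "expand 32-byte k";
    int i;

    for (i = 0; i < 4; i++)
        c->state[i] = load32_le((const uint8_t *)sigma + 4 * i);
    for (i = 0; i < 8; i++)
        c->state[4 + i] = load32_le(key + 4 * i);
    c->state[12] = c->state[13] = 0;      /* 64-bit block counter */
    c->state[14] = load32_le(nonce);
    c->state[15] = load32_le(nonce + 4);
    c->used = 64;
}

/* Generate the next 64 bytes of keystream and advance the counter. */
static void sketch_block(struct sketch_ctx *c)
{
    uint32_t x[16];
    int i;

    memcpy(x, c->state, sizeof x);
    for (i = 0; i < 10; i++) {            /* 10 double rounds = 20 rounds */
        QR(x[0], x[4], x[8],  x[12]); QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]); QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 16; i++) {
        uint32_t v = x[i] + c->state[i];
        c->ks[4 * i]     = (uint8_t)v;
        c->ks[4 * i + 1] = (uint8_t)(v >> 8);
        c->ks[4 * i + 2] = (uint8_t)(v >> 16);
        c->ks[4 * i + 3] = (uint8_t)(v >> 24);
    }
    if (++c->state[12] == 0)
        ++c->state[13];
    c->used = 0;
}

/* XOR len bytes; unused keystream is kept in the context between calls. */
static void sketch_xor(struct sketch_ctx *c, const uint8_t *in,
                       uint8_t *out, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (c->used == 64)
            sketch_block(c);
        out[i] = in[i] ^ c->ks[c->used++];
    }
}

/* Returns 1 if encrypting in odd-sized pieces matches one single call. */
static int sketch_chunked_matches_oneshot(void)
{
    static const size_t pieces[] = { 1, 63, 64, 65, 7 };  /* 200 bytes */
    uint8_t key[32], nonce[8] = { 0 }, msg[200], a[200], b[200];
    struct sketch_ctx c;
    size_t i, off = 0;

    for (i = 0; i < sizeof key; i++) key[i] = (uint8_t)i;
    for (i = 0; i < sizeof msg; i++) msg[i] = (uint8_t)(3 * i);

    sketch_init(&c, key, nonce);
    sketch_xor(&c, msg, a, sizeof msg);

    sketch_init(&c, key, nonce);
    for (i = 0; i < sizeof pieces / sizeof pieces[0]; i++) {
        sketch_xor(&c, msg + off, b + off, pieces[i]);
        off += pieces[i];
    }
    return off == sizeof msg && memcmp(a, b, sizeof a) == 0;
}
```

Calling sketch_chunked_matches_oneshot() from a small main() is enough to
run it. An implementation that throws away the unused part of a block
between calls would fail the comparison after the first 1-byte piece.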


>
> > The blog's author provided an implementation which does not suffer from
> > this problem, along with test vectors: http://chacha20.insanecoding.org/
> > The license on that code appears to be friendly, although I don't know if
> > the code itself is any good.
>
> Performance-wise the implementation would be rather ordinary (as noted in
> the code) - the existing implementation does 64-byte blocks in 4-byte
> pieces, whereas this implementation does a byte at a time.


Since you brought up performance, curiosity got the better of me, so I
decided to run some quick benchmarks. I modified test.c to raise the
random tests from 1000 to 1000000 iterations per loop, and timed both
implementations to see what the difference was.
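
A minimal sketch of taking that kind of measurement in-process, assuming
the standard clock() function is an acceptable stand-in for the shell's
timer. The names cpu_seconds_for() and sketch_workload() are hypothetical
and are not part of test.c.

```c
#include <time.h>

/* Hypothetical workload; in a real run this would call the cipher. */
static unsigned long sketch_calls;

static void sketch_workload(void)
{
    sketch_calls++;
}

/* Run fn() `iterations` times and return the CPU seconds consumed. */
static double cpu_seconds_for(void (*fn)(void), unsigned long iterations)
{
    clock_t start = clock();
    unsigned long i;

    for (i = 0; i < iterations; i++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Averaging several runs of something like
cpu_seconds_for(sketch_workload, 1000000UL) helps smooth out noise from
whatever else the machine is doing.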

Your code on my system (amd64), compiled with gcc 4.7 and -O3, averages
~6.976s of user CPU time, whereas the other code averages ~6.344s.

I found it quite surprising that the code which is advertised as
unoptimized and operates a byte at a time runs quicker. I checked the
assembly GCC produces for the XOR section of that byte-at-a-time code,
and I see GCC is emitting what look like SSE instructions! I didn't even
know GCC was able to do that without help.
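
As a sketch of the kind of loop I mean (the function name and signature
here are hypothetical, not copied from that code): a plain byte-wise XOR
like the one below is a typical candidate for GCC's auto-vectorizer,
which -O3 turns on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XOR n keystream bytes into the output, one byte
 * per iteration. The source only asks for byte operations; the compiler
 * is free to process several bytes per instruction if it can show the
 * result is the same. */
static void xor_stream(uint8_t *out, const uint8_t *in,
                       const uint8_t *ks, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = in[i] ^ ks[i];
}
```

Compiling a file containing this with gcc -O3 -S and reading the loop
body is a quick way to see whether vector instructions were emitted on a
given compiler version.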

Maybe some of the optimizations in your code are counterproductive?
Or is my benchmark methodology somehow flawed?


> Additionally, the code is
> pretty horrific from a style perspective.
>

I take it you mean the testing code, as that does look awful.
The encryption code itself, IMO anyway, looks neat and clean, and is much
more understandable than what is currently in your chacha_encrypt_bytes()
function.

Looking at this new code, I also see it has an interesting comment
regarding incrementing the counter. Is there any validity to the approach
it's taking? Would that be more secure for very long streams?
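
For context, the conventional handling of a 64-bit block counter stored
as two 32-bit words can be sketched as below. This is only the common
carry-on-wrap pattern, with a hypothetical function name; it is not
necessarily what either implementation does, and the wrap check is my own
addition for illustration.

```c
#include <stdint.h>

/* Hypothetical: advance a 64-bit block counter held as two 32-bit words,
 * low word first. Returns 1 normally, or 0 once the whole counter has
 * wrapped back to zero, at which point the keystream for this key and
 * nonce would start repeating. */
static int counter_increment(uint32_t ctr[2])
{
    if (++ctr[0] != 0)
        return 1;     /* no carry out of the low word */
    if (++ctr[1] != 0)
        return 1;     /* carried into the high word */
    return 0;         /* all 2^64 block numbers have been used */
}
```

With 64-byte blocks, a 64-bit counter covers 2^70 bytes per key and nonce
before it wraps.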

J
