Showing only the key parts of the message:

> From: John Gilmore <g...@toad.com>

An exceedingly knowledgeable guy, one we should probably take seriously.
https://en.wikipedia.org/wiki/John_Gilmore_(activist)

> Most hardware crypto accelerators are useless, ...
> ... you might as well have
> just computed the answer in userspace using ordinary instructions.

A strong claim, but one I'm inclined to believe. In the cases where it
applies, it may be a problem for much of the Linux crypto work.

Some CPUs have special instructions to speed up some crypto
operations, and not just AES. For example, Intel has them for SHA-1
and SHA-256 hashing, and for the carry-less multiplication used in
GCM and in elliptic curve calculations over binary fields:
https://software.intel.com/en-us/articles/intel-sha-extensions
https://en.wikipedia.org/wiki/CLMUL_instruction_set
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/polynomial-multiplication-instructions-paper.pdf

These move the goalposts; if doing it "using ordinary instructions" is
sometimes faster than hardware, then doing it with
application-specific instructions is even more likely to be faster.
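
On Linux, one way to see whether a given x86 CPU offers these
instructions is the "flags" line in /proc/cpuinfo. A minimal sketch,
assuming the usual x86 flag names (aes, pclmulqdq, sha_ni); the
function name crypto_flags is made up for illustration, and other
architectures report features under different keys and names:

```python
def crypto_flags(cpuinfo_text):
    """Return which crypto-related x86 flags appear in /proc/cpuinfo text.

    Assumption: the flag names below are the ones Linux reports on x86
    (AES-NI, carry-less multiply, SHA extensions).
    """
    wanted = {"aes", "pclmulqdq", "sha_ni"}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the first "flags" line is examined; it is repeated per CPU.
        if sep and key.strip() == "flags":
            return wanted & set(value.split())
    return set()


# Example with a made-up excerpt rather than a real machine's output.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse2 aes pclmulqdq\n"
found = crypto_flags(sample)
```

On a real system the argument would be the contents of /proc/cpuinfo.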

> Even if a /dev/crypto interface existed and was faster for some kinds
> of operations than just doing the crypto manually, the standard crypto
> libraries would have to be portably tuned to detect when to use
> hardware and when to use software.  The libraries generally use
> hardware if it's available, since they were written with the
> assumption that nobody would bother with hardware crypto if it was
> slower than software.
>
> "Just make it fast for all cases" is hard when the hardware is poorly
> designed.  When the hardware is well designed, it *is* faster for all
> cases.  But that's uncommon.
>
> Making this determination in realtime would be a substantial
> enhancement to each crypto library.  Since it'd have to be written
> portably (or the maintainers of the portable crypto libraries won't
> take it back), it couldn't assume any particular timings of any
> particular driver, either in hardware or software.  So it would have
> to run some fraction of the calls (perhaps 1%) in more than one
> driver, and time each one, and then make decisions on which driver to
> use by default for the other 99% of the calls.  The resulting times
> differ dramatically, based on many factors, ...
>
> One advantage of running some of the calls using both hardware and
> software is that the library can check that the results match exactly,
> and abort with a clear message.  That would likely have caught some bugs
> that snuck through in earlier crypto libraries.

I'm not at all sure I'd want run-time testing of this since, at least
as a general rule, introducing complications to crypto code is rarely
a good idea. Such tests at development time seem like a fine idea,
though; do we have those already?
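
To make the added complexity concrete, here is a rough sketch of the
sampling scheme described in the quoted text: run a small fraction of
calls through every driver, time each, check that the results match,
and use the fastest driver for the remaining calls. Everything in it
(the SampledDispatcher class, the driver names, the 1% default) is
hypothetical and not taken from any existing crypto library:

```python
import hashlib
import random
import time


class SampledDispatcher:
    """Choose among implementations of one operation by timing sampled calls."""

    def __init__(self, drivers, sample_rate=0.01, rng=None):
        if len(drivers) < 2:
            raise ValueError("need at least two drivers to compare")
        self.drivers = dict(drivers)
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.total_time = {name: 0.0 for name in self.drivers}
        self.samples = 0
        self.default = next(iter(self.drivers))

    def __call__(self, data):
        # Most calls go straight to the current default driver.
        if self.rng.random() >= self.sample_rate:
            return self.drivers[self.default](data)

        # Sampled call: run every driver and accumulate its elapsed time.
        results = {}
        for name, fn in self.drivers.items():
            start = time.perf_counter()
            results[name] = fn(data)
            self.total_time[name] += time.perf_counter() - start
        self.samples += 1

        # Abort with a clear message if any driver disagrees.
        reference = results[self.default]
        for name, value in results.items():
            if value != reference:
                raise RuntimeError(
                    f"driver mismatch: {name!r} disagrees with {self.default!r}"
                )

        # Prefer the driver with the lowest total time over sampled calls.
        self.default = min(self.total_time, key=self.total_time.get)
        return reference


def software_sha256(data):
    return hashlib.sha256(data).digest()


def pretend_hardware_sha256(data):
    # Stand-in for a hardware-backed driver; a real one would call a device.
    return hashlib.sha256(data).digest()


dispatch = SampledDispatcher(
    {"software": software_sha256, "hardware": pretend_hardware_sha256}
)
digest = dispatch(b"abc")
```

Even this toy version adds state, timing noise and a new failure path
to every call site, which is the sort of complication I would be wary
of in real crypto code.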

What about testing at kernel configuration time, when deciding
whether to include a particular module? Another issue is whether the
module choice is all-or-nothing: if there is a hardware RNG, can one
use that without loading the rest of the code for the crypto
accelerator?
