On 08/11/14 20:45, John-Mark Gurney wrote:
Vsevolod Stakhov wrote this message on Sat, Nov 08, 2014 at 18:55 +0000:
On 08/11/14 04:23, John-Mark Gurney wrote:

Over the last few months, I've been working on a project to add support
for AES-GCM and AES-CTR modes to our OpenCrypto framework.  The work is
sponsored by The FreeBSD Foundation and Netgate.

I plan on committing these patches early next week.  If you need more
time for review, please email me privately and I will make delay.

The code has already been reviewed by Watson Ladd (the software crypto
implementations) and Trevor Perrin (the aesni module part) and I have
integrated these changes into the patch.

There are two patches, one is the changes for OpenCrypto and the test
framework.  The other is the data files used by the test framework.
The data is from NIST's CAVP program, and is about 20MB worth of test
vectors.  (I just realized, should we look at compressing these on

Main patch (192KB):

Data files (~20MB):

A list of notable changes in the patch:
- Replacing crypto(4) w/ NetBSD's version + updates
- Lots of man page updates, including CIOCFINDDEV and crypto(7) which
   adds specifics about restrictions on the modes.
- Allow sane useage of both _HARDWARE and _SOFTWARE flags.
- Add a timing safe bcmp for MAC comparision.
- Add a software implementation of GCM that uses a four bit lookup
   table with parallelization.  This algorithm is possibly vulnerable to
   timing attacks, but best known mitigation methods are used.  Using
   a timing safe version is many times slower.
- Added a CRYPTDEB macro that defaults to off.
- Bring in some of OpenBSD's improvements to the OpenCrypto framework.
- If an mbuf passed to the aesni module is only one segment, don't do
   a copy.  This needs to be improved to support segmented buffers.
- Remove the CRYPTO_F_REL flag.  It was meaningless.  It was used but
   did not change any behavior.
- Add function crypto_mbuftoiov to convert an mbuf to an iov.  This
   also converts the software crypto to only use iov's even for a simple
   linear buffer, and so simplifies the processing.
- Add a dtrace probe for errors from the ioctl.
- Add the CIOCCRYPTAEAD ioctl that allows userland processing (testing)
   of AES-GCM and future AEAD modes.

Future improvements:
- Support IV's longer than 12 bytes for GCM.
- Make AES-NI support segmented buffers (iov or mbuf) so multisegmented
   inputs don't have to be copied.

I have the question regarding to the algorithm of GF field calculations
used in the proposed implementation: why not use the recent researches
in GCM calculations, e.g. described in [1], for further speed optimizations?

[1] - https://eprint.iacr.org/2013/157.pdf

The paper you linked to does not describe a new way of calculating
GHASH, but evalutation of a bug in their implementation using the
PCLMULQDQ instruction...

If you mean, why don't I use OpenSSL's code?  The reason is that their
code is a perl script that generates assmebly...  We don't have
perl in base.. and I didn't want more assembly in our tree (see below)..

Instead, I decided to use code from Intel's whitepaper:
Intel® Carry-Less Multiplication Instruction and its Usage for
Computing the GCM Mode

I didn't use their assembly version because I wanted to have
maintainable code, and also the same code can be used on both i386
and amd64 arches..  This turns out to also be a good thing, as when
I add segmented buffer support, it'll be much easier to add to the C
version, and I only have to do the work once for two arches...

Also, the software GF library that I wrote is using state of the art
algorithms...  An OpenBSD developer has tested my code and has seen
a significant performance improvement over their old code, and are
evaluating if they want to/can include it in their tree...

Hope this answers your question.  If not, please be more specific so
I can answer it.


I'm sorry, I thought that is the paper that is a transcript of the following presentation:


made by the same authors. The transcript is not available so far it seems.

And regarding assembler/C maintainability I would argue that the current intrinsics based implementation is more readable than the pure assembler solution (and it is still machine dependent). Of course, I'm not the expert in such optimizations, so that is just my own feeling.

By the way, do you have some concrete numbers about the performance of your aes-gcm? (I recently could do aes-128-gcm at about 32 gigabits/sec that is not a limit of the modern hardware for sure).
freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to