Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
Hi Max, 2011/8/8 Locktyukhin, Maxim maxim.locktyuk...@intel.com: I'd like to note that at Intel we very much appreciate Mathias effort to port/integrate this implementation into Linux kernel! $0.02 re tcrypt perf numbers below: I believe something must be terribly broken with the tcrypt

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
On Thu, Aug 11, 2011 at 4:50 PM, Andy Lutomirski l...@mit.edu wrote: I have vague plans to clean up extended state handling and make kernel_fpu_begin work efficiently from any context.  (i.e. the first kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy Bridge, but further

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Andy Lutomirski
On 08/04/2011 02:44 AM, Herbert Xu wrote: On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Were

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Herbert Xu
On Thu, Aug 11, 2011 at 10:50:49AM -0400, Andy Lutomirski wrote: This is pretty similar to the situation with the Intel AES code. Over there they solved it by using the asynchronous interface and deferring the processing to a work queue. I have vague plans to clean up extended state handling

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Andrew Lutomirski
On Thu, Aug 11, 2011 at 11:08 AM, Herbert Xu herb...@gondor.hengli.com.au wrote: On Thu, Aug 11, 2011 at 10:50:49AM -0400, Andy Lutomirski wrote: This is pretty similar to the situation with the Intel AES code. Over there they solved it by using the asynchronous interface and deferring the

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-08 Thread Sandy Harris
On Mon, Aug 8, 2011 at 1:48 PM, Locktyukhin, Maxim maxim.locktyuk...@intel.com wrote: 20 (and more) cycles per byte shown below are not reasonable numbers for SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is the expected results ... Ten years ago, on Pentium II, one

RE: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-07 Thread Locktyukhin, Maxim
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu herb...@gondor.apana.org.au wrote: On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Herbert Xu
On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Were you testing this on the transmit side or the

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu herb...@gondor.apana.org.au wrote: On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 7:05 PM, Mathias Krause mini...@googlemail.com wrote: It does. Just have a look at how fpu_available() is implemented: read: irq_fpu_usable()

[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data