Re: [PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14
On Thu, Aug 03, 2017 at 02:26:53PM +0800, Herbert Xu wrote:
> On Mon, Jul 24, 2017 at 11:28:02AM +0100, Ard Biesheuvel wrote:
> > This is a resend of all the patches I sent out recently that I would
> > like to be considered for v4.14. Their main purpose is to prepare the
> > arm64 crypto code to deal with situations where the SIMD register file
> > is unavailable, which never occurs at present, but this will change in
> > the future when support for SVE is added.
[...]
> All applied. Thanks.

Awesome, thanks
---Dave
Re: [PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14
On Mon, Jul 24, 2017 at 11:28:02AM +0100, Ard Biesheuvel wrote:
> This is a resend of all the patches I sent out recently that I would
> like to be considered for v4.14. Their main purpose is to prepare the
> arm64 crypto code to deal with situations where the SIMD register file
> is unavailable, which never occurs at present, but this will change in
> the future when support for SVE is added.
[...]

All applied. Thanks.

--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14
On Wed, Aug 02, 2017 at 03:46:16PM +0100, Dave Martin wrote:
> Hi Herbert,
>
> This series from Ard is a prerequisite for an arm64 series [1] that I'd
> like to get merged this cycle (because it is in turn a prerequisite for
> another major series I want to progress).
>
> [1] without this series will break the kernel, whereas this series
> without [1] won't break the kernel, but will cause performance
> regressions in the arm64 crypto code due to unnecessary execution of C
> fallbacks.
>
> So it would be good to get both merged this cycle.
>
> Can Ard's series be merged for v4.14, do you think?

I don't see any issues with this making 4.14.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14
Hi Herbert,

This series from Ard is a prerequisite for an arm64 series [1] that I'd
like to get merged this cycle (because it is in turn a prerequisite for
another major series I want to progress).

[1] without this series will break the kernel, whereas this series
without [1] won't break the kernel, but will cause performance
regressions in the arm64 crypto code due to unnecessary execution of C
fallbacks.

So it would be good to get both merged this cycle.

Can Ard's series be merged for v4.14, do you think?

I'll let Catalin comment on the readiness of [1] for merging via arm64.
(I just need to repost it to fold in a late squash.)

Cheers
---Dave

[1] [RFC PATCH v4 0/5] Simplify kernel-mode NEON
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521838.html

On Mon, Jul 24, 2017 at 11:28:02AM +0100, Ard Biesheuvel wrote:
> This is a resend of all the patches I sent out recently that I would
> like to be considered for v4.14. Their main purpose is to prepare the
> arm64 crypto code to deal with situations where the SIMD register file
> is unavailable, which never occurs at present, but this will change in
> the future when support for SVE is added.
[...]
[PATCH resend 00/18] crypto: ARM/arm64 roundup for v4.14
This is a resend of all the patches I sent out recently that I would
like to be considered for v4.14. Their main purpose is to prepare the
arm64 crypto code to deal with situations where the SIMD register file
is unavailable, which never occurs at present, but this will change in
the future when support for SVE is added.

Patches #1 and #2 were sent out last week as 'crypto/algapi - refactor
crypto_xor() to avoid memcpy()s' (v2). This version of #2 fixes an error
caught by kbuild. The non-SIMD fallback code added in the remaining patches
relies on crypto_xor() extensively, which is why these patches have been
included here.

Patches #3 - #13 implement the non-SIMD fallbacks for the various NEON
based drivers.

Patch #14 implements AES-GCM natively instead of relying on the generic
GCM module to wire accelerated AES-CTR and GHASH together, resulting in
a ~37% speedup.

Patches #15 and #16 implement an accelerated GHASH algorithm for ARM cores
that lack the 64x64 PMULL instruction.

Patches #17 and #18 update the scalar AES implementations to stop using
the expanded lookup tables for the final round. This reduces the D-cache
footprint, and thus the key-correlated jitter.

This supersedes all other crypto patches I have outstanding, including the
AES refactor ones, which I will rework later.
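The crypto_xor() change in patches #1 and #2 can be illustrated with a
minimal userspace sketch (helper names here are illustrative stand-ins;
the real kernel helpers live in crypto/algapi.c and
include/crypto/algapi.h):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the two-operand vs. three-operand XOR helpers discussed in
 * patches #1 and #2. The three-operand form lets a caller XOR a
 * keystream with a source buffer straight into the destination, without
 * first memcpy()ing the source into the destination.
 */

/* Two-operand form: dst ^= src, in place. */
static void xor_inplace(uint8_t *dst, const uint8_t *src, size_t len)
{
	while (len--)
		*dst++ ^= *src++;
}

/* Three-operand form: dst = src1 ^ src2, no prior memcpy() needed. */
static void xor_3op(uint8_t *dst, const uint8_t *src1,
		    const uint8_t *src2, size_t len)
{
	while (len--)
		*dst++ = *src1++ ^ *src2++;
}
```

With separate dst and src operands, a CTR-style driver can write the
XOR result directly into the output buffer in one pass.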
Ard Biesheuvel (18):
  crypto/algapi - use separate dst and src operands for __crypto_xor()
  crypto/algapi - make crypto_xor() take separate dst and src arguments
  crypto: arm64/ghash-ce - add non-SIMD scalar fallback
  crypto: arm64/crct10dif - add non-SIMD generic fallback
  crypto: arm64/crc32 - add non-SIMD scalar fallback
  crypto: arm64/sha1-ce - add non-SIMD generic fallback
  crypto: arm64/sha2-ce - add non-SIMD scalar fallback
  crypto: arm64/aes-ce-cipher - match round key endianness with generic
    code
  crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback
  crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback
  crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR
  crypto: arm64/chacha20 - take may_use_simd() into account
  crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR
  crypto: arm64/gcm - implement native driver using v8 Crypto Extensions
  crypto: arm/ghash - add NEON accelerated fallback for vmull.p64
  crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL
  crypto: arm/aes - avoid expanded lookup tables in the final round
  crypto: arm64/aes - avoid expanded lookup tables in the final round

 arch/arm/crypto/Kconfig                |   5 +-
 arch/arm/crypto/aes-ce-glue.c          |   4 +-
 arch/arm/crypto/aes-cipher-core.S      |  88 +++-
 arch/arm/crypto/aes-neonbs-glue.c      |   5 +-
 arch/arm/crypto/ghash-ce-core.S        | 234 +++--
 arch/arm/crypto/ghash-ce-glue.c        |  24 +-
 arch/arm64/crypto/Kconfig              |  22 +-
 arch/arm64/crypto/aes-ce-ccm-core.S    |  30 +-
 arch/arm64/crypto/aes-ce-ccm-glue.c    | 174 +--
 arch/arm64/crypto/aes-ce-cipher.c      |  55 ++-
 arch/arm64/crypto/aes-ce.S             |  12 +-
 arch/arm64/crypto/aes-cipher-core.S    | 152 --
 arch/arm64/crypto/aes-ctr-fallback.h   |  53 ++
 arch/arm64/crypto/aes-glue.c           |  63 ++-
 arch/arm64/crypto/aes-neonbs-glue.c    |  53 +-
 arch/arm64/crypto/chacha20-neon-glue.c |   5 +-
 arch/arm64/crypto/crc32-ce-glue.c      |  11 +-
 arch/arm64/crypto/crct10dif-ce-glue.c  |  13 +-
 arch/arm64/crypto/ghash-ce-core.S      | 401 ++-
 arch/arm64/crypto/ghash-ce-glue.c      | 517 ++--
 arch/arm64/crypto/sha1-ce-glue.c       |  18 +-
 arch/arm64/crypto/sha2-ce-glue.c       |  30 +-
 arch/arm64/crypto/sha256-glue.c        |   1 +
 arch/sparc/crypto/aes_glue.c           |   3 +-
 arch/x86/crypto/aesni-intel_glue.c     |   4 +-
 arch/x86/crypto/blowfish_glue.c        |   3 +-
 arch/x86/crypto/cast5_avx_glue.c       |   3 +-
 arch/x86/crypto/des3_ede_glue.c        |   3 +-
 crypto/algapi.c                        |  25 +-
 crypto/ctr.c                           |   3 +-
 crypto/pcbc.c                          |  12 +-
 drivers/crypto/vmx/aes_ctr.c           |   3 +-
 drivers/md/dm-crypt.c                  |  11 +-
 include/crypto/algapi.h                |  23 +-
 34 files changed, 1719 insertions(+), 344 deletions(-)
 create mode 100644 arch/arm64/crypto/aes-ctr-fallback.h

--
2.9.3
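The dispatch shape that patches #3 - #13 add to the NEON drivers can be
sketched roughly as follows. This is a userspace sketch with stubbed
may_use_simd()/kernel_neon_begin()/kernel_neon_end() (the real functions
come from the kernel's asm/simd.h and asm/neon.h); the transform bodies
are placeholders, not actual driver code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the non-SIMD fallback pattern: when the SIMD register file
 * may not be used in the current context, run a scalar/generic
 * implementation instead of entering a kernel_neon_begin()/end()
 * section. may_use_simd() is stubbed with a plain flag here.
 */
static bool simd_usable = true;

static bool may_use_simd(void)
{
	return simd_usable;
}

/* Stand-in for the NEON-accelerated code path. */
static uint32_t transform_neon(const uint8_t *data, size_t len)
{
	uint32_t acc = 0;

	while (len--)
		acc = acc * 31 + *data++;	/* pretend this is vectorized */
	return acc;
}

/* Stand-in for the scalar/generic fallback; must match the NEON result. */
static uint32_t transform_scalar(const uint8_t *data, size_t len)
{
	uint32_t acc = 0;

	while (len--)
		acc = acc * 31 + *data++;
	return acc;
}

/* The dispatch each driver gains: fall back when SIMD is unavailable. */
static uint32_t transform(const uint8_t *data, size_t len)
{
	uint32_t ret;

	if (!may_use_simd())
		return transform_scalar(data, len);

	/* kernel_neon_begin(); */
	ret = transform_neon(data, len);
	/* kernel_neon_end(); */
	return ret;
}
```

The key property is that both paths produce identical output, so callers
never observe which implementation ran; only throughput differs, which
is why Dave's note above describes running the C fallbacks unnecessarily
as a performance regression rather than a correctness problem.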