Re: [RFC PATCH] crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS

2018-03-16 Thread Herbert Xu
On Mon, Mar 05, 2018 at 11:17:07AM -0800, Eric Biggers wrote:
> Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> for ARM64.  This is ported from the 32-bit version.  It may be useful on
> devices with 64-bit ARM CPUs that don't have the Cryptography
> Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
> processor on the Raspberry Pi 3.
> 
> It generally works the same way as the 32-bit version, but there are
> some slight differences due to the different instructions, registers,
> and syntax available in ARM64 vs. in ARM32.  For example, in the 64-bit
> version there are enough registers to hold the XTS tweaks for each
> 128-byte chunk, so they don't need to be saved on the stack.
> 
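For readers unfamiliar with why the tweaks matter here: in XTS each block's tweak is the previous one multiplied by x in a binary field, so a 128-byte chunk of 16-byte Speck128 blocks needs eight consecutive tweaks. The following is a minimal C sketch of that sequence, not the code from the patch. It assumes the standard little-endian XTS convention for 128-bit blocks (reduction constant 0x87); the names `tweak128`, `xts_next_tweak` and `xts_chunk_tweaks` are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit XTS tweak split into two 64-bit halves,
 * with lo holding bits 0..63 and hi holding bits 64..127. */
struct tweak128 {
	uint64_t lo;
	uint64_t hi;
};

#define CHUNK_BYTES      128
#define BLOCK_BYTES      16	/* Speck128 block size */
#define TWEAKS_PER_CHUNK (CHUNK_BYTES / BLOCK_BYTES)

/* Multiply the tweak by x in GF(2^128), reducing modulo
 * x^128 + x^7 + x^2 + x + 1 (the 0x87 constant used by XTS). */
static struct tweak128 xts_next_tweak(struct tweak128 t)
{
	uint64_t carry = t.hi >> 63;	/* bit shifted out of the top */
	struct tweak128 r;

	r.hi = (t.hi << 1) | (t.lo >> 63);
	r.lo = (t.lo << 1) ^ (carry * 0x87);
	return r;
}

/* Fill out[] with the tweaks for one 128-byte chunk, starting at 'first'. */
static void xts_chunk_tweaks(struct tweak128 first,
			     struct tweak128 out[TWEAKS_PER_CHUNK])
{
	int i;

	out[0] = first;
	for (i = 1; i < TWEAKS_PER_CHUNK; i++)
		out[i] = xts_next_tweak(out[i - 1]);
}
```

Eight such 128-bit values fit comfortably in the 32 NEON vector registers available on ARM64, which is consistent with the remark above that they need not be spilled to the stack.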
> Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
> 
>   Algorithm                        Encryption   Decryption
>   ---------                        ----------   ----------
>   Speck64/128-XTS (NEON)           92.2 MB/s    92.2 MB/s
>   Speck128/256-XTS (NEON)          75.0 MB/s    75.0 MB/s
>   Speck128/256-XTS (generic)       47.4 MB/s    35.6 MB/s
>   AES-128-XTS (NEON bit-sliced)    33.4 MB/s    29.6 MB/s
>   AES-256-XTS (NEON bit-sliced)    24.6 MB/s    21.7 MB/s
> 
> The code performs well on higher-end ARM64 processors as well, though
> such processors tend to have the Crypto Extensions which make AES
> preferred.  For example, here are the same benchmarks run on a HiKey960
> (with CPU affinity set for the A73 cores), with the Crypto Extensions
> implementation of AES-256-XTS added:
> 
>   Algorithm                          Encryption    Decryption
>   ---------                          -----------   -----------
>   AES-256-XTS (Crypto Extensions)    1273.3 MB/s   1274.7 MB/s
>   Speck64/128-XTS (NEON)              359.8 MB/s    348.0 MB/s
>   Speck128/256-XTS (NEON)             292.5 MB/s    286.1 MB/s
>   Speck128/256-XTS (generic)          186.3 MB/s    181.8 MB/s
>   AES-128-XTS (NEON bit-sliced)       142.0 MB/s    124.3 MB/s
>   AES-256-XTS (NEON bit-sliced)       104.7 MB/s     91.1 MB/s
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH] crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS

2018-03-06 Thread Dave Martin
On Tue, Mar 06, 2018 at 12:47:45PM +, Ard Biesheuvel wrote:
> On 6 March 2018 at 12:35, Dave Martin  wrote:
> > On Mon, Mar 05, 2018 at 11:17:07AM -0800, Eric Biggers wrote:
> >> Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> >> for ARM64.  This is ported from the 32-bit version.  It may be useful on
> >> devices with 64-bit ARM CPUs that don't have the Cryptography
> >> Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
> >> processor on the Raspberry Pi 3.
> >>
> >> It generally works the same way as the 32-bit version, but there are
> >> some slight differences due to the different instructions, registers,
> >> and syntax available in ARM64 vs. in ARM32.  For example, in the 64-bit
> >> version there are enough registers to hold the XTS tweaks for each
> >> 128-byte chunk, so they don't need to be saved on the stack.
> >>
> >> Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
> >>
> >>   Algorithm                        Encryption   Decryption
> >>   ---------                        ----------   ----------
> >>   Speck64/128-XTS (NEON)           92.2 MB/s    92.2 MB/s
> >>   Speck128/256-XTS (NEON)          75.0 MB/s    75.0 MB/s
> >>   Speck128/256-XTS (generic)       47.4 MB/s    35.6 MB/s
> >>   AES-128-XTS (NEON bit-sliced)    33.4 MB/s    29.6 MB/s
> >>   AES-256-XTS (NEON bit-sliced)    24.6 MB/s    21.7 MB/s
> >>
> >> The code performs well on higher-end ARM64 processors as well, though
> >> such processors tend to have the Crypto Extensions which make AES
> >> preferred.  For example, here are the same benchmarks run on a HiKey960
> >> (with CPU affinity set for the A73 cores), with the Crypto Extensions
> >> implementation of AES-256-XTS added:
> >>
> >>   Algorithm                          Encryption    Decryption
> >>   ---------                          -----------   -----------
> >>   AES-256-XTS (Crypto Extensions)    1273.3 MB/s   1274.7 MB/s
> >>   Speck64/128-XTS (NEON)              359.8 MB/s    348.0 MB/s
> >>   Speck128/256-XTS (NEON)             292.5 MB/s    286.1 MB/s
> >>   Speck128/256-XTS (generic)          186.3 MB/s    181.8 MB/s
> >>   AES-128-XTS (NEON bit-sliced)       142.0 MB/s    124.3 MB/s
> >>   AES-256-XTS (NEON bit-sliced)       104.7 MB/s     91.1 MB/s
> >>
> >> Signed-off-by: Eric Biggers 
> >> ---
> >>  arch/arm64/crypto/Kconfig   |   6 +
> >>  arch/arm64/crypto/Makefile  |   3 +
> >>  arch/arm64/crypto/speck-neon-core.S | 352 
> >>  arch/arm64/crypto/speck-neon-glue.c | 282 ++
> >>  4 files changed, 643 insertions(+)
> >>  create mode 100644 arch/arm64/crypto/speck-neon-core.S
> >>  create mode 100644 arch/arm64/crypto/speck-neon-glue.c
> >>
> >> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> >> index 285c36c7b408..cb5a243110c4 100644
> >> --- a/arch/arm64/crypto/Kconfig
> >> +++ b/arch/arm64/crypto/Kconfig
> >> @@ -113,4 +113,10 @@ config CRYPTO_AES_ARM64_BS
> >>   select CRYPTO_AES_ARM64
> >>   select CRYPTO_SIMD
> >>
> >> +config CRYPTO_SPECK_NEON
> >> + tristate "NEON accelerated Speck cipher algorithms"
> >> + depends on KERNEL_MODE_NEON
> >> + select CRYPTO_BLKCIPHER
> >> + select CRYPTO_SPECK
> >> +
> >>  endif
> >> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> >> index cee9b8d9830b..d94ebd15a859 100644
> >> --- a/arch/arm64/crypto/Makefile
> >> +++ b/arch/arm64/crypto/Makefile
> >> @@ -53,6 +53,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
> >>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> >>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> >>
> >> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
> >> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
> >> +
> >>  obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
> >>  aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
> >>
> >> diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S
> >> new file mode 100644
> >> index ..b14463438b09
> >> --- /dev/null
> >> +++ b/arch/arm64/crypto/speck-neon-core.S
> >> @@ -0,0 +1,352 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> >> + *
> >> + * Copyright (c) 2018 Google, Inc
> >> + *
> >> + * Author: Eric Biggers 
> >> + */
> >> +
> >> +#include 
> >> +
> >> + .text
> >> +
> >> + // arguments
> >> + ROUND_KEYS  .req  x0  // const {u64,u32} *round_keys
> >> + NROUNDS     .req  w1  // int nrounds
> >> + NROUNDS_X   .req  x1
> >> + DST         .req  x2  // void *dst
> >> + SRC         .req  x3  // const void *src
> >> + NBYTES      .req  w4  // unsigned int nbytes
> >> + 

Re: [RFC PATCH] crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS

2018-03-06 Thread Ard Biesheuvel
On 6 March 2018 at 12:35, Dave Martin  wrote:
> On Mon, Mar 05, 2018 at 11:17:07AM -0800, Eric Biggers wrote:
>> Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
>> for ARM64.  This is ported from the 32-bit version.  It may be useful on
>> devices with 64-bit ARM CPUs that don't have the Cryptography
>> Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
>> processor on the Raspberry Pi 3.
>>
>> It generally works the same way as the 32-bit version, but there are
>> some slight differences due to the different instructions, registers,
>> and syntax available in ARM64 vs. in ARM32.  For example, in the 64-bit
>> version there are enough registers to hold the XTS tweaks for each
>> 128-byte chunk, so they don't need to be saved on the stack.
>>
>> Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
>>
>>   Algorithm                        Encryption   Decryption
>>   ---------                        ----------   ----------
>>   Speck64/128-XTS (NEON)           92.2 MB/s    92.2 MB/s
>>   Speck128/256-XTS (NEON)          75.0 MB/s    75.0 MB/s
>>   Speck128/256-XTS (generic)       47.4 MB/s    35.6 MB/s
>>   AES-128-XTS (NEON bit-sliced)    33.4 MB/s    29.6 MB/s
>>   AES-256-XTS (NEON bit-sliced)    24.6 MB/s    21.7 MB/s
>>
>> The code performs well on higher-end ARM64 processors as well, though
>> such processors tend to have the Crypto Extensions which make AES
>> preferred.  For example, here are the same benchmarks run on a HiKey960
>> (with CPU affinity set for the A73 cores), with the Crypto Extensions
>> implementation of AES-256-XTS added:
>>
>>   Algorithm                          Encryption    Decryption
>>   ---------                          -----------   -----------
>>   AES-256-XTS (Crypto Extensions)    1273.3 MB/s   1274.7 MB/s
>>   Speck64/128-XTS (NEON)              359.8 MB/s    348.0 MB/s
>>   Speck128/256-XTS (NEON)             292.5 MB/s    286.1 MB/s
>>   Speck128/256-XTS (generic)          186.3 MB/s    181.8 MB/s
>>   AES-128-XTS (NEON bit-sliced)       142.0 MB/s    124.3 MB/s
>>   AES-256-XTS (NEON bit-sliced)       104.7 MB/s     91.1 MB/s
>>
>> Signed-off-by: Eric Biggers 
>> ---
>>  arch/arm64/crypto/Kconfig   |   6 +
>>  arch/arm64/crypto/Makefile  |   3 +
>>  arch/arm64/crypto/speck-neon-core.S | 352 
>>  arch/arm64/crypto/speck-neon-glue.c | 282 ++
>>  4 files changed, 643 insertions(+)
>>  create mode 100644 arch/arm64/crypto/speck-neon-core.S
>>  create mode 100644 arch/arm64/crypto/speck-neon-glue.c
>>
>> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
>> index 285c36c7b408..cb5a243110c4 100644
>> --- a/arch/arm64/crypto/Kconfig
>> +++ b/arch/arm64/crypto/Kconfig
>> @@ -113,4 +113,10 @@ config CRYPTO_AES_ARM64_BS
>>   select CRYPTO_AES_ARM64
>>   select CRYPTO_SIMD
>>
>> +config CRYPTO_SPECK_NEON
>> + tristate "NEON accelerated Speck cipher algorithms"
>> + depends on KERNEL_MODE_NEON
>> + select CRYPTO_BLKCIPHER
>> + select CRYPTO_SPECK
>> +
>>  endif
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index cee9b8d9830b..d94ebd15a859 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -53,6 +53,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
>>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
>>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
>>
>> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
>> +
>>  obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
>>  aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
>>
>> diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S
>> new file mode 100644
>> index ..b14463438b09
>> --- /dev/null
>> +++ b/arch/arm64/crypto/speck-neon-core.S
>> @@ -0,0 +1,352 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
>> + *
>> + * Copyright (c) 2018 Google, Inc
>> + *
>> + * Author: Eric Biggers 
>> + */
>> +
>> +#include 
>> +
>> + .text
>> +
>> + // arguments
>> + ROUND_KEYS  .req  x0  // const {u64,u32} *round_keys
>> + NROUNDS     .req  w1  // int nrounds
>> + NROUNDS_X   .req  x1
>> + DST         .req  x2  // void *dst
>> + SRC         .req  x3  // const void *src
>> + NBYTES      .req  w4  // unsigned int nbytes
>> + TWEAK       .req  x5  // void *tweak
>> +
>> + // registers which hold the data being encrypted/decrypted
>> + // (underscores avoid a naming collision with ARM64 registers x0-x3)
>> + X_0         .req  v0
>> + Y_0         .req  v1

Re: [RFC PATCH] crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS

2018-03-06 Thread Dave Martin
On Mon, Mar 05, 2018 at 11:17:07AM -0800, Eric Biggers wrote:
> Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> for ARM64.  This is ported from the 32-bit version.  It may be useful on
> devices with 64-bit ARM CPUs that don't have the Cryptography
> Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53
> processor on the Raspberry Pi 3.
> 
> It generally works the same way as the 32-bit version, but there are
> some slight differences due to the different instructions, registers,
> and syntax available in ARM64 vs. in ARM32.  For example, in the 64-bit
> version there are enough registers to hold the XTS tweaks for each
> 128-byte chunk, so they don't need to be saved on the stack.
> 
> Benchmarks on a Raspberry Pi 3 running a 64-bit kernel:
> 
>   Algorithm                        Encryption   Decryption
>   ---------                        ----------   ----------
>   Speck64/128-XTS (NEON)           92.2 MB/s    92.2 MB/s
>   Speck128/256-XTS (NEON)          75.0 MB/s    75.0 MB/s
>   Speck128/256-XTS (generic)       47.4 MB/s    35.6 MB/s
>   AES-128-XTS (NEON bit-sliced)    33.4 MB/s    29.6 MB/s
>   AES-256-XTS (NEON bit-sliced)    24.6 MB/s    21.7 MB/s
> 
> The code performs well on higher-end ARM64 processors as well, though
> such processors tend to have the Crypto Extensions which make AES
> preferred.  For example, here are the same benchmarks run on a HiKey960
> (with CPU affinity set for the A73 cores), with the Crypto Extensions
> implementation of AES-256-XTS added:
> 
>   Algorithm                          Encryption    Decryption
>   ---------                          -----------   -----------
>   AES-256-XTS (Crypto Extensions)    1273.3 MB/s   1274.7 MB/s
>   Speck64/128-XTS (NEON)              359.8 MB/s    348.0 MB/s
>   Speck128/256-XTS (NEON)             292.5 MB/s    286.1 MB/s
>   Speck128/256-XTS (generic)          186.3 MB/s    181.8 MB/s
>   AES-128-XTS (NEON bit-sliced)       142.0 MB/s    124.3 MB/s
>   AES-256-XTS (NEON bit-sliced)       104.7 MB/s     91.1 MB/s
> 
> Signed-off-by: Eric Biggers 
> ---
>  arch/arm64/crypto/Kconfig   |   6 +
>  arch/arm64/crypto/Makefile  |   3 +
>  arch/arm64/crypto/speck-neon-core.S | 352 
>  arch/arm64/crypto/speck-neon-glue.c | 282 ++
>  4 files changed, 643 insertions(+)
>  create mode 100644 arch/arm64/crypto/speck-neon-core.S
>  create mode 100644 arch/arm64/crypto/speck-neon-glue.c
> 
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index 285c36c7b408..cb5a243110c4 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -113,4 +113,10 @@ config CRYPTO_AES_ARM64_BS
>   select CRYPTO_AES_ARM64
>   select CRYPTO_SIMD
>  
> +config CRYPTO_SPECK_NEON
> + tristate "NEON accelerated Speck cipher algorithms"
> + depends on KERNEL_MODE_NEON
> + select CRYPTO_BLKCIPHER
> + select CRYPTO_SPECK
> +
>  endif
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index cee9b8d9830b..d94ebd15a859 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -53,6 +53,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
>  
> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
> +
>  obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
>  aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
>  
> diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S
> new file mode 100644
> index ..b14463438b09
> --- /dev/null
> +++ b/arch/arm64/crypto/speck-neon-core.S
> @@ -0,0 +1,352 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
> + *
> + * Copyright (c) 2018 Google, Inc
> + *
> + * Author: Eric Biggers 
> + */
> +
> +#include 
> +
> + .text
> +
> + // arguments
> + ROUND_KEYS  .req  x0  // const {u64,u32} *round_keys
> + NROUNDS     .req  w1  // int nrounds
> + NROUNDS_X   .req  x1
> + DST         .req  x2  // void *dst
> + SRC         .req  x3  // const void *src
> + NBYTES      .req  w4  // unsigned int nbytes
> + TWEAK       .req  x5  // void *tweak
> +
> + // registers which hold the data being encrypted/decrypted
> + // (underscores avoid a naming collision with ARM64 registers x0-x3)
> + X_0         .req  v0
> + Y_0         .req  v1
> + X_1         .req  v2
> + Y_1         .req  v3
> + X_2         .req  v4
> + Y_2         .req  v5
> + X_3