Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread Anton Blanchard
Hi David,

> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like to be convinced that anything does CRC over long enough buffers
> to make it a gain at all.

btrfs data checksumming is one area.

> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.

A branch on POWER8 is a three-cycle redirect. The vpmsum instructions
have a 6-cycle latency.

> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.

It went through quite a lot of simulation to reach peak performance.
The loop is quite delicate; we have to pace it just right to avoid
some pipeline reject conditions.

Note also that we already modulo schedule the loop across three
iterations, which is required to hide the latency of the vpmsum instructions.

Anton


Re: [PATCH] crypto: powerpc - Fix initialisation of crc32c context

2017-03-05 Thread Anton Blanchard
Hi Daniel,

> Turning on crypto self-tests on a POWER8 shows:
> 
> alg: hash: Test 1 failed for crc32c-vpmsum
> : ff ff ff ff
> 
> Comparing the code with the Intel CRC32c implementation on which
> ours is based shows that we are doing an init with 0, not ~0
> as CRC32c requires.
> 
> This probably wasn't caught because btrfs does its own weird
> open-coded initialisation.
> 
> Initialise our internal context to ~0 on init.
> 
> This makes the self-tests pass, and btrfs continues to work.

Thanks! Not sure how I screwed that up.

Acked-by: Anton Blanchard <an...@samba.org>

> Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
> Cc: Anton Blanchard <an...@samba.org>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Daniel Axtens <d...@axtens.net>
> ---
>  arch/powerpc/crypto/crc32c-vpmsum_glue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
> index 9fa046d56eba..411994551afc 100644
> --- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
> +++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
> @@ -52,7 +52,7 @@ static int crc32c_vpmsum_cra_init(struct crypto_tfm *tfm)
>  {
>   u32 *key = crypto_tfm_ctx(tfm);
>  
> - *key = 0;
> + *key = ~0;
>  
>   return 0;
>  }
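
A portable bitwise model makes the failure easy to see. This is a plain-C reference CRC-32C, not the vpmsum code; `crc32c_update` and `crc32c_digest` are hypothetical names, and the sketch assumes the driver's final step inverts the state, as CRC32c requires. Seeding the state with 0 instead of ~0 turns the empty-message digest from 00000000 into ffffffff, which is consistent with the `ff ff ff ff` in the self-test output above, assuming test 1 hashes an empty buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial in bit-reflected form. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Bitwise update of a reflected CRC-32C state: no init, no final XOR. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical init + update + final: the final step inverts the state. */
static uint32_t crc32c_digest(uint32_t init, const uint8_t *p, size_t len)
{
    return ~crc32c_update(init, p, len);
}
```

With `init = ~0u` this yields the standard CRC-32C check value 0xE3069283 for "123456789"; with `init = 0` an empty input digests to 0xFFFFFFFF instead of 0.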



Re: [PATCH] crypto: powerpc - Rename CRYPT_CRC32C_VPMSUM option

2016-11-22 Thread Anton Blanchard
Hi Jean,

> For consistency with the other 246 kernel configuration options,
> rename CRYPT_CRC32C_VPMSUM to CRYPTO_CRC32C_VPMSUM.

Thanks! Not sure how I missed that.

Acked-by: Anton Blanchard <an...@samba.org>

Anton
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Fix crypto/vmx/p8_ghash memory corruption

2016-09-28 Thread Anton Blanchard
Hi Marcelo,

> This series fixes the memory corruption found by Jan Stancek in
> 4.8-rc7. The problem however also affects previous versions of the
> driver.

If it affects previous versions, please add a "Cc: sta...@vger.kernel.org"
tag next to the Signed-off-by lines so it gets into the stable kernels.

Anton


Re: [PATCH] crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading

2016-08-04 Thread Anton Blanchard
Hi Michael,

> Is VEC_CRYPTO the right feature?
> 
> That's new power8 crypto stuff.

The vpmsum* instructions are part of the same pipeline as the vcipher*
instructions; both were introduced in POWER8.

> I thought this only used VMX? (but I haven't looked closely)

Yes, vcipher* and vpmsum* are VMX instructions.

Anton


[PATCH] crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading

2016-08-04 Thread Anton Blanchard
From: Anton Blanchard <an...@samba.org>

This patch utilises the GENERIC_CPU_AUTOPROBE infrastructure
to automatically load the crc32c-vpmsum module if the CPU supports
it.

Signed-off-by: Anton Blanchard <an...@samba.org>
---
 arch/powerpc/crypto/crc32c-vpmsum_glue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
index bfe3d37..9fa046d 100644
--- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define CHKSUM_BLOCK_SIZE  1
@@ -157,7 +158,7 @@ static void __exit crc32c_vpmsum_mod_fini(void)
crypto_unregister_shash();
 }
 
-module_init(crc32c_vpmsum_mod_init);
+module_cpu_feature_match(PPC_MODULE_FEATURE_VEC_CRYPTO, crc32c_vpmsum_mod_init);
 module_exit(crc32c_vpmsum_mod_fini);
 
 MODULE_AUTHOR("Anton Blanchard <an...@samba.org>");
-- 
2.7.4



[PATCH 1/2] powerpc: define FUNC_START/FUNC_END

2016-06-30 Thread Anton Blanchard
From: Anton Blanchard <an...@samba.org>

gcc provides FUNC_START/FUNC_END macros to help with creating
assembly functions. Mirror these in the kernel so we can more easily
share code between userspace and the kernel. FUNC_END is just a
stub since we don't currently annotate the end of kernel functions.

It might make sense to do a wholesale search and replace, but for
now just create a couple of defines.

Signed-off-by: Anton Blanchard <an...@samba.org>
---
 arch/powerpc/include/asm/ppc_asm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 7b591f9..7a924da 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -286,6 +286,9 @@ n:
 
 #endif
 
+#define FUNC_START(name)   _GLOBAL(name)
+#define FUNC_END(name)
+
 /* 
  * LOAD_REG_IMMEDIATE(rn, expr)
  *   Loads the value of the constant expression 'expr' into register 'rn'
-- 
2.7.4



[PATCH 2/2] crypto: powerpc: Add POWER8 optimised crc32c

2016-06-30 Thread Anton Blanchard
From: Anton Blanchard <an...@samba.org>

Use the vector polynomial multiply-sum instructions in POWER8 to
speed up crc32c.

This is just over 41x faster than the slice-by-8 method that it
replaces. Measurements on a 4.1 GHz POWER8 show it sustaining
52 GiB/sec.

A simple btrfs write performance test:

dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096
sync

is over 3.7x faster.

Signed-off-by: Anton Blanchard <an...@samba.org>
---
 arch/powerpc/crypto/Makefile             |    2 +
 arch/powerpc/crypto/crc32c-vpmsum_asm.S  | 1553 ++
 arch/powerpc/crypto/crc32c-vpmsum_glue.c |  167
 arch/powerpc/include/asm/ppc-opcode.h    |   12 +
 crypto/Kconfig                           |   11 +
 5 files changed, 1745 insertions(+)
 create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_asm.S
 create mode 100644 arch/powerpc/crypto/crc32c-vpmsum_glue.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 9c221b6..7998c17 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -9,9 +9,11 @@ obj-$(CONFIG_CRYPTO_MD5_PPC) += md5-ppc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
+obj-$(CONFIG_CRYPT_CRC32C_VPMSUM) += crc32c-vpmsum.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
 sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
 sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
+crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
diff --git a/arch/powerpc/crypto/crc32c-vpmsum_asm.S b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
new file mode 100644
index 000..dc640b2
--- /dev/null
+++ b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
@@ -0,0 +1,1553 @@
+/*
+ * Calculate the checksum of data that is 16 byte aligned and a multiple of
+ * 16 bytes.
+ *
+ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
+ * chunks in order to mask the latency of the vpmsum instructions. If we
+ * have more than 32 kB of data to checksum we repeat this step multiple
+ * times, passing in the previous 1024 bits.
+ *
+ * The next step is to reduce the 1024 bits to 64 bits. This step adds
+ * 32 bits of 0s to the end - this matches what a CRC does. We just
+ * calculate constants that land the data in this 32 bits.
+ *
+ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
+ * for n = CRC using POWER8 instructions. We use x = 32.
+ *
+ * http://en.wikipedia.org/wiki/Barrett_reduction
+ *
+ * Copyright (C) 2015 Anton Blanchard <an...@au.ibm.com>, IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#include 
+#include 
+
+   .section .rodata
+.balign 16
+
+.byteswap_constant:
+   /* byte reverse permute constant */
+   .octa 0x0F0E0D0C0B0A09080706050403020100
+
+#define MAX_SIZE   32768
+.constants:
+
+   /* Reduce 262144 kbits to 1024 bits */
+   /* x^261120 mod p(x)` << 1, x^261184 mod p(x)` << 1 */
+   .octa 0xb6ca9e209c37c408
+
+   /* x^260096 mod p(x)` << 1, x^260160 mod p(x)` << 1 */
+   .octa 0x350249a80001b51df26c
+
+   /* x^259072 mod p(x)` << 1, x^259136 mod p(x)` << 1 */
+   .octa 0x0001862dac540724b9d0
+
+   /* x^258048 mod p(x)` << 1, x^258112 mod p(x)` << 1 */
+   .octa 0x0001d87fb48c0001c00532fe
+
+   /* x^257024 mod p(x)` << 1, x^257088 mod p(x)` << 1 */
+   .octa 0x0001f39b699ef05a9362
+
+   /* x^256000 mod p(x)` << 1, x^256064 mod p(x)` << 1 */
+   .octa 0x000101da11b40001e1007970
+
+   /* x^254976 mod p(x)` << 1, x^255040 mod p(x)` << 1 */
+   .octa 0x0001cab571e0a57366ee
+
+   /* x^253952 mod p(x)` << 1, x^254016 mod p(x)` << 1 */
+   .octa 0xc7020cfe000192011284
+
+   /* x^252928 mod p(x)` << 1, x^252992 mod p(x)` << 1 */
+   .octa 0xcdaed1ae000162716d9a
+
+   /* x^251904 mod p(x)` << 1, x^251968 mod p(x)` << 1 */
+   .octa 0x0001e804effccd97ecde
+
+   /* x^250880 mod p(x)` << 1, x^250944 mod p(x)` << 1 */
+   .octa 0x77c3ea3a58812bc0
+
+   /* x^249856 mod p(x)` << 1, x^249920 mod p(x)` << 1 */
+   .octa 0x68df31b488b8c12e
+
+   /* x^248832 mod p(x)` << 1, x^248896 mod p(x)` << 1 */
+   .octa 0xb059b6c20001230b234c
+
+   /* x^247808 mod p(x)` << 1, x^247872 mod p(x)` << 1 *
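
The header comment of this file relies on two properties of CRCs over GF(2): a chunk's remainder can be carried forward past later data by multiplying with a precomputed x^n mod p(x) constant, and the final reduction to 32 bits can be done exactly with Barrett reduction. The sketch below demonstrates both in portable C under stated assumptions: it uses the MSB-first (non-reflected) form of the CRC-32C polynomial with an initial value of 0 and no final XOR, plain loops in place of vpmsumd, and hypothetical helper names (`clmul`, `mod_p_slow`, `xn_mod_p`, `barrett_mu`, `mod_p_barrett`, `crc_msb`, `crc_combine`). The kernel file stores its constants bit-reflected and shifted left by one, so the values computed here are not expected to match the `.octa` entries directly.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C generator p(x) in MSB-first form, including the x^32 term. */
#define P33 0x11EDC6F41ull

/* Carry-less (GF(2)) multiply; callers keep the product below x^64. */
static uint64_t clmul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* Reference a(x) mod p(x) by schoolbook long division. */
static uint32_t mod_p_slow(uint64_t a)
{
    for (int i = 63; i >= 32; i--)
        if ((a >> i) & 1)
            a ^= P33 << (i - 32);
    return (uint32_t)a;
}

/* x^n mod p(x): the kind of folding constant the .constants table holds. */
static uint32_t xn_mod_p(unsigned n)
{
    uint64_t r = 1;
    while (n--) {
        r <<= 1;
        if (r & (1ull << 32))
            r ^= P33;
    }
    return (uint32_t)r;
}

/* Barrett constant mu = floor(x^64 / p(x)), computed by long division. */
static uint64_t barrett_mu(void)
{
    uint64_t r = 0, q = 0;
    for (int i = 64; i >= 0; i--) {
        r = (r << 1) | (i == 64);   /* bring down the next bit of x^64 */
        q <<= 1;
        if (r & (1ull << 32)) {
            r ^= P33;
            q |= 1;
        }
    }
    return q;
}

/* Barrett reduction over GF(2): exact, so no correction step is needed. */
static uint32_t mod_p_barrett(uint64_t a, uint64_t mu)
{
    uint64_t q = clmul(a >> 32, mu) >> 32;  /* q = floor(a / p) */
    return (uint32_t)(a ^ clmul(q, P33));   /* a - q*p has degree < 32 */
}

/* MSB-first CRC-32C, init 0, no final XOR: returns m(x) * x^32 mod p(x). */
static uint32_t crc_msb(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ (uint32_t)P33
                                      : crc << 1;
    }
    return crc;
}

/* CRC of A||B from independent CRC(A) and CRC(B): fold CRC(A) forward
 * past the 8*len_b bits of B, then reduce. */
static uint32_t crc_combine(uint32_t crc_a, uint32_t crc_b, size_t len_b,
                            uint64_t mu)
{
    uint32_t k = xn_mod_p(8u * (unsigned)len_b);  /* folding constant */
    return mod_p_barrett(clmul(crc_a, k), mu) ^ crc_b;
}
```

Because each chunk's `crc_msb` call depends only on its own bytes, the per-chunk work has no dependency chain between chunks; this is broadly the idea the assembly uses to run eight streams in parallel and hide vpmsum latency, with `crc_combine` standing in for the fold step.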

[PATCH 1/2] crypto: vmx: Fix ABI detection

2016-06-10 Thread Anton Blanchard
From: Anton Blanchard <an...@samba.org>

When calling ppc-xlate.pl, we pass it either linux-ppc64 or
linux-ppc64le. The script, however, was expecting linux64le, a result
of its OpenSSL origins. This means we aren't obeying the ppc64le
ABIv2 rules.

Fix this by checking for linux-ppc64le.

Fixes: 5ca55738201c ("crypto: vmx - comply with ABIs that specify vrsave as reserved.")
Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard <an...@samba.org>
---
 drivers/crypto/vmx/ppc-xlate.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/vmx/ppc-xlate.pl b/drivers/crypto/vmx/ppc-xlate.pl
index 9f4994c..b18e67d 100644
--- a/drivers/crypto/vmx/ppc-xlate.pl
+++ b/drivers/crypto/vmx/ppc-xlate.pl
@@ -141,7 +141,7 @@ my $vmr = sub {
 
 # Some ABIs specify vrsave, special-purpose register #256, as reserved
 # for system use.
-my $no_vrsave = ($flavour =~ /aix|linux64le/);
+my $no_vrsave = ($flavour =~ /linux-ppc64le/);
 my $mtspr = sub {
 my ($f,$idx,$ra) = @_;
 if ($idx == 256 && $no_vrsave) {
-- 
2.7.4
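
The old test can never fire for the flavours the kernel build passes: neither `aix` nor `linux64le` occurs anywhere in `linux-ppc64le`. The sketch below checks this with POSIX extended regular expressions in C, which treat these simple patterns (literals and `|` alternation, unanchored) the same way Perl's `=~` does; `flavour_matches` is a hypothetical helper, not part of ppc-xlate.pl.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does `flavour` match the POSIX extended regex `pattern`?
 * Like Perl's =~, the match is unanchored, so here it acts as a substring search. */
static bool flavour_matches(const char *pattern, const char *flavour)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, flavour, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

`flavour_matches("aix|linux64le", "linux-ppc64le")` is false, while `flavour_matches("linux-ppc64le", "linux-ppc64le")` is true, and the new pattern still does not match the big endian `linux-ppc64` flavour.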



[PATCH 2/2] crypto: vmx: Increase priority of aes-cbc cipher

2016-06-10 Thread Anton Blanchard
From: Anton Blanchard <an...@samba.org>

All of the VMX AES ciphers (AES, AES-CBC and AES-CTR) are set at
priority 1000. Unfortunately this means we never use the VMX AES-CBC
and AES-CTR implementations, because the generic AES-CBC cipher, which
is implemented on top of the VMX AES cipher, inherits its priority.

To fix this, AES-CBC and AES-CTR need a higher priority. Set
them to 2000.

Testing on a POWER8 with:

cryptsetup benchmark --cipher aes --key-size 256

shows the decryption speed increasing from 402.4 MB/s to 3069.2 MB/s,
over 7x faster. Thanks to Mike Strosaker for helping me debug
this issue.

Fixes: 8c755ace357c ("crypto: vmx - Adding CBC routines for VMX module")
Cc: sta...@vger.kernel.org
Signed-off-by: Anton Blanchard <an...@samba.org>
---
 drivers/crypto/vmx/aes_cbc.c | 2 +-
 drivers/crypto/vmx/aes_ctr.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
index 495577b..94ad5c0 100644
--- a/drivers/crypto/vmx/aes_cbc.c
+++ b/drivers/crypto/vmx/aes_cbc.c
@@ -182,7 +182,7 @@ struct crypto_alg p8_aes_cbc_alg = {
.cra_name = "cbc(aes)",
.cra_driver_name = "p8_aes_cbc",
.cra_module = THIS_MODULE,
-   .cra_priority = 1000,
+   .cra_priority = 2000,
.cra_type = _blkcipher_type,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK,
.cra_alignmask = 0,
diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
index 0a3c1b0..38ed10d 100644
--- a/drivers/crypto/vmx/aes_ctr.c
+++ b/drivers/crypto/vmx/aes_ctr.c
@@ -166,7 +166,7 @@ struct crypto_alg p8_aes_ctr_alg = {
.cra_name = "ctr(aes)",
.cra_driver_name = "p8_aes_ctr",
.cra_module = THIS_MODULE,
-   .cra_priority = 1000,
+   .cra_priority = 2000,
.cra_type = _blkcipher_type,
.cra_flags = CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_NEED_FALLBACK,
.cra_alignmask = 0,
-- 
2.7.4
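
The effect of the tie can be illustrated with a simplified selection model. This is not the kernel's lookup code: the real lookup also honours exact driver-name matches, type and mask flags, and larval algorithms, and how a tie is broken depends on list order. `struct alg_entry`, `pick_alg`, and the driver name `cbc(p8_aes)` for the generic template instance are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified registry entry (not the kernel's struct crypto_alg). */
struct alg_entry {
    const char *cra_name;        /* generic algorithm name, e.g. "cbc(aes)" */
    const char *cra_driver_name; /* implementation name */
    int cra_priority;
};

/* Pick the highest-priority entry for `name`; on a tie the first one seen wins. */
static const struct alg_entry *pick_alg(const struct alg_entry *tab, size_t n,
                                        const char *name)
{
    const struct alg_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].cra_name, name) != 0)
            continue;
        if (best == NULL || tab[i].cra_priority > best->cra_priority)
            best = &tab[i];
    }
    return best;
}
```

With both entries at 1000, whichever entry is encountered first wins, so the dedicated VMX mode gets no preference; raising `p8_aes_cbc` to 2000 removes the tie and it wins regardless of order.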

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] crypto/nx: disable NX on little endian builds

2014-06-06 Thread Anton Blanchard
The NX driver has endian issues, so disable it on little endian builds for now.

Signed-off-by: Anton Blanchard <an...@samba.org>
---

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 03ccdb0..8280a7a3 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -313,7 +313,7 @@ config CRYPTO_DEV_S5P
 
 config CRYPTO_DEV_NX
bool "Support for IBM Power7+ in-Nest cryptographic acceleration"
-   depends on PPC64 && IBMVIO
+   depends on PPC64 && IBMVIO && !CPU_LITTLE_ENDIAN
default n
help
  Support for Power7+ in-Nest cryptographic acceleration.