Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread Daniel Axtens
> So although this sits in arch/powerpc, it's heavy on the crypto which is
> not my area of expertise (to say the least!), so I think it should
> probably go via Herbert and the crypto tree?

That was my thought as well. Sorry - probably should have put that in
the comments somewhere.

Regards,
Daniel


RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-15 Thread Daniel Axtens
Hi David,

> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like to be convinced that anything does CRC over long enough buffers
> to make it a gain at all.
>
> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.
> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.

Internal testing on a NVMe device with T10DIF enabled on 4k blocks
shows a 20x - 30x improvement. Without these patches, crc_t10dif_generic
uses over 60% of CPU time - with these patches CRC drops to single
digits.

I should probably have led with that, sorry.

FWIW, the original patch showed a 3.7x gain on btrfs as well -
6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")

When Anton wrote the original code he had access to IBM's internal
tooling for looking at how instructions flow through the various stages
of the CPU, so I trust it's pretty much optimal from that point of view.

Regards,
Daniel


[PATCH 4/4] crypto: powerpc - Stress test for vpmsum implementations

2017-03-15 Thread Daniel Axtens
vpmsum implementations often don't kick in for short test vectors.
This is a simple test module that does a configurable number of
random tests, each up to 64kB and each with random offsets.

Both CRC-T10DIF and CRC32C are tested.

Cc: Anton Blanchard <an...@samba.org>
Signed-off-by: Daniel Axtens <d...@axtens.net>

--

Not super fussy about the inclusion or otherwise of this - it was very
useful for debugging my code, and more tests are good :)

Also, I originally found the bug in Anton's CRC32c using this.

Tests pass on both BE 64 bit and LE 64 bit.
---
 arch/powerpc/crypto/Makefile  |   1 +
 arch/powerpc/crypto/crc-vpmsum_test.c | 137 ++
 crypto/Kconfig|   8 ++
 3 files changed, 146 insertions(+)
 create mode 100644 arch/powerpc/crypto/crc-vpmsum_test.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index e66aaf19764d..67eca3af9fc7 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_VPMSUM) += crct10dif-vpmsum.o
+obj-$(CONFIG_CRYPTO_VPMSUM_TESTER) += crc-vpmsum_test.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c b/arch/powerpc/crypto/crc-vpmsum_test.c
new file mode 100644
index 000000000000..d58242557f33
--- /dev/null
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -0,0 +1,137 @@
+/*
+ * CRC vpmsum tester
+ * Copyright 2017 Daniel Axtens, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crc-t10dif.h>
+#include <crypto/internal/hash.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/cpufeature.h>
+#include <linux/slab.h>
+#include <linux/random.h>
+
+static unsigned long iterations = 1;
+
+#define MAX_CRC_LENGTH 65535
+
+
+static int __init crc_test_init(void)
+{
+   u16 crc16 = 0, verify16 = 0;
+   u32 crc32 = 0, verify32 = 0;
+   __le32 verify32le = 0;
+   unsigned char *data;
+   unsigned long i;
+   int ret;
+
+   struct crypto_shash *crct10dif_tfm;
+   struct crypto_shash *crc32c_tfm;
+   
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   return -ENODEV;
+   
+   data = kmalloc(MAX_CRC_LENGTH, GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
+
+   if (IS_ERR(crct10dif_tfm)) {
+   pr_err("Error allocating crc-t10dif\n");
+   goto free_buf;
+   }
+
+   crc32c_tfm = crypto_alloc_shash("crc32c", 0, 0);
+
+   if (IS_ERR(crc32c_tfm)) {
+   pr_err("Error allocating crc32c\n");
+   goto free_16;
+   }
+
+   do {
+   SHASH_DESC_ON_STACK(crct10dif_shash, crct10dif_tfm);
+   SHASH_DESC_ON_STACK(crc32c_shash, crc32c_tfm);
+
+   crct10dif_shash->tfm = crct10dif_tfm;
+   ret = crypto_shash_init(crct10dif_shash);
+
+   if (ret) {
+   pr_err("Error initing crc-t10dif\n");
+   goto free_32;
+   }
+   
+
+   crc32c_shash->tfm = crc32c_tfm;
+   ret = crypto_shash_init(crc32c_shash);
+
+   if (ret) {
+   pr_err("Error initing crc32c\n");
+   goto free_32;
+   }
+   
+   pr_info("crc-vpmsum_test begins, %lu iterations\n", iterations);
+   for (i=0; i<iterations; i++) {
+   size_t len, offset;
+
+   get_random_bytes(data, MAX_CRC_LENGTH);
+   get_random_bytes(&len, sizeof(len));
+   get_random_bytes(&offset, sizeof(offset));
+   
+   len %= MAX_CRC_LENGTH;
+   offset &= 15;
+   if (len <= offset)
+   continue;
+   len -= offset;
+   
+   crypto_shash_update(crct10dif_shash, data+offset, len);
+   crypto_shash_final(crct10dif_shash, (u8 *)(&crc16));
+   verify16 = crc_t10dif_generic(verify16, data+offset, len);
+
+   
+   if (crc16 != verify16) {
+   pr_err("FAILURE in CRC16: got 0x%04x expected 0x%04x (len %lu)\n",
+  crc16, verify16, len);
+   break;
+   }
+
+   crypt

[PATCH 2/4] crypto: powerpc - Re-enable non-REFLECTed CRCs

2017-03-15 Thread Daniel Axtens
When CRC32c was included in the kernel, Anton ripped out
the #ifdefs around reflected polynomials, because CRC32c
is always reflected. However, not all CRCs use reflection
so we'd like to make it optional.

Restore the REFLECT parts from Anton's original CRC32
implementation (https://github.com/antonblanchard/crc32-vpmsum)

That implementation is available under GPLv2+, so we're OK
from a licensing point of view:
https://github.com/antonblanchard/crc32-vpmsum/blob/master/LICENSE.TXT

As CRC32c requires REFLECT, add that #define.
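
For anyone not steeped in CRC jargon, here is a rough userspace C sketch
(illustrative only, not the vpmsum code) of what REFLECT is about: a
reflected CRC such as CRC32c consumes bits LSB-first using the
bit-reversed polynomial, while a non-reflected CRC such as CRC-T10DIF
consumes them MSB-first. The two variants need different shifts and
constants in the core, which is why REFLECT becomes an optional #define
rather than an assumption.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reflected (LSB-first) update: CRC32c, reflected polynomial 0x82f63b78. */
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (crc >> 1) ^ 0x82f63b78u : crc >> 1;
	}
	return crc;
}

/* Non-reflected (MSB-first) update: CRC-T10DIF, polynomial 0x8bb7. */
static uint16_t crct10dif_bitwise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++) << 8;
		for (int i = 0; i < 8; i++)
			crc = (crc & 0x8000) ? (crc << 1) ^ 0x8bb7 : crc << 1;
	}
	return crc;
}

int main(void)
{
	const uint8_t msg[] = "123456789";

	/* Results can be checked against any published CRC catalogue. */
	printf("crc32c:     %08x\n", ~crc32c_bitwise(~0u, msg, 9));
	printf("crc-t10dif: %04x\n", crct10dif_bitwise(0, msg, 9));
	return 0;
}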

Cc: Anton Blanchard <an...@samba.org>
Signed-off-by: Daniel Axtens <d...@axtens.net>

---

I compared the disassembly of the CRC32c module on LE before and
after the change, and verified that they were the same.

I verified that the crypto self-tests still pass on LE and BE, and
my tests in patch 4 still pass as well.
---
 arch/powerpc/crypto/crc32-vpmsum_core.S | 31 ++-
 arch/powerpc/crypto/crc32c-vpmsum_asm.S |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/crypto/crc32-vpmsum_core.S b/arch/powerpc/crypto/crc32-vpmsum_core.S
index 629244ef170e..87fabf4d391a 100644
--- a/arch/powerpc/crypto/crc32-vpmsum_core.S
+++ b/arch/powerpc/crypto/crc32-vpmsum_core.S
@@ -35,7 +35,9 @@
 
.text
 
-#if defined(__BIG_ENDIAN__)
+#if defined(__BIG_ENDIAN__) && defined(REFLECT)
+#define BYTESWAP_DATA
+#elif defined(__LITTLE_ENDIAN__) && !defined(REFLECT)
 #define BYTESWAP_DATA
 #else
 #undef BYTESWAP_DATA
@@ -108,7 +110,11 @@ FUNC_START(CRC_FUNCTION_NAME)
/* Get the initial value into v8 */
	vxor	v8,v8,v8
MTVRD(v8, R3)
+#ifdef REFLECT
vsldoi  v8,zeroes,v8,8  /* shift into bottom 32 bits */
+#else
+   vsldoi  v8,v8,zeroes,4  /* shift into top 32 bits */
+#endif
 
 #ifdef BYTESWAP_DATA
addis   r3,r2,.byteswap_constant@toc@ha
@@ -354,6 +360,7 @@ FUNC_START(CRC_FUNCTION_NAME)
	vxor	v6,v6,v14
	vxor	v7,v7,v15
 
+#ifdef REFLECT
/*
 * vpmsumd produces a 96 bit result in the least significant bits
 * of the register. Since we are bit reflected we have to shift it
@@ -368,6 +375,7 @@ FUNC_START(CRC_FUNCTION_NAME)
vsldoi  v5,v5,zeroes,4
vsldoi  v6,v6,zeroes,4
vsldoi  v7,v7,zeroes,4
+#endif
 
/* xor with last 1024 bits */
lvx v8,0,r4
@@ -511,13 +519,33 @@ FUNC_START(CRC_FUNCTION_NAME)
vsldoi  v1,v0,v0,8
	vxor	v0,v0,v1	/* xor two 64 bit results together */
 
+#ifdef REFLECT
/* shift left one bit */
vspltisb v1,1
vsl v0,v0,v1
+#endif
 
	vand	v0,v0,mask_64bit
+#ifndef REFLECT
+   /*
+* Now for the Barrett reduction algorithm. The idea is to calculate q,
+* the multiple of our polynomial that we need to subtract. By
+* doing the computation 2x bits higher (ie 64 bits) and shifting the
+* result back down 2x bits, we round down to the nearest multiple.
+*/
+   VPMSUMD(v1,v0,const1)   /* ma */
+   vsldoi  v1,zeroes,v1,8  /* q = floor(ma/(2^64)) */
+   VPMSUMD(v1,v1,const2)   /* qn */
+   vxor	v0,v0,v1	/* a - qn, subtraction is xor in GF(2) */
 
/*
+* Get the result into r3. We need to shift it left 8 bytes:
+* V0 [ 0 1 2 X ]
+* V0 [ 0 X 2 3 ]
+*/
+   vsldoi  v0,v0,zeroes,8  /* shift result into top 64 bits */
+#else
+   /*
 * The reflected version of Barrett reduction. Instead of bit
 * reflecting our data (which is expensive to do), we bit reflect our
 * constants and our algorithm, which means the intermediate data in
@@ -537,6 +565,7 @@ FUNC_START(CRC_FUNCTION_NAME)
 * V0 [ 0 X 2 3 ]
 */
vsldoi  v0,v0,zeroes,4  /* shift result into top 64 bits of */
+#endif
 
/* Get it into r3 */
MFVRD(R3, v0)
diff --git a/arch/powerpc/crypto/crc32c-vpmsum_asm.S b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
index c0d080caefc1..d2bea48051a0 100644
--- a/arch/powerpc/crypto/crc32c-vpmsum_asm.S
+++ b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
@@ -842,4 +842,5 @@
.octa 0x000105ec76f1
 
 #define CRC_FUNCTION_NAME __crc32c_vpmsum
+#define REFLECT
 #include "crc32-vpmsum_core.S"
-- 
2.9.3



[PATCH 3/4] crypto: powerpc - Add CRC-T10DIF acceleration

2017-03-15 Thread Daniel Axtens
T10DIF is a CRC16 used heavily in NVMe.

It turns out we can accelerate it with a CRC32 library and a few
little tricks.

Provide the accelerator based on the refactored CRC32 code.
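
To give a flavour of the "little tricks" (a standalone C sketch under the
assumption that the core behaves like a wide, non-reflected CRC engine;
the function names are made up and this is not the actual glue code): a
16-bit CRC can be driven through a 32-bit-wide engine by keeping the
remainder left-aligned - seed with crc << 16, use the polynomial shifted
up by 16 bits, and shift the result back down at the end.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8bb7u

/* Reference: plain MSB-first 16-bit CRC-T10DIF. */
static uint16_t crc16_ref(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++) << 8;
		for (int i = 0; i < 8; i++)
			crc = (crc & 0x8000) ? (crc << 1) ^ T10DIF_POLY : crc << 1;
	}
	return crc;
}

/* The same CRC computed with a 32-bit-wide engine. */
static uint16_t crc16_via_32bit(uint16_t crc, const uint8_t *p, size_t len)
{
	uint32_t r = (uint32_t)crc << 16;	/* remainder rides in the top half */
	const uint32_t poly32 = (uint32_t)T10DIF_POLY << 16;

	while (len--) {
		r ^= (uint32_t)(*p++) << 24;
		for (int i = 0; i < 8; i++)
			r = (r & 0x80000000u) ? (r << 1) ^ poly32 : r << 1;
	}
	return (uint16_t)(r >> 16);		/* shift the result back down */
}

int main(void)
{
	const uint8_t msg[] = "123456789";

	assert(crc16_ref(0, msg, 9) == crc16_via_32bit(0, msg, 9));
	assert(crc16_ref(0x1234, msg, 9) == crc16_via_32bit(0x1234, msg, 9));
	return 0;
}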

Cc: Anton Blanchard <an...@samba.org>
Thanks-to: Hong Bo Peng <pen...@cn.ibm.com>
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/crypto/Makefile|   2 +
 arch/powerpc/crypto/crct10dif-vpmsum_asm.S  | 850 
 arch/powerpc/crypto/crct10dif-vpmsum_glue.c | 125 
 crypto/Kconfig  |   9 +
 4 files changed, 986 insertions(+)
 create mode 100644 arch/powerpc/crypto/crct10dif-vpmsum_asm.S
 create mode 100644 arch/powerpc/crypto/crct10dif-vpmsum_glue.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 87f40454bad3..e66aaf19764d 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
+obj-$(CONFIG_CRYPTO_CRCT10DIF_VPMSUM) += crct10dif-vpmsum.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
@@ -17,3 +18,4 @@ sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
 sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
 crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
+crct10dif-vpmsum-y := crct10dif-vpmsum_asm.o crct10dif-vpmsum_glue.o
diff --git a/arch/powerpc/crypto/crct10dif-vpmsum_asm.S b/arch/powerpc/crypto/crct10dif-vpmsum_asm.S
new file mode 100644
index 000000000000..5e3d81a0af1b
--- /dev/null
+++ b/arch/powerpc/crypto/crct10dif-vpmsum_asm.S
@@ -0,0 +1,850 @@
+/*
+ * Calculate a CRC T10DIF with vpmsum acceleration
+ *
+ * Constants generated by crc32-vpmsum, available at
+ * https://github.com/antonblanchard/crc32-vpmsum
+ *
+ * crc32-vpmsum is
+ * Copyright (C) 2015 Anton Blanchard <an...@au.ibm.com>, IBM
+ * and is available under the GPL v2 or later.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+	.section	.rodata
+.balign 16
+
+.byteswap_constant:
+   /* byte reverse permute constant */
+   .octa 0x0F0E0D0C0B0A09080706050403020100
+
+.constants:
+
+   /* Reduce 262144 kbits to 1024 bits */
+   /* x^261184 mod p(x), x^261120 mod p(x) */
+   .octa 0x56d35255
+
+   /* x^260160 mod p(x), x^260096 mod p(x) */
+   .octa 0xee67a1e4
+
+   /* x^259136 mod p(x), x^259072 mod p(x) */
+   .octa 0x60834ad1
+
+   /* x^258112 mod p(x), x^258048 mod p(x) */
+   .octa 0x8cfe9ab4
+
+   /* x^257088 mod p(x), x^257024 mod p(x) */
+   .octa 0x3e93fdb5
+
+   /* x^256064 mod p(x), x^256000 mod p(x) */
+   .octa 0x3c204548
+
+   /* x^255040 mod p(x), x^254976 mod p(x) */
+   .octa 0xb1fc8d69
+
+   /* x^254016 mod p(x), x^253952 mod p(x) */
+   .octa 0xf82b24ad
+
+   /* x^252992 mod p(x), x^252928 mod p(x) */
+   .octa 0x44429f1a
+
+   /* x^251968 mod p(x), x^251904 mod p(x) */
+   .octa 0xe88c66ec
+
+   /* x^250944 mod p(x), x^250880 mod p(x) */
+   .octa 0x385cc87d
+
+   /* x^249920 mod p(x), x^249856 mod p(x) */
+   .octa 0x3227c8ff
+
+   /* x^248896 mod p(x), x^248832 mod p(x) */
+   .octa 0xa9a93344
+
+   /* x^247872 mod p(x), x^247808 mod p(x) */
+   .octa 0xabaa66eb
+
+   /* x^246848 mod p(x), x^246784 mod p(x) */
+   .octa 0x1ac3c4ef
+
+   /* x^245824 mod p(x), x^245760 mod p(x) */
+   .octa 0x63f056f3
+
+   /* x^244800 mod p(x), x^244736 mod p(x) */
+   .octa 0x32cc0205
+
+   /* x^243776 mod p(x), x^243712 mod p(x) */
+   .octa 0xf8b5568e
+
+   /* x^242752 mod p(x), x^242688 mod p(x) */
+   .octa 0x8db16429
+
+   /* x^241728 mod p(x), x^241664 mod p(x) */
+   .octa 0x59ca6b66
+
+   /* x^240704 mod p(x), x^240640 mod p(x) */
+   .octa 0x5f5c18f8
+
+   /* x^239680 mod p(x), x^239616 mod p(x) */
+   .octa 0x61afb609
+
+   /* x^238656 mod p(x), x^238592 mod p(x) */
+   .octa 0x

[PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-15 Thread Daniel Axtens
The core nuts and bolts of the crc32c vpmsum algorithm will
also work for a number of other CRC algorithms with different
polynomials. Factor out the function into a new asm file.

To handle multiple users of the function, a user simply
provides constants, defines the name of their CRC function,
and then #includes the core algorithm file.

Cc: Anton Blanchard <an...@samba.org>
Signed-off-by: Daniel Axtens <d...@axtens.net>

--

It's possible at this point to argue that the address
of the constant tables should be passed in to the function,
rather than doing this somewhat unconventional #include.

However, we're about to add further #ifdef's back into the core
that will be provided by the encapsulating code, and which couldn't
be done as a variable without performance loss.
---
 arch/powerpc/crypto/crc32-vpmsum_core.S | 726 
 arch/powerpc/crypto/crc32c-vpmsum_asm.S | 714 +--
 2 files changed, 729 insertions(+), 711 deletions(-)
 create mode 100644 arch/powerpc/crypto/crc32-vpmsum_core.S

diff --git a/arch/powerpc/crypto/crc32-vpmsum_core.S b/arch/powerpc/crypto/crc32-vpmsum_core.S
new file mode 100644
index 000000000000..629244ef170e
--- /dev/null
+++ b/arch/powerpc/crypto/crc32-vpmsum_core.S
@@ -0,0 +1,726 @@
+/*
+ * Core of the accelerated CRC algorithm.
+ * In your file, define the constants and CRC_FUNCTION_NAME
+ * Then include this file.
+ *
+ * Calculate the checksum of data that is 16 byte aligned and a multiple of
+ * 16 bytes.
+ *
+ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
+ * chunks in order to mask the latency of the vpmsum instructions. If we
+ * have more than 32 kB of data to checksum we repeat this step multiple
+ * times, passing in the previous 1024 bits.
+ *
+ * The next step is to reduce the 1024 bits to 64 bits. This step adds
+ * 32 bits of 0s to the end - this matches what a CRC does. We just
+ * calculate constants that land the data in this 32 bits.
+ *
+ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
+ * for n = CRC using POWER8 instructions. We use x = 32.
+ *
+ * http://en.wikipedia.org/wiki/Barrett_reduction
+ *
+ * Copyright (C) 2015 Anton Blanchard <an...@au.ibm.com>, IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+*/
+   
+#include <asm/ppc_asm.h>
+#include <asm/ppc-opcode.h>
+
+#define MAX_SIZE   32768
+
+   .text
+
+#if defined(__BIG_ENDIAN__)
+#define BYTESWAP_DATA
+#else
+#undef BYTESWAP_DATA
+#endif
+
+#define off16  r25
+#define off32  r26
+#define off48  r27
+#define off64  r28
+#define off80  r29
+#define off96  r30
+#define off112 r31
+
+#define const1 v24
+#define const2 v25
+
+#define byteswap   v26
+#define	mask_32bit	v27
+#define	mask_64bit	v28
+#define zeroes v29
+
+#ifdef BYTESWAP_DATA
+#define VPERM(A, B, C, D) vperm	A, B, C, D
+#else
+#define VPERM(A, B, C, D)
+#endif
+
+/* unsigned int CRC_FUNCTION_NAME(unsigned int crc, void *p, unsigned long len) */
+FUNC_START(CRC_FUNCTION_NAME)
+   std r31,-8(r1)
+   std r30,-16(r1)
+   std r29,-24(r1)
+   std r28,-32(r1)
+   std r27,-40(r1)
+   std r26,-48(r1)
+   std r25,-56(r1)
+
+   li  off16,16
+   li  off32,32
+   li  off48,48
+   li  off64,64
+   li  off80,80
+   li  off96,96
+   li  off112,112
+   li  r0,0
+
+   /* Enough room for saving 10 non volatile VMX registers */
+   subir6,r1,56+10*16
+   subir7,r1,56+2*16
+
+   stvx	v20,0,r6
+   stvx	v21,off16,r6
+   stvx	v22,off32,r6
+   stvx	v23,off48,r6
+   stvx	v24,off64,r6
+   stvx	v25,off80,r6
+   stvx	v26,off96,r6
+   stvx	v27,off112,r6
+   stvx	v28,0,r7
+   stvx	v29,off16,r7
+
+   mr  r10,r3
+
+   vxor	zeroes,zeroes,zeroes
+   vspltisw v0,-1
+
+   vsldoi  mask_32bit,zeroes,v0,4
+   vsldoi  mask_64bit,zeroes,v0,8
+
+   /* Get the initial value into v8 */
+   vxor	v8,v8,v8
+   MTVRD(v8, R3)
+   vsldoi  v8,zeroes,v8,8  /* shift into bottom 32 bits */
+
+#ifdef BYTESWAP_DATA
+   addis   r3,r2,.byteswap_constant@toc@ha
+   addir3,r3,.byteswap_constant@toc@l
+
+   lvx byteswap,0,r3
+   addir3,r3,16
+#endif
+
+   cmpdi   r5,256
+   blt .Lshort
+
+   rldicr  r6,r5,0,56
+
+   /* Checksum in blocks of MAX_SIZE */
+1: lis r7,MAX_SIZE@h
+   ori r7,r7,MAX_SIZE@l
+   mr  r9,r7
+   cmpd	r6,r7
+   bgt 2f
+   mr  r7,r6
+2: subf	r6,r7,r6
+
+   /* our main loop does 128 bytes at a ti

[PATCH] crypto: powerpc - Fix initialisation of crc32c context

2017-03-02 Thread Daniel Axtens
Turning on crypto self-tests on a POWER8 shows:

alg: hash: Test 1 failed for crc32c-vpmsum
00000000: ff ff ff ff

Comparing the code with the Intel CRC32c implementation on which
ours is based shows that we are doing an init with 0, not ~0
as CRC32c requires.

This probably wasn't caught because btrfs does its own weird
open-coded initialisation.

Initialise our internal context to ~0 on init.

This makes the self-tests pass, and btrfs continues to work.
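
To spell out the arithmetic (a trivial userspace sketch, assuming the
generic crc32c shash convention of seeding the running CRC with ~0 and
inverting it in final): for the zero-length test vector, a 0 seed yields
exactly the ff ff ff ff digest in the failure above, while a ~0 seed
yields the expected 00 00 00 00.

#include <stdint.h>
#include <stdio.h>

static uint32_t crc32c_final(uint32_t crc)
{
	return ~crc;	/* the shash final step inverts the running CRC */
}

int main(void)
{
	/* No data is processed for the zero-length vector. */
	printf("seed ~0: %08x\n", crc32c_final(0xffffffffu));	/* 00000000 */
	printf("seed  0: %08x\n", crc32c_final(0x00000000u));	/* ffffffff */
	return 0;
}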

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: Anton Blanchard <an...@samba.org>
Cc: sta...@vger.kernel.org
Signed-off-by: Daniel Axtens <d...@axtens.net>
---
 arch/powerpc/crypto/crc32c-vpmsum_glue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/crypto/crc32c-vpmsum_glue.c b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
index 9fa046d56eba..411994551afc 100644
--- a/arch/powerpc/crypto/crc32c-vpmsum_glue.c
+++ b/arch/powerpc/crypto/crc32c-vpmsum_glue.c
@@ -52,7 +52,7 @@ static int crc32c_vpmsum_cra_init(struct crypto_tfm *tfm)
 {
u32 *key = crypto_tfm_ctx(tfm);
 
-   *key = 0;
+   *key = ~0;
 
return 0;
 }
-- 
2.9.3



Re: crypto/nx842: Ignore queue overflow informative error

2015-12-06 Thread Daniel Axtens
Haren Myneni writes:

> NX842 coprocessor sets bit 3 if queue is overflow. It is just for
> information to the user. So the driver prints this informative message
> and ignores it.

What queue, and what happens when the queue overflows? It seems like
*something* would need to be done, somewhere, by someone?

I realise that as a piece of IBM hardware this is probably an incredibly
optimistic question, but is this behaviour documented publicly anywhere?
(As a distant second best, is it documented internally anywhere that I
can read?)

> --- a/drivers/crypto/nx/nx-842-powernv.c
> +++ b/drivers/crypto/nx/nx-842-powernv.c
> @@ -442,6 +442,15 @@ static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
>(unsigned int)ccw,
>(unsigned int)be32_to_cpu(crb->ccw));
>  
> + /*
> +  * NX842 coprocessor uses 3rd bit to report queue overflow which is
> +  * not an error, just for information to user. So, ignore this bit.
> +  */
> + if (ret & ICSWX_BIT3) {
> + pr_info_ratelimited("842 coprocessor queue overflow\n");
It doesn't look like this is done anywhere else in the file, but should
this be prefixed with something? Something like "nx-842: Coprocessor
queue overflow"?
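
Something like the usual pr_fmt() idiom would do it - a sketch only, not
a tested change to nx-842-powernv.c, and the helper name is made up:

/* Defined before the includes, this prefixes every pr_*() in the file. */
#define pr_fmt(fmt) "nx-842: " fmt

#include <linux/kernel.h>
#include <linux/printk.h>

static void report_queue_overflow(void)
{
	/* Now prints "nx-842: 842 coprocessor queue overflow". */
	pr_info_ratelimited("842 coprocessor queue overflow\n");
}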

Regards,
Daniel

> + ret &= ~ICSWX_BIT3;
> + }
> +
>   switch (ret) {
>   case ICSWX_INITIATED:
>   ret = wait_for_csb(wmem, csb);
>
>

