[PATCH] crypto, x86: aesni - fix token pasting for clang

2017-03-15 Thread Michael Davidson
aes_ctrby8_avx-x86_64.S uses the C preprocessor for token pasting
of character sequences that are not valid preprocessor tokens.
While this is allowed when preprocessing assembler files, it exposes
an incompatibility between the clang and gcc preprocessors: clang
does not strip leading white space from macro parameters, so the
CONCAT(%xmm, i) macro expansion on line 96 results in a token with
a space character embedded in it.

While this could be resolved by deleting the offending space character,
the assembler is perfectly capable of doing the token pasting correctly
for itself so we can just get rid of the preprocessor macros.

Signed-off-by: Michael Davidson 
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a916c4a61165..5f6a5af9c489 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -65,7 +65,6 @@
 #include 
 #include 
 
-#define CONCAT(a,b)    a##b
 #define VMOVDQ vmovdqu
 
 #define xdata0 %xmm0
@@ -92,8 +91,6 @@
 #define num_bytes  %r8
 
 #define tmp            %r10
-#define DDQ(i)         CONCAT(ddq_add_,i)
-#define XMM(i)         CONCAT(%xmm, i)
 #define DDQ_DATA       0
 #define XDATA          1
 #define KEY_128        1
@@ -131,12 +128,12 @@ ddq_add_8:
 /* generate a unique variable for ddq_add_x */
 
 .macro setddq n
-   var_ddq_add = DDQ(\n)
+   var_ddq_add = ddq_add_\n
 .endm
 
 /* generate a unique variable for xmm register */
 .macro setxdata n
-   var_xdata = XMM(\n)
+   var_xdata = %xmm\n
 .endm
 
 /* club the numeric 'id' to the symbol 'name' */
-- 
2.12.0.367.g23dc2f6d3c-goog
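
[Editor's note] For readers unfamiliar with the mechanism the commit
message describes, here is a minimal sketch of two-level token pasting
in plain C. The macro names CONCAT and DDQ are taken from the patch;
the integer stand-ins for the ddq_add_N labels, the INDEX macro and
both helper functions are hypothetical. Because it only pastes valid C
identifiers, it does not reproduce the clang/gcc difference itself,
which needs a sequence such as %xmm that is not a valid preprocessor
token.

```c
/* CONCAT pastes its two arguments exactly as written. The outer macro
 * DDQ exists so that its argument is macro-expanded before pasting. */
#define CONCAT(a, b) a##b
#define DDQ(i) CONCAT(ddq_add_, i)

/* Stand-ins for the ddq_add_N labels in the assembly file
 * (hypothetical values, for illustration only). */
static const int ddq_add_1 = 1;
static const int ddq_add_2 = 2;

#define INDEX 2

/* DDQ(INDEX): INDEX is expanded to 2 first, then pasted, which
 * yields the identifier ddq_add_2. */
int pasted_from_macro(void)
{
	return DDQ(INDEX);
}

/* DDQ(1) pastes directly to ddq_add_1. */
int pasted_from_literal(void)
{
	return DDQ(1);
}
```

The patch sidesteps all of this by letting the assembler's own .macro
argument substitution (ddq_add_\n, %xmm\n) build the names.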



RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-15 Thread Daniel Axtens
Hi David,

> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like to be convinced that anything does CRC over long enough buffers
> to make it a gain at all.
>
> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.
> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.

Internal testing on an NVMe device with T10DIF enabled on 4k blocks
shows a 20x - 30x improvement. Without these patches, crc_t10dif_generic
uses over 60% of CPU time; with these patches, CRC drops to single
digits.

I should probably have led with that, sorry.

FWIW, the original patch showed a 3.7x gain on btrfs as well -
6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")

When Anton wrote the original code he had access to IBM's internal
tooling for looking at how instructions flow through the various stages
of the CPU, so I trust it's pretty much optimal from that point of view.

Regards,
Daniel


[PATCH V3 0/3] Support new function in new CCPs

2017-03-15 Thread Gary R Hook
The following series implements new function in a version 5 coprocessor.
New features are:
 - Support for SHA-2 384-bit and 512-bit hashing
 - Support for 3DES encryption
 - Support for AES GCM encryption

Changes from V2:
 - Correct a comment in the GCM support code.
 - Ensure the patches apply to the current repo

Changes from V1:
 - Ensure the patches build correctly

---

Gary R Hook (3):
  crypto: ccp - Add SHA-2 384- and 512-bit support
  crypto: ccp - Enable 3DES function on v5 CCPs
  crypto: ccp - Enable support for AES GCM on v5 CCPs


 drivers/crypto/ccp/Makefile|2 
 drivers/crypto/ccp/ccp-crypto-aes-galois.c |  252 ++
 drivers/crypto/ccp/ccp-crypto-des3.c   |  254 ++
 drivers/crypto/ccp/ccp-crypto-main.c   |   22 +
 drivers/crypto/ccp/ccp-crypto-sha.c|   22 +
 drivers/crypto/ccp/ccp-crypto.h|   44 ++
 drivers/crypto/ccp/ccp-dev-v3.c|1 
 drivers/crypto/ccp/ccp-dev-v5.c|   54 +++
 drivers/crypto/ccp/ccp-dev.h   |   14 +
 drivers/crypto/ccp/ccp-ops.c   |  522 
 include/linux/ccp.h|   68 
 11 files changed, 1249 insertions(+), 6 deletions(-)
 create mode 100644 drivers/crypto/ccp/ccp-crypto-aes-galois.c
 create mode 100644 drivers/crypto/ccp/ccp-crypto-des3.c



[PATCH V3 2/3] crypto: ccp - Enable 3DES function on v5 CCPs

2017-03-15 Thread Gary R Hook
Wire up support for Triple DES in ECB mode.

Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/Makefile  |1 
 drivers/crypto/ccp/ccp-crypto-des3.c |  254 ++
 drivers/crypto/ccp/ccp-crypto-main.c |   10 +
 drivers/crypto/ccp/ccp-crypto.h  |   22 +++
 drivers/crypto/ccp/ccp-dev-v3.c  |1 
 drivers/crypto/ccp/ccp-dev-v5.c  |   54 +++
 drivers/crypto/ccp/ccp-dev.h |   14 ++
 drivers/crypto/ccp/ccp-ops.c |  198 +++
 include/linux/ccp.h  |   57 +++-
 9 files changed, 608 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/ccp/ccp-crypto-des3.c

diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index 346ceb8..d2044b7 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -12,4 +12,5 @@ ccp-crypto-objs := ccp-crypto-main.o \
   ccp-crypto-aes.o \
   ccp-crypto-aes-cmac.o \
   ccp-crypto-aes-xts.o \
+  ccp-crypto-des3.o \
   ccp-crypto-sha.o
diff --git a/drivers/crypto/ccp/ccp-crypto-des3.c b/drivers/crypto/ccp/ccp-crypto-des3.c
new file mode 100644
index 000..5af7347
--- /dev/null
+++ b/drivers/crypto/ccp/ccp-crypto-des3.c
@@ -0,0 +1,254 @@
+/*
+ * AMD Cryptographic Coprocessor (CCP) DES3 crypto API support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ccp-crypto.h"
+
+static int ccp_des3_complete(struct crypto_async_request *async_req, int ret)
+{
+   struct ablkcipher_request *req = ablkcipher_request_cast(async_req);
+   struct ccp_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
+   struct ccp_des3_req_ctx *rctx = ablkcipher_request_ctx(req);
+
+   if (ret)
+   return ret;
+
+   if (ctx->u.des3.mode != CCP_DES3_MODE_ECB)
+   memcpy(req->info, rctx->iv, DES3_EDE_BLOCK_SIZE);
+
+   return 0;
+}
+
+static int ccp_des3_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+   unsigned int key_len)
+{
+   struct ccp_ctx *ctx = crypto_tfm_ctx(crypto_ablkcipher_tfm(tfm));
+   struct ccp_crypto_ablkcipher_alg *alg =
+   ccp_crypto_ablkcipher_alg(crypto_ablkcipher_tfm(tfm));
+   u32 *flags = &tfm->base.crt_flags;
+
+
+   /* From des_generic.c:
+*
+* RFC2451:
+*   If the first two or last two independent 64-bit keys are
+*   equal (k1 == k2 or k2 == k3), then the DES3 operation is simply the
+*   same as DES.  Implementers MUST reject keys that exhibit this
+*   property.
+*/
+   const u32 *K = (const u32 *)key;
+
+   if (unlikely(!((K[0] ^ K[2]) | (K[1] ^ K[3])) ||
+!((K[2] ^ K[4]) | (K[3] ^ K[5]))) &&
+(*flags & CRYPTO_TFM_REQ_WEAK_KEY)) {
+   *flags |= CRYPTO_TFM_RES_WEAK_KEY;
+   return -EINVAL;
+   }
+
+   /* It's not clear that there is any support for a keysize of 112.
+* If needed, the caller should make K1 == K3
+*/
+   ctx->u.des3.type = CCP_DES3_TYPE_168;
+   ctx->u.des3.mode = alg->mode;
+   ctx->u.des3.key_len = key_len;
+
+   memcpy(ctx->u.des3.key, key, key_len);
+   sg_init_one(&ctx->u.des3.key_sg, ctx->u.des3.key, key_len);
+
+   return 0;
+}
+
+static int ccp_des3_crypt(struct ablkcipher_request *req, bool encrypt)
+{
+   struct ccp_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
+   struct ccp_des3_req_ctx *rctx = ablkcipher_request_ctx(req);
+   struct scatterlist *iv_sg = NULL;
+   unsigned int iv_len = 0;
+   int ret;
+
+   if (!ctx->u.des3.key_len)
+   return -EINVAL;
+
+   if (((ctx->u.des3.mode == CCP_DES3_MODE_ECB) ||
+(ctx->u.des3.mode == CCP_DES3_MODE_CBC)) &&
+   (req->nbytes & (DES3_EDE_BLOCK_SIZE - 1)))
+   return -EINVAL;
+
+   if (ctx->u.des3.mode != CCP_DES3_MODE_ECB) {
+   if (!req->info)
+   return -EINVAL;
+
+   memcpy(rctx->iv, req->info, DES3_EDE_BLOCK_SIZE);
+   iv_sg = &rctx->iv_sg;
+   iv_len = DES3_EDE_BLOCK_SIZE;
+   sg_init_one(iv_sg, rctx->iv, iv_len);
+   }
+
+   memset(&rctx->cmd, 0, sizeof(rctx->cmd));
+   INIT_LIST_HEAD(&rctx->cmd.entry);
+   rctx->cmd.engine = CCP_ENGINE_DES3;
+   rctx->cmd.u.des3.type = ctx->u.des3.type;
+   rctx->cmd.u.des3.mode = ctx->u.des3.mode;
+   rctx->cmd.u.des3.action = (encrypt)
+ ? CCP_DES3_ACTION_ENCRYPT
+ : CCP_DES3_ACTION_DECRYPT;
+   
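
[Editor's note] The weak-key test in ccp_des3_setkey() above rejects
keys whose adjacent 64-bit subkeys match. A minimal userspace sketch
of that condition follows. The function name and DES_KEY_SIZE constant
are hypothetical, and memcmp() is swapped in for the patch's u32 XOR
comparison so the sketch does not depend on the alignment of the key
buffer. It leaves out the CRYPTO_TFM_REQ_WEAK_KEY flag handling.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DES_KEY_SIZE 8	/* one DES subkey, in bytes */

/* Returns true when K1 == K2 or K2 == K3, i.e. when the 24-byte
 * 3DES key degenerates to single DES (the RFC 2451 condition quoted
 * in the patch comment). */
bool des3_key_is_degenerate(const uint8_t key[3 * DES_KEY_SIZE])
{
	const uint8_t *k1 = key;
	const uint8_t *k2 = key + DES_KEY_SIZE;
	const uint8_t *k3 = key + 2 * DES_KEY_SIZE;

	return memcmp(k1, k2, DES_KEY_SIZE) == 0 ||
	       memcmp(k2, k3, DES_KEY_SIZE) == 0;
}
```

Note that K1 == K3 with a distinct K2 (two-key 3DES) is not rejected
by this condition, which matches the comment in the patch about a
112-bit key size.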

[PATCH V3 1/3] crypto: ccp - Add SHA-2 384- and 512-bit support

2017-03-15 Thread Gary R Hook
Incorporate 384-bit and 512-bit hashing for a version 5 CCP device.


Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/ccp-crypto-sha.c |   22 +++
 drivers/crypto/ccp/ccp-crypto.h |8 ++--
 drivers/crypto/ccp/ccp-ops.c|   72 +++
 include/linux/ccp.h |2 +
 4 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-crypto-sha.c b/drivers/crypto/ccp/ccp-crypto-sha.c
index 84a652b..6b46eea 100644
--- a/drivers/crypto/ccp/ccp-crypto-sha.c
+++ b/drivers/crypto/ccp/ccp-crypto-sha.c
@@ -146,6 +146,12 @@ static int ccp_do_sha_update(struct ahash_request *req, unsigned int nbytes,
case CCP_SHA_TYPE_256:
rctx->cmd.u.sha.ctx_len = SHA256_DIGEST_SIZE;
break;
+   case CCP_SHA_TYPE_384:
+   rctx->cmd.u.sha.ctx_len = SHA384_DIGEST_SIZE;
+   break;
+   case CCP_SHA_TYPE_512:
+   rctx->cmd.u.sha.ctx_len = SHA512_DIGEST_SIZE;
+   break;
default:
/* Should never get here */
break;
@@ -393,6 +399,22 @@ struct ccp_sha_def {
.digest_size= SHA256_DIGEST_SIZE,
.block_size = SHA256_BLOCK_SIZE,
},
+   {
+   .version= CCP_VERSION(5, 0),
+   .name   = "sha384",
+   .drv_name   = "sha384-ccp",
+   .type   = CCP_SHA_TYPE_384,
+   .digest_size= SHA384_DIGEST_SIZE,
+   .block_size = SHA384_BLOCK_SIZE,
+   },
+   {
+   .version= CCP_VERSION(5, 0),
+   .name   = "sha512",
+   .drv_name   = "sha512-ccp",
+   .type   = CCP_SHA_TYPE_512,
+   .digest_size= SHA512_DIGEST_SIZE,
+   .block_size = SHA512_BLOCK_SIZE,
+   },
 };
 
 static int ccp_register_hmac_alg(struct list_head *head,
diff --git a/drivers/crypto/ccp/ccp-crypto.h b/drivers/crypto/ccp/ccp-crypto.h
index 8335b32..95cce27 100644
--- a/drivers/crypto/ccp/ccp-crypto.h
+++ b/drivers/crypto/ccp/ccp-crypto.h
@@ -137,9 +137,11 @@ struct ccp_aes_cmac_exp_ctx {
u8 buf[AES_BLOCK_SIZE];
 };
 
-/* SHA related defines */
-#define MAX_SHA_CONTEXT_SIZE   SHA256_DIGEST_SIZE
-#define MAX_SHA_BLOCK_SIZE SHA256_BLOCK_SIZE
+/* SHA-related defines
+ * These values must be large enough to accommodate any variant
+ */
+#define MAX_SHA_CONTEXT_SIZE   SHA512_DIGEST_SIZE
+#define MAX_SHA_BLOCK_SIZE SHA512_BLOCK_SIZE
 
 struct ccp_sha_ctx {
struct scatterlist opad_sg;
diff --git a/drivers/crypto/ccp/ccp-ops.c b/drivers/crypto/ccp/ccp-ops.c
index f1396c3..0d82080 100644
--- a/drivers/crypto/ccp/ccp-ops.c
+++ b/drivers/crypto/ccp/ccp-ops.c
@@ -41,6 +41,20 @@
cpu_to_be32(SHA256_H6), cpu_to_be32(SHA256_H7),
 };
 
+static const __be64 ccp_sha384_init[SHA512_DIGEST_SIZE / sizeof(__be64)] = {
+   cpu_to_be64(SHA384_H0), cpu_to_be64(SHA384_H1),
+   cpu_to_be64(SHA384_H2), cpu_to_be64(SHA384_H3),
+   cpu_to_be64(SHA384_H4), cpu_to_be64(SHA384_H5),
+   cpu_to_be64(SHA384_H6), cpu_to_be64(SHA384_H7),
+};
+
+static const __be64 ccp_sha512_init[SHA512_DIGEST_SIZE / sizeof(__be64)] = {
+   cpu_to_be64(SHA512_H0), cpu_to_be64(SHA512_H1),
+   cpu_to_be64(SHA512_H2), cpu_to_be64(SHA512_H3),
+   cpu_to_be64(SHA512_H4), cpu_to_be64(SHA512_H5),
+   cpu_to_be64(SHA512_H6), cpu_to_be64(SHA512_H7),
+};
+
 #define CCP_NEW_JOBID(ccp)  ((ccp->vdata->version == CCP_VERSION(3, 0)) ? \
ccp_gen_jobid(ccp) : 0)
 
@@ -955,6 +969,18 @@ static int ccp_run_sha_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
return -EINVAL;
block_size = SHA256_BLOCK_SIZE;
break;
+   case CCP_SHA_TYPE_384:
+   if (cmd_q->ccp->vdata->version < CCP_VERSION(4, 0)
+   || sha->ctx_len < SHA384_DIGEST_SIZE)
+   return -EINVAL;
+   block_size = SHA384_BLOCK_SIZE;
+   break;
+   case CCP_SHA_TYPE_512:
+   if (cmd_q->ccp->vdata->version < CCP_VERSION(4, 0)
+   || sha->ctx_len < SHA512_DIGEST_SIZE)
+   return -EINVAL;
+   block_size = SHA512_BLOCK_SIZE;
+   break;
default:
return -EINVAL;
}
@@ -1042,6 +1068,21 @@ static int ccp_run_sha_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
sb_count = 1;
ooffset = ioffset = 0;
break;
+   case CCP_SHA_TYPE_384:
+   digest_size = SHA384_DIGEST_SIZE;
+   init = (void *) ccp_sha384_init;
+   ctx_size = SHA512_DIGEST_SIZE;
+   sb_count = 2;
+   ioffset = 0;
+   ooffset = 2 * CCP_SB_BYTES - SHA384_DIGEST_SIZE;

Re: [PATCH] dt-bindings: rng: clocks property on omap_rng is optional

2017-03-15 Thread Rob Herring
On Tue, Mar 07, 2017 at 03:18:28PM +0100, Thomas Petazzoni wrote:
> Commit 52060836f79 ("dt-bindings: omap-rng: Document SafeXcel IP-76
> device variant") updated the omap_rng Device Tree binding to add support
> for the IP-76 variation of the IP. As part of this change, a "clocks"
> property was added, but is indicated as "Required", while it is in fact
> "Optional": some SoCs do not require a clock for this IP block.
> 
> Fixes: 52060836f79 ("dt-bindings: omap-rng: Document SafeXcel IP-76 device variant")
> Cc: 
> Signed-off-by: Thomas Petazzoni 
> ---
>  Documentation/devicetree/bindings/rng/omap_rng.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/rng/omap_rng.txt b/Documentation/devicetree/bindings/rng/omap_rng.txt
> index 4714772..20d435da 100644
> --- a/Documentation/devicetree/bindings/rng/omap_rng.txt
> +++ b/Documentation/devicetree/bindings/rng/omap_rng.txt
> @@ -12,6 +12,9 @@ Required properties:
>  - reg : Offset and length of the register set for the module
>  - interrupts : the interrupt number for the RNG module.
>   Used for "ti,omap4-rng" and "inside-secure,safexcel-eip76"
> +
> +Optional properties:
> +

Wouldn't just "for ? compatible only" be more correct?

>  - clocks: the trng clock source
>  
>  Example:
> -- 
> 2.7.4
> 


RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-15 Thread David Laight
From: Linuxppc-dev Daniel Axtens
> Sent: 15 March 2017 12:38
> The core nuts and bolts of the crc32c vpmsum algorithm will
> also work for a number of other CRC algorithms with different
> polynomials. Factor out the function into a new asm file.
> 
> To handle multiple users of the function, a user simply
> provides constants, defines the name of their CRC function,
> and then #includes the core algorithm file.
...

While not part of this change, the unrolled loops look as though
they just destroy the cpu cache.
I'd like to be convinced that anything does CRC over long enough buffers
to make it a gain at all.

With modern (not that modern now) superscalar cpus you can often
get the loop instructions 'for free'.
Sometimes pipelining the loop is needed to get full throughput.
Unlike the IP checksum, you don't even have to 'loop carry' the
cpu carry flag.

David



[PATCH 4/4] crypto: powerpc - Stress test for vpmsum implementations

2017-03-15 Thread Daniel Axtens
vpmsum implementations often don't kick in for short test vectors.
This is a simple test module that does a configurable number of
random tests, each up to 64kB and each with random offsets.

Both CRC-T10DIF and CRC32C are tested.

Cc: Anton Blanchard 
Signed-off-by: Daniel Axtens 

--

Not super fussy about the inclusion or otherwise of this - it was very
useful for debugging my code, and more tests are good :)

Also, I originally found the bug in Anton's CRC32c using this.

Tests pass on both BE 64 bit and LE 64 bit.
---
 arch/powerpc/crypto/Makefile  |   1 +
 arch/powerpc/crypto/crc-vpmsum_test.c | 137 ++
 crypto/Kconfig|   8 ++
 3 files changed, 146 insertions(+)
 create mode 100644 arch/powerpc/crypto/crc-vpmsum_test.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index e66aaf19764d..67eca3af9fc7 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_VPMSUM) += crct10dif-vpmsum.o
+obj-$(CONFIG_CRYPTO_VPMSUM_TESTER) += crc-vpmsum_test.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c b/arch/powerpc/crypto/crc-vpmsum_test.c
new file mode 100644
index ..d58242557f33
--- /dev/null
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -0,0 +1,137 @@
+/*
+ * CRC vpmsum tester
+ * Copyright 2017 Daniel Axtens, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static unsigned long iterations = 1;
+
+#define MAX_CRC_LENGTH 65535
+
+
+static int __init crc_test_init(void)
+{
+   u16 crc16 = 0, verify16 = 0;
+   u32 crc32 = 0, verify32 = 0;
+   __le32 verify32le = 0;
+   unsigned char *data;
+   unsigned long i;
+   int ret;
+
+   struct crypto_shash *crct10dif_tfm;
+   struct crypto_shash *crc32c_tfm;
+   
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   return -ENODEV;
+   
+   data = kmalloc(MAX_CRC_LENGTH, GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   crct10dif_tfm = crypto_alloc_shash("crct10dif", 0, 0);
+
+   if (IS_ERR(crct10dif_tfm)) {
+   pr_err("Error allocating crc-t10dif\n");
+   goto free_buf;
+   }
+
+   crc32c_tfm = crypto_alloc_shash("crc32c", 0, 0);
+
+   if (IS_ERR(crc32c_tfm)) {
+   pr_err("Error allocating crc32c\n");
+   goto free_16;
+   }
+
+   do {
+   SHASH_DESC_ON_STACK(crct10dif_shash, crct10dif_tfm);
+   SHASH_DESC_ON_STACK(crc32c_shash, crc32c_tfm);
+
+   crct10dif_shash->tfm = crct10dif_tfm;
+   ret = crypto_shash_init(crct10dif_shash);
+
+   if (ret) {
+   pr_err("Error initing crc-t10dif\n");
+   goto free_32;
+   }
+   
+
+   crc32c_shash->tfm = crc32c_tfm;
+   ret = crypto_shash_init(crc32c_shash);
+
+   if (ret) {
+   pr_err("Error initing crc32c\n");
+   goto free_32;
+   }
+   
+   pr_info("crc-vpmsum_test begins, %lu iterations\n", iterations);
+   for (i=0; i
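
[Editor's note] The patch above is cut off before the test loop. The
idea in the commit message (many random tests, random lengths up to
64kB, random start offsets) can be sketched in userspace as follows.
This is not the kernel module: the table-driven routine stands in for
the accelerated implementation, the bitwise routine is the reference,
and all function names here are hypothetical. What the real module
compares against is not visible in the truncated diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define T10DIF_POLY 0x8bb7u
#define MAX_LEN 65535u	/* mirrors MAX_CRC_LENGTH in the patch */

/* Reference: bit-at-a-time CRC-T10DIF (MSB first, no final xor). */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u)
				? (uint16_t)((crc << 1) ^ T10DIF_POLY)
				: (uint16_t)(crc << 1);
	}
	return crc;
}

/* Implementation under test: byte-at-a-time table lookup. */
static uint16_t crc_table[256];

static void build_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint8_t b = (uint8_t)i;

		crc_table[i] = crc_t10dif_bitwise(0, &b, 1);
	}
}

static uint16_t crc_t10dif_table(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^ crc_table[(crc >> 8) ^ *p++]);
	return crc;
}

/* Run 'iterations' random tests and return the number of mismatches. */
unsigned long crc_stress(unsigned long iterations, unsigned int seed)
{
	static uint8_t data[MAX_LEN];
	unsigned long mismatches = 0;

	build_table();
	srand(seed);
	for (size_t i = 0; i < MAX_LEN; i++)
		data[i] = (uint8_t)rand();

	for (unsigned long i = 0; i < iterations; i++) {
		size_t offset = (size_t)rand() % 16;	/* misalign the start */
		size_t len = (size_t)rand() % (MAX_LEN - offset);
		uint16_t init = (uint16_t)rand();	/* random starting CRC */

		if (crc_t10dif_table(init, data + offset, len) !=
		    crc_t10dif_bitwise(init, data + offset, len))
			mismatches++;
	}
	return mismatches;
}
```

Calling crc_stress(100, 1) should report zero mismatches. Random
offsets matter because, as the commit message says, the vector code
often does not kick in for short or oddly aligned inputs.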

[PATCH 2/4] crypto: powerpc - Re-enable non-REFLECTed CRCs

2017-03-15 Thread Daniel Axtens
When CRC32c was included in the kernel, Anton ripped out
the #ifdefs around reflected polynomials, because CRC32c
is always reflected. However, not all CRCs use reflection
so we'd like to make it optional.

Restore the REFLECT parts from Anton's original CRC32
implementation (https://github.com/antonblanchard/crc32-vpmsum)

That implementation is available under GPLv2+, so we're OK
from a licensing point of view:
https://github.com/antonblanchard/crc32-vpmsum/blob/master/LICENSE.TXT

As CRC32c requires REFLECT, add that #define.

Cc: Anton Blanchard 
Signed-off-by: Daniel Axtens 

---

I compared the disassembly of the CRC32c module on LE before and
after the change, and verified that they were the same.

I verified that the crypto self-tests still pass on LE and BE, and
my tests in patch 4 still pass as well.
---
 arch/powerpc/crypto/crc32-vpmsum_core.S | 31 ++-
 arch/powerpc/crypto/crc32c-vpmsum_asm.S |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/crypto/crc32-vpmsum_core.S b/arch/powerpc/crypto/crc32-vpmsum_core.S
index 629244ef170e..87fabf4d391a 100644
--- a/arch/powerpc/crypto/crc32-vpmsum_core.S
+++ b/arch/powerpc/crypto/crc32-vpmsum_core.S
@@ -35,7 +35,9 @@
 
.text
 
-#if defined(__BIG_ENDIAN__)
+#if defined(__BIG_ENDIAN__) && defined(REFLECT)
+#define BYTESWAP_DATA
+#elif defined(__LITTLE_ENDIAN__) && !defined(REFLECT)
 #define BYTESWAP_DATA
 #else
 #undef BYTESWAP_DATA
@@ -108,7 +110,11 @@ FUNC_START(CRC_FUNCTION_NAME)
/* Get the initial value into v8 */
vxor    v8,v8,v8
MTVRD(v8, R3)
+#ifdef REFLECT
vsldoi  v8,zeroes,v8,8  /* shift into bottom 32 bits */
+#else
+   vsldoi  v8,v8,zeroes,4  /* shift into top 32 bits */
+#endif
 
 #ifdef BYTESWAP_DATA
addis   r3,r2,.byteswap_constant@toc@ha
@@ -354,6 +360,7 @@ FUNC_START(CRC_FUNCTION_NAME)
vxor    v6,v6,v14
vxor    v7,v7,v15
 
+#ifdef REFLECT
/*
 * vpmsumd produces a 96 bit result in the least significant bits
 * of the register. Since we are bit reflected we have to shift it
@@ -368,6 +375,7 @@ FUNC_START(CRC_FUNCTION_NAME)
vsldoi  v5,v5,zeroes,4
vsldoi  v6,v6,zeroes,4
vsldoi  v7,v7,zeroes,4
+#endif
 
/* xor with last 1024 bits */
lvx v8,0,r4
@@ -511,13 +519,33 @@ FUNC_START(CRC_FUNCTION_NAME)
vsldoi  v1,v0,v0,8
vxor    v0,v0,v1        /* xor two 64 bit results together */
 
+#ifdef REFLECT
/* shift left one bit */
vspltisb v1,1
vsl v0,v0,v1
+#endif
 
vand    v0,v0,mask_64bit
+#ifndef REFLECT
+   /*
+* Now for the Barrett reduction algorithm. The idea is to calculate q,
+* the multiple of our polynomial that we need to subtract. By
+* doing the computation 2x bits higher (ie 64 bits) and shifting the
+* result back down 2x bits, we round down to the nearest multiple.
+*/
+   VPMSUMD(v1,v0,const1)   /* ma */
+   vsldoi  v1,zeroes,v1,8  /* q = floor(ma/(2^64)) */
+   VPMSUMD(v1,v1,const2)   /* qn */
+   vxor    v0,v0,v1        /* a - qn, subtraction is xor in GF(2) */
 
/*
+* Get the result into r3. We need to shift it left 8 bytes:
+* V0 [ 0 1 2 X ]
+* V0 [ 0 X 2 3 ]
+*/
+   vsldoi  v0,v0,zeroes,8  /* shift result into top 64 bits */
+#else
+   /*
 * The reflected version of Barrett reduction. Instead of bit
 * reflecting our data (which is expensive to do), we bit reflect our
 * constants and our algorithm, which means the intermediate data in
@@ -537,6 +565,7 @@ FUNC_START(CRC_FUNCTION_NAME)
 * V0 [ 0 X 2 3 ]
 */
vsldoi  v0,v0,zeroes,4  /* shift result into top 64 bits of */
+#endif
 
/* Get it into r3 */
MFVRD(R3, v0)
diff --git a/arch/powerpc/crypto/crc32c-vpmsum_asm.S b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
index c0d080caefc1..d2bea48051a0 100644
--- a/arch/powerpc/crypto/crc32c-vpmsum_asm.S
+++ b/arch/powerpc/crypto/crc32c-vpmsum_asm.S
@@ -842,4 +842,5 @@
.octa 0x000105ec76f1
 
 #define CRC_FUNCTION_NAME __crc32c_vpmsum
+#define REFLECT
 #include "crc32-vpmsum_core.S"
-- 
2.9.3
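
[Editor's note] As background for the REFLECT switch, here is a
minimal bitwise sketch of a reflected CRC, using CRC32C as the
example. In a reflected CRC the register is shifted right and the
polynomial is used in bit-reversed form. The function names are
hypothetical. The standard CRC-32C initial value and final xor (all
ones) are applied here so the result can be checked against the
published check value; the assembly routine in the patch instead takes
the running CRC as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x1edc6f41u	/* Castagnoli polynomial, normal form */

/* Reverse the bit order of a 32-bit word. */
uint32_t reflect32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++) {
		r = (r << 1) | (v & 1u);
		v >>= 1;
	}
	return r;
}

/* Reflected (LSB-first) CRC32C, one bit at a time. */
uint32_t crc32c_reflected(const uint8_t *p, size_t len)
{
	const uint32_t poly = reflect32(CRC32C_POLY);	/* 0x82f63b78 */
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *p++;	/* data enters at the low end */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
	}
	return crc ^ 0xffffffffu;
}
```

For the ASCII string "123456789" this gives 0xe3069283, the usual
CRC-32C check value. A non-reflected CRC such as T10DIF shifts left
and feeds data in at the high end instead, which is why the core needs
the two code paths restored by this patch.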



[PATCH 3/4] crypto: powerpc - Add CRC-T10DIF acceleration

2017-03-15 Thread Daniel Axtens
T10DIF is a CRC16 used heavily in NVMe.

It turns out we can accelerate it with a CRC32 library and a few
little tricks.

Provide the accelerator based on the refactored CRC32 code.

Cc: Anton Blanchard 
Thanks-to: Hong Bo Peng 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/crypto/Makefile|   2 +
 arch/powerpc/crypto/crct10dif-vpmsum_asm.S  | 850 
 arch/powerpc/crypto/crct10dif-vpmsum_glue.c | 125 
 crypto/Kconfig  |   9 +
 4 files changed, 986 insertions(+)
 create mode 100644 arch/powerpc/crypto/crct10dif-vpmsum_asm.S
 create mode 100644 arch/powerpc/crypto/crct10dif-vpmsum_glue.c

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index 87f40454bad3..e66aaf19764d 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC_SPE) += sha1-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
+obj-$(CONFIG_CRYPTO_CRCT10DIF_VPMSUM) += crct10dif-vpmsum.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
@@ -17,3 +18,4 @@ sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
 sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
 crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
+crct10dif-vpmsum-y := crct10dif-vpmsum_asm.o crct10dif-vpmsum_glue.o
diff --git a/arch/powerpc/crypto/crct10dif-vpmsum_asm.S b/arch/powerpc/crypto/crct10dif-vpmsum_asm.S
new file mode 100644
index ..5e3d81a0af1b
--- /dev/null
+++ b/arch/powerpc/crypto/crct10dif-vpmsum_asm.S
@@ -0,0 +1,850 @@
+/*
+ * Calculate a CRC T10DIF  with vpmsum acceleration
+ *
+ * Constants generated by crc32-vpmsum, available at
+ * https://github.com/antonblanchard/crc32-vpmsum
+ *
+ * crc32-vpmsum is
+ * Copyright (C) 2015 Anton Blanchard , IBM
+ * and is available under the GPL v2 or later.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+   .section    .rodata
+.balign 16
+
+.byteswap_constant:
+   /* byte reverse permute constant */
+   .octa 0x0F0E0D0C0B0A09080706050403020100
+
+.constants:
+
+   /* Reduce 262144 kbits to 1024 bits */
+   /* x^261184 mod p(x), x^261120 mod p(x) */
+   .octa 0x56d35255
+
+   /* x^260160 mod p(x), x^260096 mod p(x) */
+   .octa 0xee67a1e4
+
+   /* x^259136 mod p(x), x^259072 mod p(x) */
+   .octa 0x60834ad1
+
+   /* x^258112 mod p(x), x^258048 mod p(x) */
+   .octa 0x8cfe9ab4
+
+   /* x^257088 mod p(x), x^257024 mod p(x) */
+   .octa 0x3e93fdb5
+
+   /* x^256064 mod p(x), x^256000 mod p(x) */
+   .octa 0x3c204548
+
+   /* x^255040 mod p(x), x^254976 mod p(x) */
+   .octa 0xb1fc8d69
+
+   /* x^254016 mod p(x), x^253952 mod p(x) */
+   .octa 0xf82b24ad
+
+   /* x^252992 mod p(x), x^252928 mod p(x) */
+   .octa 0x44429f1a
+
+   /* x^251968 mod p(x), x^251904 mod p(x) */
+   .octa 0xe88c66ec
+
+   /* x^250944 mod p(x), x^250880 mod p(x) */
+   .octa 0x385cc87d
+
+   /* x^249920 mod p(x), x^249856 mod p(x) */
+   .octa 0x3227c8ff
+
+   /* x^248896 mod p(x), x^248832 mod p(x) */
+   .octa 0xa9a93344
+
+   /* x^247872 mod p(x), x^247808 mod p(x) */
+   .octa 0xabaa66eb
+
+   /* x^246848 mod p(x), x^246784 mod p(x) */
+   .octa 0x1ac3c4ef
+
+   /* x^245824 mod p(x), x^245760 mod p(x) */
+   .octa 0x63f056f3
+
+   /* x^244800 mod p(x), x^244736 mod p(x) */
+   .octa 0x32cc0205
+
+   /* x^243776 mod p(x), x^243712 mod p(x) */
+   .octa 0xf8b5568e
+
+   /* x^242752 mod p(x), x^242688 mod p(x) */
+   .octa 0x8db16429
+
+   /* x^241728 mod p(x), x^241664 mod p(x) */
+   .octa 0x59ca6b66
+
+   /* x^240704 mod p(x), x^240640 mod p(x) */
+   .octa 0x5f5c18f8
+
+   /* x^239680 mod p(x), x^239616 mod p(x) */
+   .octa 0x61afb609
+
+   /* x^238656 mod p(x), x^238592 mod p(x) */
+   .octa 0xe29e099a
+
+  
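
[Editor's note] The commit message does not spell out the "few little
tricks". One way a 16-bit CRC can ride on 32-bit CRC machinery, shown
here purely as an assumption about the general approach and not as a
description of the glue code in this patch, is to place both the CRC
value and the polynomial in the top 16 bits of a 32-bit word. The low
16 bits then stay zero throughout, and the top half evolves exactly
like the 16-bit register. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8bb7u

/* Plain 16-bit, MSB-first CRC-T10DIF (no final xor). */
uint16_t crc16_t10dif(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u)
				? (uint16_t)((crc << 1) ^ T10DIF_POLY)
				: (uint16_t)(crc << 1);
	}
	return crc;
}

/* Generic 32-bit, MSB-first (non-reflected) CRC with a caller-supplied
 * polynomial. */
uint32_t crc32_msb_first(uint32_t crc, uint32_t poly,
			 const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint32_t)*p++ << 24;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x80000000u) ? (crc << 1) ^ poly
						  : crc << 1;
	}
	return crc;
}

/* CRC-T10DIF computed through the 32-bit routine: shift the value and
 * the polynomial up by 16 bits, then shift the result back down. */
uint16_t crc16_via_crc32(uint16_t crc, const uint8_t *p, size_t len)
{
	uint32_t wide = crc32_msb_first((uint32_t)crc << 16,
					(uint32_t)T10DIF_POLY << 16, p, len);

	return (uint16_t)(wide >> 16);
}
```

Both routines return 0xd0db for the ASCII string "123456789" with an
initial value of 0, which is the published CRC-16/T10-DIF check value.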

[PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-15 Thread Daniel Axtens
The core nuts and bolts of the crc32c vpmsum algorithm will
also work for a number of other CRC algorithms with different
polynomials. Factor out the function into a new asm file.

To handle multiple users of the function, a user simply
provides constants, defines the name of their CRC function,
and then #includes the core algorithm file.

Cc: Anton Blanchard 
Signed-off-by: Daniel Axtens 

--

It's possible at this point to argue that the address
of the constant tables should be passed in to the function,
rather than doing this somewhat unconventional #include.

However, we're about to add further #ifdefs back into the core
that will be provided by the encapsulating code, and which couldn't
be done as a variable without performance loss.
---
 arch/powerpc/crypto/crc32-vpmsum_core.S | 726 
 arch/powerpc/crypto/crc32c-vpmsum_asm.S | 714 +--
 2 files changed, 729 insertions(+), 711 deletions(-)
 create mode 100644 arch/powerpc/crypto/crc32-vpmsum_core.S

diff --git a/arch/powerpc/crypto/crc32-vpmsum_core.S b/arch/powerpc/crypto/crc32-vpmsum_core.S
new file mode 100644
index ..629244ef170e
--- /dev/null
+++ b/arch/powerpc/crypto/crc32-vpmsum_core.S
@@ -0,0 +1,726 @@
+/*
+ * Core of the accelerated CRC algorithm.
+ * In your file, define the constants and CRC_FUNCTION_NAME
+ * Then include this file.
+ *
+ * Calculate the checksum of data that is 16 byte aligned and a multiple of
+ * 16 bytes.
+ *
+ * The first step is to reduce it to 1024 bits. We do this in 8 parallel
+ * chunks in order to mask the latency of the vpmsum instructions. If we
+ * have more than 32 kB of data to checksum we repeat this step multiple
+ * times, passing in the previous 1024 bits.
+ *
+ * The next step is to reduce the 1024 bits to 64 bits. This step adds
+ * 32 bits of 0s to the end - this matches what a CRC does. We just
+ * calculate constants that land the data in this 32 bits.
+ *
+ * We then use fixed point Barrett reduction to compute a mod n over GF(2)
+ * for n = CRC using POWER8 instructions. We use x = 32.
+ *
+ * http://en.wikipedia.org/wiki/Barrett_reduction
+ *
+ * Copyright (C) 2015 Anton Blanchard , IBM
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+*/
+   
+#include 
+#include 
+
+#define MAX_SIZE   32768
+
+   .text
+
+#if defined(__BIG_ENDIAN__)
+#define BYTESWAP_DATA
+#else
+#undef BYTESWAP_DATA
+#endif
+
+#define off16  r25
+#define off32  r26
+#define off48  r27
+#define off64  r28
+#define off80  r29
+#define off96  r30
+#define off112 r31
+
+#define const1 v24
+#define const2 v25
+
+#define byteswap   v26
+#define mask_32bit v27
+#define mask_64bit v28
+#define zeroes v29
+
+#ifdef BYTESWAP_DATA
+#define VPERM(A, B, C, D) vperm   A, B, C, D
+#else
+#define VPERM(A, B, C, D)
+#endif
+
+/* unsigned int CRC_FUNCTION_NAME(unsigned int crc, void *p, unsigned long len) */
+FUNC_START(CRC_FUNCTION_NAME)
+   std r31,-8(r1)
+   std r30,-16(r1)
+   std r29,-24(r1)
+   std r28,-32(r1)
+   std r27,-40(r1)
+   std r26,-48(r1)
+   std r25,-56(r1)
+
+   li  off16,16
+   li  off32,32
+   li  off48,48
+   li  off64,64
+   li  off80,80
+   li  off96,96
+   li  off112,112
+   li  r0,0
+
+   /* Enough room for saving 10 non volatile VMX registers */
+   subi    r6,r1,56+10*16
+   subi    r7,r1,56+2*16
+
+   stvx    v20,0,r6
+   stvx    v21,off16,r6
+   stvx    v22,off32,r6
+   stvx    v23,off48,r6
+   stvx    v24,off64,r6
+   stvx    v25,off80,r6
+   stvx    v26,off96,r6
+   stvx    v27,off112,r6
+   stvx    v28,0,r7
+   stvx    v29,off16,r7
+
+   mr  r10,r3
+
+   vxor    zeroes,zeroes,zeroes
+   vspltisw v0,-1
+
+   vsldoi  mask_32bit,zeroes,v0,4
+   vsldoi  mask_64bit,zeroes,v0,8
+
+   /* Get the initial value into v8 */
+   vxor    v8,v8,v8
+   MTVRD(v8, R3)
+   vsldoi  v8,zeroes,v8,8  /* shift into bottom 32 bits */
+
+#ifdef BYTESWAP_DATA
+   addis   r3,r2,.byteswap_constant@toc@ha
+   addi    r3,r3,.byteswap_constant@toc@l
+
+   lvx byteswap,0,r3
+   addi    r3,r3,16
+#endif
+
+   cmpdi   r5,256
+   blt .Lshort
+
+   rldicr  r6,r5,0,56
+
+   /* Checksum in blocks of MAX_SIZE */
+1: lis r7,MAX_SIZE@h
+   ori r7,r7,MAX_SIZE@l
+   mr  r9,r7
+   cmpd    r6,r7
+   bgt 2f
+   mr  r7,r6
+2: subf    r6,r7,r6
+
+   /* our main loop does 128 bytes at a time */
+   srdi

Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex

2017-03-15 Thread Sowmini Varadhan
On (03/15/17 10:08), Dmitry Vyukov wrote:
> After I've applied the patch these reports stopped happening, and I
> have not seen any other reports that look relevant.
> However, there was one, but it looks like a different issue and it
> was probably masked by massive amounts of original deadlock reports:

Yes, this looks like a valid deadlock.

I think there may be some ->dumpit callbacks that take the rtnl_lock
and do not unlock it before return, e.g., I see nl80211_dump_interface()
doing this at 

   2612 rtnl_lock();
   2613 if (!cb->args[2]) {
 :
   2619 ret = nl80211_dump_wiphy_parse(skb, cb, );
   2620 if (ret)
   2621 return ret;

afaict, nl80211_dump_wiphy_parse does not itself do rtnl_unlock on error.


If that's the case then we'd run into the circular locking dependancy
flagged by lockdep. 

Disclaimer: I did not check every single ->dumpit, there may be more
than one of these..
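
To illustrate the pattern described above, here is a minimal userspace
sketch of a callback that returns on error while still holding a lock,
next to a variant that routes every exit through one unlock. All names
(toy_rtnl_lock, toy_parse, toy_dump_leaky, toy_dump_fixed) are
hypothetical stand-ins, not the nl80211 code, and the "lock" is only a
flag so the leak can be observed.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for rtnl_mutex: it only tracks whether it is held. */
static bool toy_rtnl_held;

static void toy_rtnl_lock(void)   { toy_rtnl_held = true; }
static void toy_rtnl_unlock(void) { toy_rtnl_held = false; }

/* Hypothetical parse step; returns a negative errno value on failure. */
static int toy_parse(bool fail)
{
	return fail ? -EINVAL : 0;
}

/* Leaky shape: the early return skips the unlock. */
static int toy_dump_leaky(bool fail)
{
	int ret;

	toy_rtnl_lock();
	ret = toy_parse(fail);
	if (ret)
		return ret;	/* the lock is still held here */

	/* ... dump work would go here ... */
	toy_rtnl_unlock();
	return 0;
}

/* Fixed shape: every exit goes through a single unlock label. */
static int toy_dump_fixed(bool fail)
{
	int ret;

	toy_rtnl_lock();
	ret = toy_parse(fail);
	if (ret)
		goto out_unlock;

	/* ... dump work would go here ... */
out_unlock:
	toy_rtnl_unlock();
	return ret;
}
```

With the leaky variant the flag stays set after a failed parse, which
is the situation a later rtnl_lock() caller would block on.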






[PATCH v2 01/14] crypto: sun4i-ss - simplify optional reset handling

2017-03-15 Thread Philipp Zabel
As of commit bb475230b8e5 ("reset: make optional functions really
optional"), the reset framework API calls use NULL pointers to describe
optional, non-present reset controls.

This allows returning errors from devm_reset_control_get_optional and
calling reset_control_(de)assert unconditionally.

Signed-off-by: Philipp Zabel 
---
 drivers/crypto/sunxi-ss/sun4i-ss-core.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-core.c 
b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
index 3ac6c6c4ad18e..e310e311d23ea 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-core.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
@@ -258,10 +258,11 @@ static int sun4i_ss_probe(struct platform_device *pdev)
 
ss->reset = devm_reset_control_get_optional(&pdev->dev, "ahb");
if (IS_ERR(ss->reset)) {
-   if (PTR_ERR(ss->reset) == -EPROBE_DEFER)
-   return PTR_ERR(ss->reset);
-   dev_info(&pdev->dev, "no reset control found\n");
-   ss->reset = NULL;
+   err = PTR_ERR(ss->reset);
+   if (err == -EPROBE_DEFER)
+   return err;
+   dev_err(&pdev->dev, "Cannot get reset control err=%d\n", err);
+   return err;
}
 
/* Enable both clocks */
@@ -287,12 +288,10 @@ static int sun4i_ss_probe(struct platform_device *pdev)
}
 
/* Deassert reset if we have a reset control */
-   if (ss->reset) {
-   err = reset_control_deassert(ss->reset);
-   if (err) {
-   dev_err(&pdev->dev, "Cannot deassert reset control\n");
-   goto error_clk;
-   }
+   err = reset_control_deassert(ss->reset);
+   if (err) {
+   dev_err(&pdev->dev, "Cannot deassert reset control\n");
+   goto error_clk;
}
 
/*
@@ -372,8 +371,7 @@ static int sun4i_ss_probe(struct platform_device *pdev)
break;
}
}
-   if (ss->reset)
-   reset_control_assert(ss->reset);
+   reset_control_assert(ss->reset);
 error_clk:
clk_disable_unprepare(ss->ssclk);
 error_ssclk:
@@ -398,8 +396,7 @@ static int sun4i_ss_remove(struct platform_device *pdev)
}
 
writel(0, ss->base + SS_CTL);
-   if (ss->reset)
-   reset_control_assert(ss->reset);
+   reset_control_assert(ss->reset);
clk_disable_unprepare(ss->busclk);
clk_disable_unprepare(ss->ssclk);
return 0;
-- 
2.11.0
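
For readers unfamiliar with the convention the commit message relies
on, the following is a small userspace sketch of "NULL means the
optional resource is absent, and operations on it succeed as no-ops".
The toy_reset type and toy_* functions are hypothetical stand-ins for
illustration only; they are not the reset framework's implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct reset_control. */
struct toy_reset {
	int asserted;
};

/* NULL means "no reset line present": succeed without doing anything. */
static int toy_reset_deassert(struct toy_reset *rst)
{
	if (!rst)
		return 0;
	rst->asserted = 0;
	return 0;
}

static int toy_reset_assert(struct toy_reset *rst)
{
	if (!rst)
		return 0;
	rst->asserted = 1;
	return 0;
}

/* The caller needs no "if (rst)" guard around either call. */
static int toy_probe(struct toy_reset *rst)
{
	int err;

	err = toy_reset_deassert(rst);
	if (err)
		return err;

	/* ... device setup would go here ... */
	return 0;
}
```

Because the NULL check lives in one place, callers such as toy_probe()
keep a single code path whether or not a reset line exists, which is
the simplification the patch applies to the probe and remove paths.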



Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex

2017-03-15 Thread Dmitry Vyukov
On Tue, Mar 14, 2017 at 4:25 PM, Sowmini Varadhan
 wrote:
> On (03/14/17 09:14), Dmitry Vyukov wrote:
>> Another one now involving rds_tcp_listen_stop
>:
>> kworker/u4:1/19 is trying to acquire lock:
>>  (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock
>> include/net/sock.h:1460 [inline]
>>  (sk_lock-AF_INET){+.+.+.}, at: []
>> rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288
>>
>> but task is already holding lock:
>>  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
>> net/core/rtnetlink.c:70
>
> Is this also a false positive?
>
> genl_lock_dumpit takes the genl_lock and then waits on the rtnl_lock
> (e.g., out of tipc_nl_bearer_dump).
>
> netdev_run_todo takes the rtnl_lock and then wants lock_sock()
> for the TCP/IPv4 socket.
>
> Why is lockdep seeing a circular dependency here? Same pattern
> seems to be happening for
>   http://www.spinics.net/lists/netdev/msg423368.html
> and maybe also http://www.spinics.net/lists/netdev/msg423323.html?
>
> --Sowmini
>
>> Chain exists of:
>>   sk_lock-AF_INET --> genl_mutex --> rtnl_mutex
>>
>>  Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(rtnl_mutex);
>>                                lock(genl_mutex);
>>                                lock(rtnl_mutex);
>>   lock(sk_lock-AF_INET);
>>
>>  *** DEADLOCK ***
>>
>> 4 locks held by kworker/u4:1/19:
>>  #0:  ("%s""netns"){.+.+.+}, at: []
>> __write_once_size include/linux/compiler.h:283 [inline]
>>  #0:  ("%s""netns"){.+.+.+}, at: [] atomic64_set
>> arch/x86/include/asm/atomic64_64.h:33 [inline]
>>  #0:  ("%s""netns"){.+.+.+}, at: [] atomic_long_set
>> include/asm-generic/atomic-long.h:56 [inline]
>>  #0:  ("%s""netns"){.+.+.+}, at: [] set_work_data
>> kernel/workqueue.c:617 [inline]
>>  #0:  ("%s""netns"){.+.+.+}, at: []
>> set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>>  #0:  ("%s""netns"){.+.+.+}, at: []
>> process_one_work+0xab3/0x1c10 kernel/workqueue.c:2089
>>  #1:  (net_cleanup_work){+.+.+.}, at: []
>> process_one_work+0xb07/0x1c10 kernel/workqueue.c:2093
>>  #2:  (net_mutex){+.+.+.}, at: []
>> cleanup_net+0x22b/0xa90 net/core/net_namespace.c:429
>>  #3:  (rtnl_mutex){+.+.+.}, at: []
>> rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
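
As a rough illustration of why a chain such as
"sk_lock-AF_INET --> genl_mutex --> rtnl_mutex" makes taking the socket
lock under rtnl_mutex look circular, here is a toy lock-order graph in
userspace C. It records "B was taken while A was held" as an edge and
reports a cycle when a new acquisition would lead back to a lock already
held. This is only a sketch under simplifying assumptions: the names
(record_order, would_deadlock) are hypothetical, and the real lockdep
works on lock classes and tracks far more state.

```c
#include <stdbool.h>

enum { SK_LOCK, GENL_MUTEX, RTNL_MUTEX, NR_LOCKS };

/* edge[a][b] is true when lock b was seen being taken while a was held. */
static bool edge[NR_LOCKS][NR_LOCKS];

static void record_order(int held, int taken)
{
	edge[held][taken] = true;
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Taking 'taken' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'taken' in the recorded ordering.
 */
static bool would_deadlock(int held, int taken)
{
	bool seen[NR_LOCKS] = { false };

	return reachable(taken, held, seen);
}
```

Recording SK_LOCK before GENL_MUTEX and GENL_MUTEX before RTNL_MUTEX,
and then asking would_deadlock(RTNL_MUTEX, SK_LOCK), reports a cycle,
even if the two orderings were observed on unrelated sockets or code
paths. That is one way such a report can turn out to be a false
positive.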



After I applied the patch, these reports stopped happening, and I
have not seen any other reports that look relevant.
However, there was one, but it looks like a different issue and it
was probably masked by the massive number of original deadlock reports:


[ INFO: possible circular locking dependency detected ]
4.10.0+ #29 Not tainted
---
syz-executor5/29222 is trying to acquire lock:
 (genl_mutex){+.+.+.}, at: [] genl_lock
net/netlink/genetlink.c:32 [inline]
 (genl_mutex){+.+.+.}, at: []
genl_family_rcv_msg+0xdae/0x1040 net/netlink/genetlink.c:547

but task is already holding lock:
 (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:70

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.+.}:
   validate_chain kernel/locking/lockdep.c:2267 [inline]
   __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
   lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
   __mutex_lock_common kernel/locking/mutex.c:756 [inline]
   __mutex_lock+0x172/0x1730 kernel/locking/mutex.c:893
   mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
   rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70
   nl80211_dump_wiphy+0x45/0x6d0 net/wireless/nl80211.c:1946
   genl_lock_dumpit+0x68/0x90 net/netlink/genetlink.c:479
   netlink_dump+0x54d/0xd40 net/netlink/af_netlink.c:2168
   __netlink_dump_start+0x4e5/0x760 net/netlink/af_netlink.c:2258
   genl_family_rcv_msg+0xd9d/0x1040 net/netlink/genetlink.c:546
   genl_rcv_msg+0xa6/0x140 net/netlink/genetlink.c:620
   netlink_rcv_skb+0x2ab/0x390 net/netlink/af_netlink.c:2339
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:631
   netlink_unicast_kernel net/netlink/af_netlink.c:1272 [inline]
   netlink_unicast+0x514/0x730 net/netlink/af_netlink.c:1298
   netlink_sendmsg+0xa9f/0xe50 net/netlink/af_netlink.c:1844
   sock_sendmsg_nosec net/socket.c:633 [inline]
   sock_sendmsg+0xca/0x110 net/socket.c:643
   ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1985
   __sys_sendmsg+0x138/0x300 net/socket.c:2019
   SYSC_sendmsg net/socket.c:2030 [inline]
   SyS_sendmsg+0x2d/0x50 net/socket.c:2026
   do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
   return_from_SYSCALL_64+0x0/0x7a

-> #0 (genl_mutex){+.+.+.}:
   check_prev_add kernel/locking/lockdep.c:1830 [inline]
   check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
   validate_chain kernel/locking/lockdep.c:2267 [inline]
   __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
   lock_acquire+0x2a1/0x630 

Re: [PATCH v6 0/4] Broadcom SBA RAID support

2017-03-15 Thread Anup Patel
On Wed, Mar 15, 2017 at 12:18 AM, Shaohua Li  wrote:
> On Tue, Mar 14, 2017 at 09:56:35AM -0700, Dan Williams wrote:
>> On Mon, Mar 6, 2017 at 1:43 AM, Anup Patel  wrote:
>> > The Broadcom SBA RAID is a stream-based device which provides
>> > RAID5/6 offload.
>> >
>> > It requires a SoC specific ring manager (such as Broadcom FlexRM
>> > ring manager) to provide ring-based programming interface. Due to
>> > this, the Broadcom SBA RAID driver (mailbox client) implements
>> > DMA device having one DMA channel using a set of mailbox channels
>> > provided by Broadcom SoC specific ring manager driver (mailbox
>> > controller).
>> >
>> > The Broadcom SBA RAID hardware requires PQ disk position instead
>> > of PQ disk coefficient. To address this, we have added raid_gflog
>> > table which will help driver to convert PQ disk coefficient to PQ
>> > disk position.
>> >
>> > This patchset is based on Linux-4.11-rc1 and depends on patchset
>> > "[PATCH v5 0/2] Broadcom FlexRM ring manager support"
>> >
>> > It is also available at sba-raid-v6 branch of
>> > https://github.com/Broadcom/arm64-linux.git
>> >
>> [..]
>> >
>> > Anup Patel (4):
>> >   lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
>> >   async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
>> >   dmaengine: Add Broadcom SBA RAID driver
>> >   dt-bindings: Add DT bindings document for Broadcom SBA RAID driver
>>
>> For the dmaengine and async_tx changes:
>>
>> Acked-by: Dan Williams 
>>
>> The raid change should get an ack from Shaohua.
>
> For the raid6 part:
>
> Acked-by: Shaohua Li 

Thanks Shaohua ...

Regards,
Anup


Re: [PATCH v6 0/4] Broadcom SBA RAID support

2017-03-15 Thread Anup Patel
On Tue, Mar 14, 2017 at 10:26 PM, Dan Williams  wrote:
> On Mon, Mar 6, 2017 at 1:43 AM, Anup Patel  wrote:
>> The Broadcom SBA RAID is a stream-based device which provides
>> RAID5/6 offload.
>>
>> It requires a SoC specific ring manager (such as Broadcom FlexRM
>> ring manager) to provide ring-based programming interface. Due to
>> this, the Broadcom SBA RAID driver (mailbox client) implements
>> DMA device having one DMA channel using a set of mailbox channels
>> provided by Broadcom SoC specific ring manager driver (mailbox
>> controller).
>>
>> The Broadcom SBA RAID hardware requires PQ disk position instead
>> of PQ disk coefficient. To address this, we have added raid_gflog
>> table which will help driver to convert PQ disk coefficient to PQ
>> disk position.
>>
>> This patchset is based on Linux-4.11-rc1 and depends on patchset
>> "[PATCH v5 0/2] Broadcom FlexRM ring manager support"
>>
>> It is also available at sba-raid-v6 branch of
>> https://github.com/Broadcom/arm64-linux.git
>>
> [..]
>>
>> Anup Patel (4):
>>   lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
>>   async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
>>   dmaengine: Add Broadcom SBA RAID driver
>>   dt-bindings: Add DT bindings document for Broadcom SBA RAID driver
>
> For the dmaengine and async_tx changes:
>
> Acked-by: Dan Williams 
>

Thanks Dan ...

Regards,
Anup


Crypto Fixes for 4.11

2017-03-15 Thread Herbert Xu
Hi Linus:

This push fixes the following issues:

- Self-test failure of crc32c on powerpc.
- Regressions of ecb(aes) when used with xts/lrw in s5p-sss.
- A number of bugs in the omap RNG driver.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Daniel Axtens (1):
  crypto: powerpc - Fix initialisation of crc32c context

Krzysztof Kozlowski (2):
  crypto: s5p-sss - Fix completing crypto request in IRQ handler
  crypto: s5p-sss - Fix spinlock recursion on LRW(AES)

Thomas Petazzoni (3):
  hwrng: omap - write registers after enabling the clock
  hwrng: omap - use devm_clk_get() instead of of_clk_get()
  hwrng: omap - Do not access INTMASK_REG on EIP76

 arch/powerpc/crypto/crc32c-vpmsum_glue.c |   2 +-
 drivers/char/hw_random/omap-rng.c        |  16 +++-
 drivers/crypto/s5p-sss.c                 | 132 +++---
 3 files changed, 100 insertions(+), 50 deletions(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt