[PATCH v5 1/4] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

2017-02-15 Thread Anup Patel
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework passes values from raid6_gfexp as coefficients
for each source to the prep_dma_pq() callback of a DMA channel with PQ
capability. This creates a problem for RAID6 offload engines (such as
Broadcom SBA) which take the disk position (i.e. the log of {2}) instead
of the multiplicative coefficients from the raid6_gfexp table.

This patch adds the raid6_gflog table, which holds the log-of-2 value
for any given x such that 0 <= x < 256. For any given disk coefficient
x, the corresponding disk position is given by raid6_gflog[x]. A RAID6
offload engine driver can use this newly added raid6_gflog table to
get the disk position from the multiplicative coefficient.
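
As an illustration (not part of the patch itself), an offload engine driver
would use the two tables together roughly as follows; the local variable
names here are hypothetical:

    /* async_tx passes raid6_gfexp[d] as the coefficient for source d */
    u8 coef = raid6_gfexp[d];

    /* hardware such as Broadcom SBA wants the disk position instead */
    u8 pos = raid6_gflog[coef];

    /* by construction, raid6_gfexp[raid6_gflog[x]] == x for any x > 0 */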

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
Reviewed-by: Ray Jui 
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256]  __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256]  __attribute__((aligned(256)));
 
diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
printf("EXPORT_SYMBOL(raid6_gfexp);\n");
printf("#endif\n");
 
+   /* Compute log-of-2 table */
+   printf("\nconst u8 __attribute__((aligned(256)))\n"
+  "raid6_gflog[256] =\n" "{\n");
+   for (i = 0; i < 256; i += 8) {
+   printf("\t");
+   for (j = 0; j < 8; j++) {
+   v = 255;
+   for (k = 0; k < 256; k++)
+   if (exptbl[k] == (i + j)) {
+   v = k;
+   break;
+   }
+   printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+   }
+   }
+   printf("};\n");
+   printf("#ifdef __KERNEL__\n");
+   printf("EXPORT_SYMBOL(raid6_gflog);\n");
+   printf("#endif\n");
+
/* Compute inverse table x^-1 == x^254 */
printf("\nconst u8 __attribute__((aligned(256)))\n"
   "raid6_gfinv[256] =\n" "{\n");
-- 
2.7.4



[PATCH v5 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-15 Thread Anup Patel
The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via the Broadcom SoC specific ring manager.

This patch adds the Broadcom SBA RAID driver, which provides one
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.
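
To illustrate the round-robin submission described below in the driver
comments, a rough sketch of spreading requests over the mailbox channels
could look like the following. This is illustrative only; sba_device,
sba_request and their fields are hypothetical names, not the actual
driver structures:

    /* Illustrative sketch only; sba_device/sba_request are hypothetical. */
    static int sba_send_request(struct sba_device *sba, struct sba_request *req)
    {
            /* pick the next mailbox channel in round-robin order */
            u32 idx = atomic_inc_return(&sba->next_chan) % sba->mchans_count;

            /* hand the message to the ring manager via the mailbox framework */
            return mbox_send_message(sba->mchans[idx], &req->msg);
    }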

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
---
 drivers/dma/Kconfig|   14 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1785 
 3 files changed, 1800 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..3d23597 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,20 @@ config AXI_DMAC
  controller is often used in Analog Device's reference designs for FPGA
  platforms.
 
+config BCM_SBA_RAID
+   tristate "Broadcom SBA RAID engine support"
+   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+   select DMA_ENGINE
+   select DMA_ENGINE_RAID
+   select ASYNC_TX_DISABLE_XOR_VAL_DMA
+   select ASYNC_TX_DISABLE_PQ_VAL_DMA
+   default ARCH_BCM_IPROC
+   help
+ Enable support for Broadcom SBA RAID Engine. The SBA RAID
+ engine is available on most of the Broadcom iProc SoCs. It
+ has the capability to offload memcpy, xor and pq computation
+ for raid5/6.
+
 config COH901318
bool "ST-Ericsson COH901318 DMA support"
select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 0000000..d6b927b
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1785 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via the Broadcom SoC specific ring manager. Two or more offload engines
+ * can share the same Broadcom SoC specific ring manager; because of this,
+ * the Broadcom SoC specific ring manager driver is implemented as a
+ * mailbox controller driver and the offload engine drivers are implemented
+ * as mailbox clients.
+ *
+ * Typically, the Broadcom SoC specific ring manager will implement a larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of an SBA hardware device is limited,
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small-sized requests and executed in parallel on multiple SBA
+ * hardware devices to achieve high throughput.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except for submitting requests to the SBA hardware device via mailbox
+ * channels. This driver implements a DMA device with one DMA channel using
+ * a set of mailbox channels provided by the Broadcom SoC specific ring
+ * manager driver. To exploit parallelism (as described above), all DMA
+ * requests coming to the SBA RAID DMA channel are broken down into smaller
+ * requests and submitted to multiple mailbox channels in a round-robin
+ * fashion. To have more SBA DMA channels, more SBA device nodes can be
+ * created in the Broadcom SoC specific DTS based on the number of hardware
+ * rings supported by the Broadcom SoC ring manager.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dmaengine.h"
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT 48
+#define SBA_TYPE_MASK  GENMASK(1, 0)
+#define SBA_TYPE_A 0x0
+#define SBA_TYPE_B 0x2
+#define SBA_TYPE_C 0x3
+#define SBA_USER_DEF_SHIFT 32
+#define SBA_USER_DEF_MASK  GENMASK(15, 0)
+#define SBA_R_MDATA_SHIFT  24
+#define SBA_R_MDATA_MASK   GENMASK(7, 0)
+#define SBA_C_MDATA_MS_SHIFT 

[PATCH v5 4/4] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

2017-02-15 Thread Anup Patel
This patch adds the DT bindings document for newly added Broadcom
SBA RAID driver.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt 
b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 0000000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+ "brcm,iproc-sba"
+ "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
+raid_mbox: mbox@6740 {
+   ...
+   #mbox-cells = <3>;
+   ...
+};
+
+raid0 {
+   compatible = "brcm,iproc-sba-v2";
+   mboxes = <&raid_mbox 0 0x1 0xffff>,
+            <&raid_mbox 1 0x1 0xffff>,
+            <&raid_mbox 2 0x1 0xffff>,
+            <&raid_mbox 3 0x1 0xffff>,
+            <&raid_mbox 4 0x1 0xffff>,
+            <&raid_mbox 5 0x1 0xffff>,
+            <&raid_mbox 6 0x1 0xffff>,
+            <&raid_mbox 7 0x1 0xffff>;
+};
-- 
2.7.4



[PATCH v5 0/4] Broadcom SBA RAID support

2017-02-15 Thread Anup Patel
The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as the Broadcom FlexRM
ring manager) to provide a ring-based programming interface. Due to
this, the Broadcom SBA RAID driver (mailbox client) implements a DMA
device having one DMA channel using a set of mailbox channels
provided by the Broadcom SoC specific ring manager driver (mailbox
controller).

The Broadcom SBA RAID hardware requires the PQ disk position instead
of the PQ disk coefficient. To address this, we have added the
raid6_gflog table, which helps the driver convert a PQ disk
coefficient to a PQ disk position.

This patchset is based on Linux-4.10-rc2 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v5 branch of
https://github.com/Broadcom/arm64-linux.git

Changes since v4:
 - Removed dependency of bcm-sba-raid driver on kconfig option
   ASYNC_TX_ENABLE_CHANNEL_SWITCH
 - Select kconfig options ASYNC_TX_DISABLE_XOR_VAL_DMA and
   ASYNC_TX_DISABLE_PQ_VAL_DMA for bcm-sba-raid driver
 - Implemented device_prep_dma_interrupt() using dummy 8-byte
   copy operation so that the dma_async_device_register() can
   set DMA_ASYNC_TX capability for the DMA device provided
   by bcm-sba-raid driver

Changes since v3:
 - Replaced SBA_ENC() with sba_cmd_enc() inline function
 - Use list_first_entry_or_null() wherever possible
 - Remove unwanted braces around loops wherever possible
 - Use lockdep_assert_held() where required

Changes since v2:
 - Dropped patch to handle DMA devices having support for fewer
   PQ coefficients in Linux Async Tx
 - Added work-around in bcm-sba-raid driver to handle unsupported
   PQ coefficients using multiple SBA requests

Changes since v1:
 - Dropped patch to add mbox_channel_device() API
 - Used GENMASK and BIT macros wherever possible in bcm-sba-raid driver
 - Replaced C_MDATA macros with static inline functions in
   bcm-sba-raid driver
 - Removed sba_alloc_chan_resources() callback in bcm-sba-raid driver
 - Used dev_err() instead of dev_info() wherever applicable
 - Removed call to sba_issue_pending() from sba_tx_submit() in
   bcm-sba-raid driver
 - Implemented SBA request chaining for handling (len > sba->req_size)
   in bcm-sba-raid driver
 - Implemented device_terminate_all() callback in bcm-sba-raid driver

Anup Patel (4):
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt |   29 +
 crypto/async_tx/async_pq.c |5 +-
 drivers/dma/Kconfig|   14 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1785 
 include/linux/raid/pq.h|1 +
 lib/raid6/mktables.c   |   20 +
 7 files changed, 1852 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
 create mode 100644 drivers/dma/bcm-sba-raid.c

-- 
2.7.4



[PATCH v5 2/4] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()

2017-02-15 Thread Anup Patel
DMA_PREP_FENCE is to be used when preparing a Tx descriptor whose output
will be used by the next/dependent Tx descriptor.

DMA_PREP_FENCE will not be set correctly in do_async_gen_syndrome()
when calling dma->device_prep_dma_pq() under the following conditions:
1. ASYNC_TX_FENCE not set in submit->flags
2. DMA_PREP_FENCE not set in dma_flags
3. src_cnt (= disks - 2) is greater than dma_maxpq(dma, dma_flags)

This patch fixes the DMA_PREP_FENCE usage in do_async_gen_syndrome(),
taking inspiration from the do_async_xor() implementation.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..56bd612 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
dma_addr_t dma_dest[2];
int src_off = 0;
 
-   if (submit->flags & ASYNC_TX_FENCE)
-   dma_flags |= DMA_PREP_FENCE;
-
while (src_cnt > 0) {
submit->flags = flags_orig;
pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
if (cb_fn_orig)
dma_flags |= DMA_PREP_INTERRUPT;
}
+   if (submit->flags & ASYNC_TX_FENCE)
+   dma_flags |= DMA_PREP_FENCE;
 
/* Drivers force forward progress in case they can not provide
 * a descriptor
-- 
2.7.4



[PATCH 3/3] crypto: ccp - Add 3DES function on v5 CCPs

2017-02-15 Thread Gary R Hook
Wire up support for Triple DES in ECB mode.
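
For context (not part of this patch), the new cipher is reachable through
the kernel's generic skcipher API once registered; a minimal sketch, with
error handling and completion waiting omitted:

    /* Minimal sketch, error handling omitted; not code from this series. */
    struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(des3_ede)", 0, 0);
    struct skcipher_request *req = skcipher_request_alloc(tfm, GFP_KERNEL);

    crypto_skcipher_setkey(tfm, key, DES3_EDE_KEY_SIZE);
    skcipher_request_set_callback(req, 0, NULL, NULL);
    /* ECB takes no IV, hence the NULL below */
    skcipher_request_set_crypt(req, &src_sg, &dst_sg, nbytes, NULL);
    crypto_skcipher_encrypt(req);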

Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/Makefile  |1 
 drivers/crypto/ccp/ccp-crypto-des3.c |  254 ++
 drivers/crypto/ccp/ccp-crypto-main.c |   10 +
 drivers/crypto/ccp/ccp-crypto.h  |   22 +++
 drivers/crypto/ccp/ccp-dev-v3.c  |1 
 drivers/crypto/ccp/ccp-dev-v5.c  |   54 +++
 drivers/crypto/ccp/ccp-dev.h |   14 ++
 drivers/crypto/ccp/ccp-ops.c |  198 +++
 include/linux/ccp.h  |   57 +++-
 9 files changed, 608 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/ccp/ccp-crypto-des3.c

diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index fd77225..563594a 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -14,4 +14,5 @@ ccp-crypto-objs := ccp-crypto-main.o \
   ccp-crypto-aes-xts.o \
   ccp-crypto-rsa.o \
   ccp-crypto-aes-galois.o \
+  ccp-crypto-des3.o \
   ccp-crypto-sha.o
diff --git a/drivers/crypto/ccp/ccp-crypto-des3.c 
b/drivers/crypto/ccp/ccp-crypto-des3.c
new file mode 100644
index 0000000..5af7347
--- /dev/null
+++ b/drivers/crypto/ccp/ccp-crypto-des3.c
@@ -0,0 +1,254 @@
+/*
+ * AMD Cryptographic Coprocessor (CCP) DES3 crypto API support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ccp-crypto.h"
+
+static int ccp_des3_complete(struct crypto_async_request *async_req, int ret)
+{
+   struct ablkcipher_request *req = ablkcipher_request_cast(async_req);
+   struct ccp_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
+   struct ccp_des3_req_ctx *rctx = ablkcipher_request_ctx(req);
+
+   if (ret)
+   return ret;
+
+   if (ctx->u.des3.mode != CCP_DES3_MODE_ECB)
+   memcpy(req->info, rctx->iv, DES3_EDE_BLOCK_SIZE);
+
+   return 0;
+}
+
+static int ccp_des3_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+   unsigned int key_len)
+{
+   struct ccp_ctx *ctx = crypto_tfm_ctx(crypto_ablkcipher_tfm(tfm));
+   struct ccp_crypto_ablkcipher_alg *alg =
+   ccp_crypto_ablkcipher_alg(crypto_ablkcipher_tfm(tfm));
+   u32 *flags = &tfm->base.crt_flags;
+
+
+   /* From des_generic.c:
+*
+* RFC2451:
+*   If the first two or last two independent 64-bit keys are
+*   equal (k1 == k2 or k2 == k3), then the DES3 operation is simply the
+*   same as DES.  Implementers MUST reject keys that exhibit this
+*   property.
+*/
+   const u32 *K = (const u32 *)key;
+
+   if (unlikely(!((K[0] ^ K[2]) | (K[1] ^ K[3])) ||
+!((K[2] ^ K[4]) | (K[3] ^ K[5]))) &&
+(*flags & CRYPTO_TFM_REQ_WEAK_KEY)) {
+   *flags |= CRYPTO_TFM_RES_WEAK_KEY;
+   return -EINVAL;
+   }
+
+   /* It's not clear that there is any support for a keysize of 112.
+* If needed, the caller should make K1 == K3
+*/
+   ctx->u.des3.type = CCP_DES3_TYPE_168;
+   ctx->u.des3.mode = alg->mode;
+   ctx->u.des3.key_len = key_len;
+
+   memcpy(ctx->u.des3.key, key, key_len);
+   sg_init_one(&ctx->u.des3.key_sg, ctx->u.des3.key, key_len);
+
+   return 0;
+}
+
+static int ccp_des3_crypt(struct ablkcipher_request *req, bool encrypt)
+{
+   struct ccp_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
+   struct ccp_des3_req_ctx *rctx = ablkcipher_request_ctx(req);
+   struct scatterlist *iv_sg = NULL;
+   unsigned int iv_len = 0;
+   int ret;
+
+   if (!ctx->u.des3.key_len)
+   return -EINVAL;
+
+   if (((ctx->u.des3.mode == CCP_DES3_MODE_ECB) ||
+(ctx->u.des3.mode == CCP_DES3_MODE_CBC)) &&
+   (req->nbytes & (DES3_EDE_BLOCK_SIZE - 1)))
+   return -EINVAL;
+
+   if (ctx->u.des3.mode != CCP_DES3_MODE_ECB) {
+   if (!req->info)
+   return -EINVAL;
+
+   memcpy(rctx->iv, req->info, DES3_EDE_BLOCK_SIZE);
+   iv_sg = &rctx->iv_sg;
+   iv_len = DES3_EDE_BLOCK_SIZE;
+   sg_init_one(iv_sg, rctx->iv, iv_len);
+   }
+
+   memset(&rctx->cmd, 0, sizeof(rctx->cmd));
+   INIT_LIST_HEAD(&rctx->cmd.entry);
+   rctx->cmd.engine = CCP_ENGINE_DES3;
+   rctx->cmd.u.des3.type = ctx->u.des3.type;
+   rctx->cmd.u.des3.mode = ctx->u.des3.mode;
+   rctx->cmd.u.des3.action = (encrypt)
+ ? CCP_DES3_ACTION_ENCRYPT
+ : CCP_DES3_ACTION_DECRYPT;
+   

[PATCH 2/3] crypto: ccp - Add support for AES GCM on v5 CCPs

2017-02-15 Thread Gary R Hook
A version 5 device provides the primitive commands
required for AES GCM. This patch adds support for
en/decryption.
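
For context (not part of the patch), the new mode is used through the
normal AEAD API; a minimal sketch, with error handling and completion
waiting omitted:

    /* Minimal sketch, error/completion handling omitted; not code from this series. */
    struct crypto_aead *tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
    struct aead_request *req = aead_request_alloc(tfm, GFP_KERNEL);

    crypto_aead_setkey(tfm, key, AES_KEYSIZE_128);
    crypto_aead_setauthsize(tfm, 16);                 /* 16-byte GCM tag */
    aead_request_set_callback(req, 0, NULL, NULL);
    aead_request_set_ad(req, assoclen);               /* AAD precedes the plaintext in src */
    aead_request_set_crypt(req, &src_sg, &dst_sg, ptlen, iv); /* iv: 12-byte nonce */
    crypto_aead_encrypt(req);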

Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/Makefile|2 
 drivers/crypto/ccp/ccp-crypto-aes-galois.c |  257 
 drivers/crypto/ccp/ccp-crypto-main.c   |   20 ++
 drivers/crypto/ccp/ccp-crypto.h|   14 ++
 drivers/crypto/ccp/ccp-ops.c   |  252 +++
 include/linux/ccp.h|9 +
 6 files changed, 554 insertions(+)
 create mode 100644 drivers/crypto/ccp/ccp-crypto-aes-galois.c

diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index 346ceb8..fd77225 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -12,4 +12,6 @@ ccp-crypto-objs := ccp-crypto-main.o \
   ccp-crypto-aes.o \
   ccp-crypto-aes-cmac.o \
   ccp-crypto-aes-xts.o \
+  ccp-crypto-rsa.o \
+  ccp-crypto-aes-galois.o \
   ccp-crypto-sha.o
diff --git a/drivers/crypto/ccp/ccp-crypto-aes-galois.c 
b/drivers/crypto/ccp/ccp-crypto-aes-galois.c
new file mode 100644
index 0000000..8bc18c9
--- /dev/null
+++ b/drivers/crypto/ccp/ccp-crypto-aes-galois.c
@@ -0,0 +1,257 @@
+/*
+ * AMD Cryptographic Coprocessor (CCP) AES GCM crypto API support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Gary R Hook 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ccp-crypto.h"
+
+#define AES_GCM_IVSIZE  12
+
+static int ccp_aes_gcm_complete(struct crypto_async_request *async_req, int 
ret)
+{
+   return ret;
+}
+
+static int ccp_aes_gcm_setkey(struct crypto_aead *tfm, const u8 *key,
+ unsigned int key_len)
+{
+   struct ccp_ctx *ctx = crypto_aead_ctx(tfm);
+
+   switch (key_len) {
+   case AES_KEYSIZE_128:
+   ctx->u.aes.type = CCP_AES_TYPE_128;
+   break;
+   case AES_KEYSIZE_192:
+   ctx->u.aes.type = CCP_AES_TYPE_192;
+   break;
+   case AES_KEYSIZE_256:
+   ctx->u.aes.type = CCP_AES_TYPE_256;
+   break;
+   default:
+   crypto_aead_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+   return -EINVAL;
+   }
+
+   ctx->u.aes.mode = CCP_AES_MODE_GCM;
+   ctx->u.aes.key_len = key_len;
+
+   memcpy(ctx->u.aes.key, key, key_len);
+   sg_init_one(&ctx->u.aes.key_sg, ctx->u.aes.key, key_len);
+
+   return 0;
+}
+
+static int ccp_aes_gcm_setauthsize(struct crypto_aead *tfm,
+  unsigned int authsize)
+{
+   return 0;
+}
+
+static int ccp_aes_gcm_crypt(struct aead_request *req, bool encrypt)
+{
+   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+   struct ccp_ctx *ctx = crypto_aead_ctx(tfm);
+   struct ccp_aes_req_ctx *rctx = aead_request_ctx(req);
+   struct scatterlist *iv_sg = NULL;
+   unsigned int iv_len = 0;
+   int i;
+   int ret = 0;
+
+   if (!ctx->u.aes.key_len)
+   return -EINVAL;
+
+   if (ctx->u.aes.mode != CCP_AES_MODE_GCM)
+   return -EINVAL;
+
+   if (!req->iv)
+   return -EINVAL;
+
+   /*
+* 5 parts:
+*   plaintext/ciphertext input
+*   AAD
+*   key
+*   IV
+*   Destination+tag buffer
+*/
+
+   /* According to the way AES GCM has been implemented here,
+* per RFC 4106 it seems, the provided IV is fixed at 12 bytes and
+* occupies the beginning of the IV array. Write a 32-bit
+* integer after that (bytes 13-16) with a value of "1".
+*/
+   memcpy(rctx->iv, req->iv, AES_GCM_IVSIZE);
+   for (i = 0; i < 3; i++)
+   rctx->iv[i + AES_GCM_IVSIZE] = 0;
+   rctx->iv[AES_BLOCK_SIZE - 1] = 1;
+
+   /* Set up a scatterlist for the IV */
+   iv_sg = &rctx->iv_sg;
+   iv_len = AES_BLOCK_SIZE;
+   sg_init_one(iv_sg, rctx->iv, iv_len);
+
+   /* The AAD + plaintext are concatenated in the src buffer */
+   memset(&rctx->cmd, 0, sizeof(rctx->cmd));
+   INIT_LIST_HEAD(&rctx->cmd.entry);
+   rctx->cmd.engine = CCP_ENGINE_AES;
+   rctx->cmd.u.aes.type = ctx->u.aes.type;
+   rctx->cmd.u.aes.mode = ctx->u.aes.mode;
+   rctx->cmd.u.aes.action =
+   (encrypt) ? CCP_AES_ACTION_ENCRYPT : CCP_AES_ACTION_DECRYPT;
+   rctx->cmd.u.aes.key = &ctx->u.aes.key_sg;
+   rctx->cmd.u.aes.key_len = ctx->u.aes.key_len;
+   rctx->cmd.u.aes.iv = iv_sg;
+   rctx->cmd.u.aes.iv_len = iv_len;
+   rctx->cmd.u.aes.src = req->src;
+   

[PATCH 0/3] Support new function in the newer CCP

2017-02-15 Thread Gary R Hook
The following series implements new functions in a version 5
coprocessor. The new features are:
 - Support for SHA-2 384-bit and 512-bit hashing
 - Support for AES GCM encryption
 - Support for 3DES encryption

---

Gary R Hook (3):
  crypto: ccp - Add SHA-2 384-/512-bit support
  crypto: ccp - Add support for AES GCM on v5 CCPs
  crypto: ccp - Add 3DES function on v5 CCPs


 drivers/crypto/ccp/Makefile|3 
 drivers/crypto/ccp/ccp-crypto-aes-galois.c |  257 ++
 drivers/crypto/ccp/ccp-crypto-des3.c   |  254 ++
 drivers/crypto/ccp/ccp-crypto-main.c   |   30 ++
 drivers/crypto/ccp/ccp-crypto-sha.c|   22 +
 drivers/crypto/ccp/ccp-crypto.h|   44 ++
 drivers/crypto/ccp/ccp-dev-v3.c|1 
 drivers/crypto/ccp/ccp-dev-v5.c|   54 +++
 drivers/crypto/ccp/ccp-dev.h   |   14 +
 drivers/crypto/ccp/ccp-ops.c   |  522 
 include/linux/ccp.h|   68 
 11 files changed, 1263 insertions(+), 6 deletions(-)
 create mode 100644 drivers/crypto/ccp/ccp-crypto-aes-galois.c
 create mode 100644 drivers/crypto/ccp/ccp-crypto-des3.c

--
I'm pretty sure donuts would help.


[PATCH 1/3] crypto: ccp - Add SHA-2 384-/512-bit support

2017-02-15 Thread Gary R Hook
Incorporate 384-bit and 512-bit hashing for a version 5 CCP
device
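
For reference (not from this patch), the new hashes are reachable through
the regular synchronous hash API once registered; a minimal sketch with
error handling omitted:

    /* Minimal sketch, error handling omitted; not code from this series. */
    struct crypto_shash *tfm = crypto_alloc_shash("sha384", 0, 0);
    SHASH_DESC_ON_STACK(desc, tfm);

    desc->tfm = tfm;
    desc->flags = 0;
    crypto_shash_digest(desc, data, len, digest);  /* digest: 48 bytes for sha384 */
    crypto_free_shash(tfm);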


Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/ccp-crypto-sha.c |   22 +++
 drivers/crypto/ccp/ccp-crypto.h |8 ++--
 drivers/crypto/ccp/ccp-ops.c|   72 +++
 include/linux/ccp.h |2 +
 4 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-crypto-sha.c 
b/drivers/crypto/ccp/ccp-crypto-sha.c
index 84a652b..6b46eea 100644
--- a/drivers/crypto/ccp/ccp-crypto-sha.c
+++ b/drivers/crypto/ccp/ccp-crypto-sha.c
@@ -146,6 +146,12 @@ static int ccp_do_sha_update(struct ahash_request *req, 
unsigned int nbytes,
case CCP_SHA_TYPE_256:
rctx->cmd.u.sha.ctx_len = SHA256_DIGEST_SIZE;
break;
+   case CCP_SHA_TYPE_384:
+   rctx->cmd.u.sha.ctx_len = SHA384_DIGEST_SIZE;
+   break;
+   case CCP_SHA_TYPE_512:
+   rctx->cmd.u.sha.ctx_len = SHA512_DIGEST_SIZE;
+   break;
default:
/* Should never get here */
break;
@@ -393,6 +399,22 @@ struct ccp_sha_def {
.digest_size= SHA256_DIGEST_SIZE,
.block_size = SHA256_BLOCK_SIZE,
},
+   {
+   .version= CCP_VERSION(5, 0),
+   .name   = "sha384",
+   .drv_name   = "sha384-ccp",
+   .type   = CCP_SHA_TYPE_384,
+   .digest_size= SHA384_DIGEST_SIZE,
+   .block_size = SHA384_BLOCK_SIZE,
+   },
+   {
+   .version= CCP_VERSION(5, 0),
+   .name   = "sha512",
+   .drv_name   = "sha512-ccp",
+   .type   = CCP_SHA_TYPE_512,
+   .digest_size= SHA512_DIGEST_SIZE,
+   .block_size = SHA512_BLOCK_SIZE,
+   },
 };
 
 static int ccp_register_hmac_alg(struct list_head *head,
diff --git a/drivers/crypto/ccp/ccp-crypto.h b/drivers/crypto/ccp/ccp-crypto.h
index 8335b32..95cce27 100644
--- a/drivers/crypto/ccp/ccp-crypto.h
+++ b/drivers/crypto/ccp/ccp-crypto.h
@@ -137,9 +137,11 @@ struct ccp_aes_cmac_exp_ctx {
u8 buf[AES_BLOCK_SIZE];
 };
 
-/* SHA related defines */
-#define MAX_SHA_CONTEXT_SIZE   SHA256_DIGEST_SIZE
-#define MAX_SHA_BLOCK_SIZE SHA256_BLOCK_SIZE
+/* SHA-related defines
+ * These values must be large enough to accommodate any variant
+ */
+#define MAX_SHA_CONTEXT_SIZE   SHA512_DIGEST_SIZE
+#define MAX_SHA_BLOCK_SIZE SHA512_BLOCK_SIZE
 
 struct ccp_sha_ctx {
struct scatterlist opad_sg;
diff --git a/drivers/crypto/ccp/ccp-ops.c b/drivers/crypto/ccp/ccp-ops.c
index f1396c3..0d82080 100644
--- a/drivers/crypto/ccp/ccp-ops.c
+++ b/drivers/crypto/ccp/ccp-ops.c
@@ -41,6 +41,20 @@
cpu_to_be32(SHA256_H6), cpu_to_be32(SHA256_H7),
 };
 
+static const __be64 ccp_sha384_init[SHA512_DIGEST_SIZE / sizeof(__be64)] = {
+   cpu_to_be64(SHA384_H0), cpu_to_be64(SHA384_H1),
+   cpu_to_be64(SHA384_H2), cpu_to_be64(SHA384_H3),
+   cpu_to_be64(SHA384_H4), cpu_to_be64(SHA384_H5),
+   cpu_to_be64(SHA384_H6), cpu_to_be64(SHA384_H7),
+};
+
+static const __be64 ccp_sha512_init[SHA512_DIGEST_SIZE / sizeof(__be64)] = {
+   cpu_to_be64(SHA512_H0), cpu_to_be64(SHA512_H1),
+   cpu_to_be64(SHA512_H2), cpu_to_be64(SHA512_H3),
+   cpu_to_be64(SHA512_H4), cpu_to_be64(SHA512_H5),
+   cpu_to_be64(SHA512_H6), cpu_to_be64(SHA512_H7),
+};
+
 #define CCP_NEW_JOBID(ccp)  ((ccp->vdata->version == CCP_VERSION(3, 
0)) ? \
ccp_gen_jobid(ccp) : 0)
 
@@ -955,6 +969,18 @@ static int ccp_run_sha_cmd(struct ccp_cmd_queue *cmd_q, 
struct ccp_cmd *cmd)
return -EINVAL;
block_size = SHA256_BLOCK_SIZE;
break;
+   case CCP_SHA_TYPE_384:
+   if (cmd_q->ccp->vdata->version < CCP_VERSION(4, 0)
+   || sha->ctx_len < SHA384_DIGEST_SIZE)
+   return -EINVAL;
+   block_size = SHA384_BLOCK_SIZE;
+   break;
+   case CCP_SHA_TYPE_512:
+   if (cmd_q->ccp->vdata->version < CCP_VERSION(4, 0)
+   || sha->ctx_len < SHA512_DIGEST_SIZE)
+   return -EINVAL;
+   block_size = SHA512_BLOCK_SIZE;
+   break;
default:
return -EINVAL;
}
@@ -1042,6 +1068,21 @@ static int ccp_run_sha_cmd(struct ccp_cmd_queue *cmd_q, 
struct ccp_cmd *cmd)
sb_count = 1;
ooffset = ioffset = 0;
break;
+   case CCP_SHA_TYPE_384:
+   digest_size = SHA384_DIGEST_SIZE;
+   init = (void *) ccp_sha384_init;
+   ctx_size = SHA512_DIGEST_SIZE;
+   sb_count = 2;
+   ioffset = 0;
+   ooffset = 2 * CCP_SB_BYTES - SHA384_DIGEST_SIZE;

[PATCH v8 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version

2017-02-15 Thread Sven Schmidt
Update the unlz4 wrapper to work with the updated LZ4 kernel module
version.

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 lib/decompress_unlz4.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
index 036fc88..1b0baf3 100644
--- a/lib/decompress_unlz4.c
+++ b/lib/decompress_unlz4.c
@@ -72,7 +72,7 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
error("NULL input pointer and missing fill function");
goto exit_1;
} else {
-   inp = large_malloc(lz4_compressbound(uncomp_chunksize));
+   inp = large_malloc(LZ4_compressBound(uncomp_chunksize));
if (!inp) {
error("Could not allocate input buffer");
goto exit_1;
@@ -136,7 +136,7 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
inp += 4;
size -= 4;
} else {
-   if (chunksize > lz4_compressbound(uncomp_chunksize)) {
+   if (chunksize > LZ4_compressBound(uncomp_chunksize)) {
error("chunk length is longer than allocated");
goto exit_2;
}
@@ -152,11 +152,14 @@ STATIC inline int INIT unlz4(u8 *input, long in_len,
out_len -= dest_len;
} else
dest_len = out_len;
-   ret = lz4_decompress(inp, &chunksize, outp, dest_len);
+
+   ret = LZ4_decompress_fast(inp, outp, dest_len);
+   chunksize = ret;
 #else
dest_len = uncomp_chunksize;
-   ret = lz4_decompress_unknownoutputsize(inp, chunksize, outp,
-   &dest_len);
+
+   ret = LZ4_decompress_safe(inp, outp, chunksize, dest_len);
+   dest_len = ret;
 #endif
if (ret < 0) {
error("Decoding failed");
-- 
2.1.4



[PATCH v8 5/5] lib/lz4: Remove back-compat wrappers

2017-02-15 Thread Sven Schmidt
Remove the functions introduced as wrappers for providing backwards
compatibility to the prior LZ4 version.  They're not needed anymore since
there's no callers left.

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 include/linux/lz4.h  | 69 
 lib/lz4/lz4_compress.c   | 22 ---
 lib/lz4/lz4_decompress.c | 42 -
 lib/lz4/lz4hc_compress.c | 23 
 4 files changed, 156 deletions(-)

diff --git a/include/linux/lz4.h b/include/linux/lz4.h
index 1b0f8ca..394e3d9 100644
--- a/include/linux/lz4.h
+++ b/include/linux/lz4.h
@@ -173,18 +173,6 @@ static inline int LZ4_compressBound(size_t isize)
 }
 
 /**
- * lz4_compressbound() - For backwards compatibility; see LZ4_compressBound
- * @isize: Size of the input data
- *
- * Return: Max. size LZ4 may output in a "worst case" szenario
- * (data not compressible)
- */
-static inline int lz4_compressbound(size_t isize)
-{
-   return LZ4_COMPRESSBOUND(isize);
-}
-
-/**
  * LZ4_compress_default() - Compress data from source to dest
  * @source: source address of the original data
  * @dest: output buffer address of the compressed data
@@ -257,20 +245,6 @@ int LZ4_compress_fast(const char *source, char *dest, int 
inputSize,
 int LZ4_compress_destSize(const char *source, char *dest, int *sourceSizePtr,
int targetDestSize, void *wrkmem);
 
-/*
- * lz4_compress() - For backward compatibility, see LZ4_compress_default
- * @src: source address of the original data
- * @src_len: size of the original data
- * @dst: output buffer address of the compressed data. This requires 'dst'
- * of size LZ4_COMPRESSBOUND
- * @dst_len: is the output size, which is returned after compress done
- * @workmem: address of the working memory.
- *
- * Return: Success if return 0, Error if return < 0
- */
-int lz4_compress(const unsigned char *src, size_t src_len, unsigned char *dst,
-   size_t *dst_len, void *wrkmem);
-
 /*-
  * Decompression Functions
  **/
@@ -346,34 +320,6 @@ int LZ4_decompress_safe(const char *source, char *dest, 
int compressedSize,
 int LZ4_decompress_safe_partial(const char *source, char *dest,
int compressedSize, int targetOutputSize, int maxDecompressedSize);
 
-/*
- * lz4_decompress_unknownoutputsize() - For backwards compatibility,
- * see LZ4_decompress_safe
- * @src: source address of the compressed data
- * @src_len: is the input size, therefore the compressed size
- * @dest: output buffer address of the decompressed data
- * which must be already allocated
- * @dest_len: is the max size of the destination buffer, which is
- * returned with actual size of decompressed data after decompress done
- *
- * Return: Success if return 0, Error if return (< 0)
- */
-int lz4_decompress_unknownoutputsize(const unsigned char *src, size_t src_len,
-   unsigned char *dest, size_t *dest_len);
-
-/**
- * lz4_decompress() - For backwards cocmpatibility, see LZ4_decompress_fast
- * @src: source address of the compressed data
- * @src_len: is the input size, which is returned after decompress done
- * @dest: output buffer address of the decompressed data,
- * which must be already allocated
- * @actual_dest_len: is the size of uncompressed data, supposing it's known
- *
- * Return: Success if return 0, Error if return (< 0)
- */
-int lz4_decompress(const unsigned char *src, size_t *src_len,
-   unsigned char *dest, size_t actual_dest_len);
-
 /*-
  * LZ4 HC Compression
  **/
@@ -401,21 +347,6 @@ int LZ4_compress_HC(const char *src, char *dst, int 
srcSize, int dstCapacity,
int compressionLevel, void *wrkmem);
 
 /**
- * lz4hc_compress() - For backwards compatibility, see LZ4_compress_HC
- * @src: source address of the original data
- * @src_len: size of the original data
- * @dst: output buffer address of the compressed data. This requires 'dst'
- * of size LZ4_COMPRESSBOUND.
- * @dst_len: is the output size, which is returned after compress done
- * @wrkmem: address of the working memory.
- * This requires 'workmem' of size LZ4HC_MEM_COMPRESS.
- *
- * Return  : Success if return 0, Error if return (< 0)
- */
-int lz4hc_compress(const unsigned char *src, size_t src_len, unsigned char 
*dst,
-   size_t *dst_len, void *wrkmem);
-
-/**
  * LZ4_resetStreamHC() - Init an allocated 'LZ4_streamHC_t' structure
  * @streamHCPtr: pointer to the 'LZ4_streamHC_t' structure
  * @compressionLevel: Recommended values are between 4 and 9, although any
diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
index 53f313f..cc7b6d4 100644
--- a/lib/lz4/lz4_compress.c
+++ b/lib/lz4/lz4_compress.c
@@ -936,27 

[PATCH v8 1/5] lib: Update LZ4 compressor module

2017-02-15 Thread Sven Schmidt
Update the LZ4 kernel module to LZ4 v1.7.3 by Yann Collet.  The kernel
module is inspired by the previous work by Chanho Min.  The updated LZ4
module will not break existing code since the patchset contains
appropriate changes.

API changes:

A new method, LZ4_compress_fast, which differs from the variant available
in the kernel by its new acceleration parameter, allowing one to trade
compression ratio for more compression speed and vice versa.

LZ4_decompress_fast is the respective decompression method, featuring a
very fast decoder (multiple GB/s per core), able to reach RAM speed in
multi-core systems.  The decompressor can decompress data compressed
with LZ4 fast as well as with the LZ4 HC (high compression) algorithm.

Also the useful functions LZ4_decompress_safe_partial and
LZ4_compress_destSize were added.  The latter reverses the logic by trying
to compress as much data as possible from source to dest while the former
aims to decompress partial blocks of data.

A bunch of streaming functions were also added which allow
compressing/decompressing data in multiple steps (so-called "streaming
mode").

The methods lz4_compress and lz4_decompress_unknownoutputsize are now
known as LZ4_compress_default and LZ4_decompress_safe, respectively. The
old methods will be removed since there are no callers left in the code.
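
As a usage sketch of the updated calling convention (buffer setup elided;
signatures and the workspace size constant as exposed by the new lz4.h):

    /* Sketch of the updated API; buffer setup and error paths elided. */
    void *wrkmem = kmalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
    int clen, dlen;

    clen = LZ4_compress_fast(src, cbuf, src_len, LZ4_compressBound(src_len),
                             8 /* acceleration */, wrkmem);
    /* LZ4_compress_fast() returns 0 when the output buffer is too small */

    dlen = LZ4_decompress_safe(cbuf, dbuf, clen, dbuf_size);
    /* LZ4_decompress_safe() returns a negative value on a decoding error */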

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 include/linux/lz4.h  |  762 +++---
 lib/lz4/Makefile |2 +
 lib/lz4/lz4_compress.c   | 1161 +-
 lib/lz4/lz4_decompress.c |  705 ++--
 lib/lz4/lz4defs.h|  338 --
 lib/lz4/lz4hc_compress.c |  867 ++
 6 files changed, 2758 insertions(+), 1077 deletions(-)

diff --git a/include/linux/lz4.h b/include/linux/lz4.h
index 6b784c5..1b0f8ca 100644
--- a/include/linux/lz4.h
+++ b/include/linux/lz4.h
@@ -1,87 +1,717 @@
-#ifndef __LZ4_H__
-#define __LZ4_H__
-/*
- * LZ4 Kernel Interface
+/* LZ4 Kernel Interface
  *
  * Copyright (C) 2013, LG Electronics, Kyungsik Lee 
+ * Copyright (C) 2016, Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
+ *
+ * This file is based on the original header file
+ * for LZ4 - Fast LZ compression algorithm.
+ *
+ * LZ4 - Fast LZ compression algorithm
+ * Copyright (C) 2011-2016, Yann Collet.
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ * You can contact the author at :
+ * - LZ4 homepage : http://www.lz4.org
+ * - LZ4 source repository : https://github.com/lz4/lz4
  */
-#define LZ4_MEM_COMPRESS   (16384)
-#define LZ4HC_MEM_COMPRESS (262144 + (2 * sizeof(unsigned char *)))
 
+#ifndef __LZ4_H__
+#define __LZ4_H__
+
+#include 
+#include/* memset, memcpy */
+
+/*-
+ * CONSTANTS
+ **/
 /*
- * lz4_compressbound()
- * Provides the maximum size that LZ4 may output in a "worst case" scenario
- * (input data not compressible)
+ * LZ4_MEMORY_USAGE :
+ * Memory usage formula : N->2^N Bytes
+ * (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
+ * Increasing memory usage improves compression ratio
+ * Reduced memory usage can improve speed, due to cache effect
+ * Default value is 14, for 16KB, which nicely fits into Intel x86 L1 cache
  */
-static 

[PATCH v8 3/5] crypto: Change LZ4 modules to work with new LZ4 module version

2017-02-15 Thread Sven Schmidt
Update the crypto modules using LZ4 compression as well as the test cases
in testmgr.h to work with the new LZ4 module version.

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 crypto/lz4.c |  23 -
 crypto/lz4hc.c   |  23 -
 crypto/testmgr.h | 142 +++
 3 files changed, 120 insertions(+), 68 deletions(-)

diff --git a/crypto/lz4.c b/crypto/lz4.c
index 99c1b2c..71eff9b 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -66,15 +66,13 @@ static void lz4_exit(struct crypto_tfm *tfm)
 static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
-   size_t tmp_len = *dlen;
-   int err;
+   int out_len = LZ4_compress_default(src, dst,
+   slen, *dlen, ctx);
 
-   err = lz4_compress(src, slen, dst, &tmp_len, ctx);
-
-   if (err < 0)
+   if (!out_len)
return -EINVAL;
 
-   *dlen = tmp_len;
+   *dlen = out_len;
return 0;
 }
 
@@ -96,16 +94,13 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
 static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
-   int err;
-   size_t tmp_len = *dlen;
-   size_t __slen = slen;
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
 
-   err = lz4_decompress_unknownoutputsize(src, __slen, dst, &tmp_len);
-   if (err < 0)
-   return -EINVAL;
+   if (out_len < 0)
+   return out_len;
 
-   *dlen = tmp_len;
-   return err;
+   *dlen = out_len;
+   return 0;
 }
 
 static int lz4_sdecompress(struct crypto_scomp *tfm, const u8 *src,
diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index 75ffc4a..03a34a8 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -65,15 +65,13 @@ static void lz4hc_exit(struct crypto_tfm *tfm)
 static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
   u8 *dst, unsigned int *dlen, void *ctx)
 {
-   size_t tmp_len = *dlen;
-   int err;
+   int out_len = LZ4_compress_HC(src, dst, slen,
+   *dlen, LZ4HC_DEFAULT_CLEVEL, ctx);
 
-   err = lz4hc_compress(src, slen, dst, &tmp_len, ctx);
-
-   if (err < 0)
+   if (!out_len)
return -EINVAL;
 
-   *dlen = tmp_len;
+   *dlen = out_len;
return 0;
 }
 
@@ -97,16 +95,13 @@ static int lz4hc_compress_crypto(struct crypto_tfm *tfm, 
const u8 *src,
 static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
 u8 *dst, unsigned int *dlen, void *ctx)
 {
-   int err;
-   size_t tmp_len = *dlen;
-   size_t __slen = slen;
+   int out_len = LZ4_decompress_safe(src, dst, slen, *dlen);
 
-   err = lz4_decompress_unknownoutputsize(src, __slen, dst, &tmp_len);
-   if (err < 0)
-   return -EINVAL;
+   if (out_len < 0)
+   return out_len;
 
-   *dlen = tmp_len;
-   return err;
+   *dlen = out_len;
+   return 0;
 }
 
 static int lz4hc_sdecompress(struct crypto_scomp *tfm, const u8 *src,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 9b656be..98d4be0 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -34498,31 +34498,62 @@ static struct hash_testvec bfin_crc_tv_template[] = {
 
 static struct comp_testvec lz4_comp_tv_template[] = {
{
-   .inlen  = 70,
-   .outlen = 45,
-   .input  = "Join us now and share the software "
- "Join us now and share the software ",
-   .output = "\xf0\x10\x4a\x6f\x69\x6e\x20\x75"
- "\x73\x20\x6e\x6f\x77\x20\x61\x6e"
- "\x64\x20\x73\x68\x61\x72\x65\x20"
- "\x74\x68\x65\x20\x73\x6f\x66\x74"
- "\x77\x0d\x00\x0f\x23\x00\x0b\x50"
- "\x77\x61\x72\x65\x20",
+   .inlen  = 255,
+   .outlen = 218,
+   .input  = "LZ4 is lossless compression algorithm, providing"
+" compression speed at 400 MB/s per core, scalable "
+"with multi-cores CPU. It features an extremely fast "
+"decoder, with speed in multiple GB/s per core, "
+"typically reaching RAM speed limits on multi-core "
+"systems.",
+   .output = "\xf9\x21\x4c\x5a\x34\x20\x69\x73\x20\x6c\x6f\x73\x73"
+ "\x6c\x65\x73\x73\x20\x63\x6f\x6d\x70\x72\x65\x73\x73"
+ "\x69\x6f\x6e\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d"
+ "\x2c\x20\x70\x72\x6f\x76\x69\x64\x69\x6e\x67\x21\x00"
+ "\xf0\x21\x73\x70\x65\x65\x64\x20\x61\x74\x20\x34\x30"
+ 

[PATCH v8 0/5] Update LZ4 compressor module

2017-02-15 Thread Sven Schmidt

This patchset is for updating the LZ4 compression module to a version based
on LZ4 v1.7.3, allowing use of the fast compression algorithm aka LZ4 fast,
which provides an "acceleration" parameter as a tradeoff between
high compression ratio and high compression speed.

We want to use LZ4 fast in order to support compression in lustre
and (mostly, based on that) investigate data reduction techniques on behalf
of storage systems.

Also, it will be useful for other users of LZ4 compression, as with LZ4 fast
it is possible to enable applications to use fast and/or high compression
depending on the use case.
For instance, ZRAM offers an LZ4 backend and could benefit from an updated
LZ4 in the kernel.

LZ4 homepage: http://www.lz4.org/
LZ4 source repository: https://github.com/lz4/lz4
Source version: 1.7.3

Benchmark (taken from [1], Core i5-4300U @1.9GHz):
|-------------|--------------|----------------|-------
 Compressor   | Compression  | Decompression  | Ratio
|-------------|--------------|----------------|-------
 memcpy       |  4200 MB/s   |  4200 MB/s     | 1.000
 LZ4 fast 50  |  1080 MB/s   |  2650 MB/s     | 1.375
 LZ4 fast 17  |   680 MB/s   |  2220 MB/s     | 1.607
 LZ4 fast 5   |   475 MB/s   |  1920 MB/s     | 1.886
 LZ4 default  |   385 MB/s   |  1850 MB/s     | 2.101

[1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html

[PATCH 1/5] lib: Update LZ4 compressor module
[PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module 
version
[PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
[PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 
version
[PATCH 5/5] lib/lz4: Remove back-compat wrappers

Changes:
v8:
- Rewrote the architecture-dependent definitions in lz4defs.h, such as 
LZ4_read*,
  LZ4_write* and LZ4_NbCommonBytes as proposed by Eric Biggers
- Added -O3 compiler flag to Makefile, also suggested by Eric
- Defined FORCE_INLINE, as used in upstream LZ4, as __always_inline and
  force-inlined most of the small functions in lz4defs.h
- lz4_decompress: Wrapped the EXPORT_SYMBOL and MODULE_* macros in a
  #ifdef STATIC the way suggested by Andrew Morton, fixing the breakage
  of CONFIG_KERNEL_LZ4 in x86 as reported by Arnd Bergmann

v7:
- Fixed errors reported by the Smatch tool
- Changed function documentation comments in lz4.h to match kernel-doc style
- Fixed a misbehaviour of LZ4HC caused by the wrong level of indentation
  concerning two for loops introduced after I refactored the code style using
  checkpatch.pl (upstream LZ4 put dozens of stuff in just one line, gnah)
- Updated the crypto tests for LZ4 since they did fail for the new code
  and hence zram did fail to allocate memory for LZ4

v6:
- Fixed LZ4_NBCOMMONBYTES() for 64-bit little endian
- Reset LZ4_MEMORY_USAGE to 14 (which is the value used in
  upstream LZ4 as well as the previous kernel module)
- Fixed that weird double-indentation in lz4defs.h and lz4.h
- Adjusted general styling issues in lz4defs.h
  (e.g. lines consisting of more than one instruction)
- Removed the architecture-dependent typedef to reg_t
  since upstream LZ4 is just using size_t and that works fine
- Changed error messages in pstore/platform.c:
  * LZ4_compress_default always returns 0 in case of an error
(no need to print the return value)
  * LZ4_decompress_safe returns a negative error message
(return value _does_ matter)

v5:
- Added a fifth patch to remove the back-compat wrappers introduced
  to ensure bisectibility between the patches (the functions are no longer
  needed since there's no callers left)

v4:
- Fixed kbuild errors
- Re-added lz4_compressbound as alias for LZ4_compressBound
  to ensure backwards compatibility
- Wrapped LZ4_hash5 with check for LZ4_ARCH64 since it is only used there
  and triggers an unused function warning when false

v3:
- Adjusted the code to satisfy kernel coding style (checkpatch.pl)
- Made sure the changes to LZ4 in Kernel (overflow checks etc.)
  are included in the new module (they are)
- Removed the second LZ4_compressBound function with related name but
  different return type
- Corrected version number (was LZ4 1.7.3)
- Added missing LZ4 streaming functions

v2:
- Changed order of the patches since in the initial patchset the lz4.h was in 
the
  last patch but was referenced by the other ones
- Split lib/decompress_unlz4.c in an own patch
- Fixed errors reported by the buildbot
- Further refactorings
- Added more appropriate copyright note to include/linux/lz4.h



[PATCH v8 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version

2017-02-15 Thread Sven Schmidt
Update fs/pstore and fs/squashfs to use the updated functions from the new
LZ4 module.

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 fs/pstore/platform.c  | 22 +-
 fs/squashfs/lz4_wrapper.c | 12 ++--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 729677e..efab7b6 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -342,31 +342,35 @@ static int compress_lz4(const void *in, void *out, size_t 
inlen, size_t outlen)
 {
int ret;
 
-   ret = lz4_compress(in, inlen, out, &outlen, workspace);
-   if (ret) {
-   pr_err("lz4_compress error, ret = %d!\n", ret);
+   ret = LZ4_compress_default(in, out, inlen, outlen, workspace);
+   if (!ret) {
+   pr_err("LZ4_compress_default error; compression failed!\n");
return -EIO;
}
 
-   return outlen;
+   return ret;
 }
 
 static int decompress_lz4(void *in, void *out, size_t inlen, size_t outlen)
 {
int ret;
 
-   ret = lz4_decompress_unknownoutputsize(in, inlen, out, &outlen);
-   if (ret) {
-   pr_err("lz4_decompress error, ret = %d!\n", ret);
+   ret = LZ4_decompress_safe(in, out, inlen, outlen);
+   if (ret < 0) {
+   /*
+* LZ4_decompress_safe will return an error code
+* (< 0) if decompression failed
+*/
+   pr_err("LZ4_decompress_safe error, ret = %d!\n", ret);
return -EIO;
}
 
-   return outlen;
+   return ret;
 }
 
 static void allocate_lz4(void)
 {
-   big_oops_buf_sz = lz4_compressbound(psinfo->bufsize);
+   big_oops_buf_sz = LZ4_compressBound(psinfo->bufsize);
big_oops_buf = kmalloc(big_oops_buf_sz, GFP_KERNEL);
if (big_oops_buf) {
workspace = kmalloc(LZ4_MEM_COMPRESS, GFP_KERNEL);
diff --git a/fs/squashfs/lz4_wrapper.c b/fs/squashfs/lz4_wrapper.c
index ff4468b..95da653 100644
--- a/fs/squashfs/lz4_wrapper.c
+++ b/fs/squashfs/lz4_wrapper.c
@@ -97,7 +97,6 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, 
void *strm,
struct squashfs_lz4 *stream = strm;
void *buff = stream->input, *data;
int avail, i, bytes = length, res;
-   size_t dest_len = output->length;
 
for (i = 0; i < b; i++) {
avail = min(bytes, msblk->devblksize - offset);
@@ -108,12 +107,13 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, 
void *strm,
put_bh(bh[i]);
}
 
-   res = lz4_decompress_unknownoutputsize(stream->input, length,
-   stream->output, &dest_len);
-   if (res)
+   res = LZ4_decompress_safe(stream->input, stream->output,
+   length, output->length);
+
+   if (res < 0)
return -EIO;
 
-   bytes = dest_len;
+   bytes = res;
data = squashfs_first_page(output);
buff = stream->output;
while (data) {
@@ -128,7 +128,7 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, 
void *strm,
}
squashfs_finish_page(output);
 
-   return dest_len;
+   return res;
 }
 
 const struct squashfs_decompressor squashfs_lz4_comp_ops = {
-- 
2.1.4



BUG: af_alg bind fails for 50% of requests from userspace for hash algo

2017-02-15 Thread Harsh Jain
Hi Herbert/Stephen,

When I try to run 100 applications which calculate a sha384 digest from
userspace, nearly 50 of them fail in the bind system call with error ENOENT.

The "crypto_alg_mod_lookup" call in api.c fails in kernel space. The issue
only occurs on the first try (it seems related to the crypto self-test
execution). If I execute the same test again, the issue does not reproduce.
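
For reference, the bind in question is the AF_ALG one; a minimal sketch of
such a caller looks like this (illustrative, not the exact test program):

    /* Illustrative AF_ALG caller, not the exact test program. */
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/if_alg.h>

    int main(void)
    {
            struct sockaddr_alg sa = {
                    .salg_family = AF_ALG,
                    .salg_type   = "hash",
                    .salg_name   = "sha384",
            };
            int tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);

            /* this bind() is what fails with ENOENT for roughly half the processes */
            if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                    return 1;

            /* ... accept(), send() the data, read() the 48-byte digest ... */
            close(tfmfd);
            return 0;
    }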

Regards

Harsh Jain



Re: [RFC PATCH v1 1/1] mm: zswap - Add crypto acomp/scomp framework support

2017-02-15 Thread Narayana Prasad Athreya

I assume all of these crypto_acomp_[compress|decompress] calls are
actually synchronous,
not asynchronous as the name suggests.  Otherwise, this would blow up
quite spectacularly
since all the resources we use in the call get derefed/unmapped below.

Could an async algorithm be implemented/used that would break this assumption?


The callback is set to NULL using acomp_request_set_callback(). This 
implies synchronous mode of operation. So the underlying implementation 
must complete the operation synchronously.
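
A sketch of what that looks like at the call site (illustrative; req, the
scatterlists, lengths and ret are assumed to be set up elsewhere):

    /* Illustrative call sequence for the synchronous case. */
    acomp_request_set_callback(req, 0, NULL, NULL);   /* NULL callback => caller relies on sync completion */
    acomp_request_set_params(req, &src_sg, &dst_sg, src_len, dst_len);
    ret = crypto_acomp_compress(req);                 /* must not come back as -EINPROGRESS for this to be safe */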


Prasad

On Tuesday 14 February 2017 09:50 PM, Seth Jennings wrote:

On Tue, Feb 14, 2017 at 9:40 AM, Mahipal Challa
 wrote:

This adds the support for kernel's crypto new acomp/scomp framework
to zswap.

Signed-off-by: Mahipal Challa 
Signed-off-by: Vishnu Nair 
---
  mm/zswap.c | 129 +++--
  1 file changed, 99 insertions(+), 30 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 067a0d6..d08631b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -33,6 +33,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 

@@ -114,7 +116,8 @@ static int zswap_compressor_param_set(const char *,

  struct zswap_pool {
 struct zpool *zpool;
-   struct crypto_comp * __percpu *tfm;
+   struct crypto_acomp * __percpu *acomp;
+   struct acomp_req * __percpu *acomp_req;
 struct kref kref;
 struct list_head list;
 struct work_struct work;
@@ -379,30 +382,49 @@ static int zswap_dstmem_dead(unsigned int cpu)
  static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
  {
 struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
-   struct crypto_comp *tfm;
+   struct crypto_acomp *acomp;
+   struct acomp_req *acomp_req;

-   if (WARN_ON(*per_cpu_ptr(pool->tfm, cpu)))
+   if (WARN_ON(*per_cpu_ptr(pool->acomp, cpu)))
 return 0;
+   if (WARN_ON(*per_cpu_ptr(pool->acomp_req, cpu)))
+   return 0;
+
+   acomp = crypto_alloc_acomp(pool->tfm_name, 0, 0);
+   if (IS_ERR_OR_NULL(acomp)) {
+   pr_err("could not alloc crypto acomp %s : %ld\n",
+  pool->tfm_name, PTR_ERR(acomp));
+   return -ENOMEM;
+   }
+   *per_cpu_ptr(pool->acomp, cpu) = acomp;

-   tfm = crypto_alloc_comp(pool->tfm_name, 0, 0);
-   if (IS_ERR_OR_NULL(tfm)) {
-   pr_err("could not alloc crypto comp %s : %ld\n",
-  pool->tfm_name, PTR_ERR(tfm));
+   acomp_req = acomp_request_alloc(acomp);
+   if (IS_ERR_OR_NULL(acomp_req)) {
+   pr_err("could not alloc crypto acomp %s : %ld\n",
+  pool->tfm_name, PTR_ERR(acomp));
 return -ENOMEM;
 }
-   *per_cpu_ptr(pool->tfm, cpu) = tfm;
+   *per_cpu_ptr(pool->acomp_req, cpu) = acomp_req;
+
 return 0;
  }

  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
  {
 struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
-   struct crypto_comp *tfm;
+   struct crypto_acomp *acomp;
+   struct acomp_req *acomp_req;
+
+   acomp_req = *per_cpu_ptr(pool->acomp_req, cpu);
+   if (!IS_ERR_OR_NULL(acomp_req))
+   acomp_request_free(acomp_req);
+   *per_cpu_ptr(pool->acomp_req, cpu) = NULL;
+
+   acomp = *per_cpu_ptr(pool->acomp, cpu);
+   if (!IS_ERR_OR_NULL(acomp))
+   crypto_free_acomp(acomp);
+   *per_cpu_ptr(pool->acomp, cpu) = NULL;

-   tfm = *per_cpu_ptr(pool->tfm, cpu);
-   if (!IS_ERR_OR_NULL(tfm))
-   crypto_free_comp(tfm);
-   *per_cpu_ptr(pool->tfm, cpu) = NULL;
 return 0;
  }

@@ -503,8 +525,14 @@ static struct zswap_pool *zswap_pool_create(char *type, 
char *compressor)
 pr_debug("using %s zpool\n", zpool_get_type(pool->zpool));

 strlcpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
-   pool->tfm = alloc_percpu(struct crypto_comp *);
-   if (!pool->tfm) {
+   pool->acomp = alloc_percpu(struct crypto_acomp *);
+   if (!pool->acomp) {
+   pr_err("percpu alloc failed\n");
+   goto error;
+   }
+
+   pool->acomp_req = alloc_percpu(struct acomp_req *);
+   if (!pool->acomp_req) {
 pr_err("percpu alloc failed\n");
 goto error;
 }
@@ -526,7 +554,8 @@ static struct zswap_pool *zswap_pool_create(char *type, 
char *compressor)
 return pool;

  error:
-   free_percpu(pool->tfm);
+   free_percpu(pool->acomp_req);
+   free_percpu(pool->acomp);
 if (pool->zpool)
 zpool_destroy_pool(pool->zpool);
 kfree(pool);
@@ -566,7 +595,8 @@ static void zswap_pool_destroy(struct zswap_pool *pool)
 zswap_pool_debug("destroying", pool);

 

[PATCH] crypto: cavium/cpt: Fix couple of static checker errors

2017-02-15 Thread George Cherian
Fix the following smatch errors
cptvf_reqmanager.c:333 do_post_process() warn: variable dereferenced
before check 'cptvf'
cptvf_main.c:825 cptvf_remove() error: we previously assumed 'cptvf'
could be null

Reported-by: Dan Carpenter 
Signed-off-by: George Cherian 
---
 drivers/crypto/cavium/cpt/cptvf_main.c   | 4 +++-
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/cavium/cpt/cptvf_main.c 
b/drivers/crypto/cavium/cpt/cptvf_main.c
index aac2966..e50872e 100644
--- a/drivers/crypto/cavium/cpt/cptvf_main.c
+++ b/drivers/crypto/cavium/cpt/cptvf_main.c
@@ -815,8 +815,10 @@ static void cptvf_remove(struct pci_dev *pdev)
 {
struct cpt_vf *cptvf = pci_get_drvdata(pdev);
 
-   if (!cptvf)
+   if (!cptvf) {
 	dev_err(&pdev->dev, "Invalid CPT-VF device\n");
+   return;
+   }
 
/* Convey DOWN to PF */
if (cptvf_send_vf_down(cptvf)) {
diff --git a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c 
b/drivers/crypto/cavium/cpt/cptvf_reqmanager.c
index 7f57f30..169e662 100644
--- a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c
+++ b/drivers/crypto/cavium/cpt/cptvf_reqmanager.c
@@ -330,8 +330,8 @@ void do_post_process(struct cpt_vf *cptvf, struct 
cpt_info_buffer *info)
 {
struct pci_dev *pdev = cptvf->pdev;
 
-   if (!info || !cptvf) {
-   dev_err(&pdev->dev, "Input params are incorrect for post 
processing\n");
+   if (!info) {
+   dev_err(&pdev->dev, "incorrect cpt_info_buffer for post 
processing\n");
return;
}
 
-- 
2.1.4



Assalamu`Alaikum.

2017-02-15 Thread mohammad ouattara
Dear Sir/Madam.

Assalamu`Alaikum.

I am Dr mohammad ouattara, I have  ($10.6 Million us dollars) to transfer into 
your account,

I will send you more details about this deal and the procedures to follow when 
I receive a positive response from you, 

Have a great day,
Dr mohammad ouattara.



Re: crypto/cavium MSI-X fixups

2017-02-15 Thread George Cherian

Hi Christoph,


On 02/15/2017 12:48 PM, Christoph Hellwig wrote:

Hi George,

your commit "crypto: cavium - Add Support for Octeon-tx CPT Engine"
adds a new caller to pci_enable_msix.  This API has long been deprecated,
so this series switches it to use pci_alloc_irq_vectors instead.

Can you please test it and make sure it goes in before the end of the
merge window so that no more users of the old API hit mainline?


Yes, the changes work well.
Acked-by: George Cherian 

for the series.




[bug report] crypto: cavium - Add the Virtual Function driver for CPT

2017-02-15 Thread Dan Carpenter
Hello George Cherian,

This is a semi-automatic email about new static checker warnings.

The patch c694b233295b: "crypto: cavium - Add the Virtual Function 
driver for CPT" from Feb 7, 2017, leads to the following Smatch 
complaint:

drivers/crypto/cavium/cpt/cptvf_reqmanager.c:333 do_post_process()
 warn: variable dereferenced before check 'cptvf' (see line 331)

drivers/crypto/cavium/cpt/cptvf_reqmanager.c
   330  {
   331  struct pci_dev *pdev = cptvf->pdev;
   ^^^
Dereference.

   332  
   333  if (!info || !cptvf) {
  ^
Check is too late.

   334  dev_err(&pdev->dev, "Input params are incorrect for 
post processing\n");
   335  return;

regards,
dan carpenter


Re: [PATCH v4 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-15 Thread Anup Patel
On Wed, Feb 15, 2017 at 12:55 PM, Dan Williams  wrote:
> On Tue, Feb 14, 2017 at 11:03 PM, Anup Patel  wrote:
>> On Wed, Feb 15, 2017 at 12:13 PM, Dan Williams  
>> wrote:
>>> On Tue, Feb 14, 2017 at 10:25 PM, Anup Patel  
>>> wrote:
 On Tue, Feb 14, 2017 at 10:04 PM, Dan Williams  
 wrote:
> On Mon, Feb 13, 2017 at 10:51 PM, Anup Patel  
> wrote:
>> The Broadcom stream buffer accelerator (SBA) provides offloading
>> capabilities for RAID operations. This SBA offload engine is
>> accessible via Broadcom SoC specific ring manager.
>>
>> This patch adds Broadcom SBA RAID driver which provides one
>> DMA device with RAID capabilities using one or more Broadcom
>> SoC specific ring manager channels. The SBA RAID driver in its
>> current shape implements memcpy, xor, and pq operations.
>>
>> Signed-off-by: Anup Patel 
>> Reviewed-by: Ray Jui 
>> ---
>>  drivers/dma/Kconfig|   13 +
>>  drivers/dma/Makefile   |1 +
>>  drivers/dma/bcm-sba-raid.c | 1694 
>> 
>>  3 files changed, 1708 insertions(+)
>>  create mode 100644 drivers/dma/bcm-sba-raid.c
>>
>> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>> index 263495d..bf8fb84 100644
>> --- a/drivers/dma/Kconfig
>> +++ b/drivers/dma/Kconfig
>> @@ -99,6 +99,19 @@ config AXI_DMAC
>>   controller is often used in Analog Device's reference designs 
>> for FPGA
>>   platforms.
>>
>> +config BCM_SBA_RAID
>> +   tristate "Broadcom SBA RAID engine support"
>> +   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
>> +   select DMA_ENGINE
>> +   select DMA_ENGINE_RAID
>> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
>
> I thought you agreed to drop this. Its usage is broken.

 If ASYNC_TX_ENABLE_CHANNEL_SWITCH is not selected
 then async_dma_find_channel() will only try to find channel
 with DMA_ASYNC_TX capability.

 The DMA_ASYNC_TX capability is set by
 dma_async_device_register() when all Async Tx
 capabilities are supported by a DMA devices namely
 DMA_INTERRUPT, DMA_MEMCPY, DMA_XOR,
 DMA_XOR_VAL, DMA_PQ, and DMA_PQ_VAL.

 We only support DMA_MEMCPY, DMA_XOR, and
 DMA_PQ capabilities in BCM-SBA-RAID driver so
 DMA_ASYNC_TX capability is never set for the
 DMA device registered by BCM-SBA-RAID driver.

 Due to above, if ASYNC_TX_ENABLE_CHANNEL_SWITCH
 is not selected then Async Tx APIs fail to find DMA
 channel provided by BCM-SBA-RAID hence the
 option ASYNC_TX_ENABLE_CHANNEL_SWITCH is
 required for BCM-SBA-RAID.

 The DMA mappings are violated by channel switching
 only if we switch form DMA channel A to DMA channel
 B and both these DMA channels have different underlying
 "struct device". In most of the cases DMA mappings
 are not violated because DMA channels having
 Async Tx capabilities are provided using same
 underlying "struct device".
>>>
>>> No, fix the infrastructure. Do not put local hack in your driver for
>>> this global problem [1].
>>
>> There is no hack in the driver. We need
>> ASYNC_TX_ENABLE_CHANNEL_SWITCH
>> based on current state of dmaengine framework.
>>
>> The framework should be fixed as separate patchset.
>>
>> We have other RAID drivers such as xgene-dma and
>> mv_xor_v2 who also require
>> ASYNC_TX_ENABLE_CHANNEL_SWITCH due
>> to same reason.
>>
>> Fixing the framework and improving framework is
>> a ongoing process. I don't see why that should
>> stop this patchset.
>>
>
> Because this driver is turning on a dangerous compile time option and
> is not using the functionality. If this silicon IP block appears in
> another product in the future paired with another DMA engine then the
> assumptions about a safe/single dma-device is violated.
>
> The realization of how async_tx was breaking DMA mapping api
> assumptions came after some of these dma-drivers were added to the
> kernel. We should stop making the problem worse.
>
> I should have submitted a patch like the below at the time we
> discovered this problem, but unfortunately it languished when I
> stopped maintaining the iop-adma and ioat drivers.
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 263495d0adbd..6b30eb9ad125 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -35,6 +35,7 @@ comment "DMA Devices"
>
>  #core
>  config ASYNC_TX_ENABLE_CHANNEL_SWITCH
> +   depends on BROKEN
> bool
>
>  config ARCH_HAS_ASYNC_TX_FIND_CHANNEL

Instead of selecting
ASYNC_TX_ENABLE_CHANNEL_SWITCH,
we can select the following in BCM_SBA_RAID config
option:
1. ASYNC_TX_DISABLE_XOR_VAL