[PATCH 04/13] crypto: crypto4xx: increase context and scatter ring buffer elements

2017-10-03 Thread Christian Lamparter
If crypto4xx is used in conjunction with dm-crypt, the available
ring buffer elements are not enough to handle the load properly.

On an aes-cbc-essiv:sha256 encrypted swap partition, the read
performance is abysmal (tested with hdparm -t):

/dev/mapper/swap_crypt:
 Timing buffered disk reads:  14 MB in  3.68 seconds =   3.81 MB/sec

The patch increases both PPC4XX_NUM_SD and PPC4XX_NUM_PD to 256.
This improves the performance considerably:

/dev/mapper/swap_crypt:
 Timing buffered disk reads: 104 MB in  3.03 seconds =  34.31 MB/sec

Furthermore, PPC4XX_LAST_SD, PPC4XX_LAST_GD and PPC4XX_LAST_PD
can easily be calculated from their respective PPC4XX_NUM_*
constants.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_core.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.h b/drivers/crypto/amcc/crypto4xx_core.h
index 97fb8288ab30..27e439c1f5bf 100644
--- a/drivers/crypto/amcc/crypto4xx_core.h
+++ b/drivers/crypto/amcc/crypto4xx_core.h
@@ -36,12 +36,12 @@
 #define PPC405EX_CE_RESET   0x0008
 
 #define CRYPTO4XX_CRYPTO_PRIORITY  300
-#define PPC4XX_LAST_PD 63
-#define PPC4XX_NUM_PD  64
-#define PPC4XX_LAST_GD 1023
+#define PPC4XX_NUM_PD  256
+#define PPC4XX_LAST_PD (PPC4XX_NUM_PD - 1)
 #define PPC4XX_NUM_GD  1024
-#define PPC4XX_LAST_SD 63
-#define PPC4XX_NUM_SD  64
+#define PPC4XX_LAST_GD (PPC4XX_NUM_GD - 1)
+#define PPC4XX_NUM_SD  256
+#define PPC4XX_LAST_SD (PPC4XX_NUM_SD - 1)
 #define PPC4XX_SD_BUFFER_SIZE  2048
 
 #define PD_ENTRY_INUSE 1
-- 
2.14.2



[PATCH 02/13] crypto: crypto4xx: fix off-by-one AES-OFB

2017-10-03 Thread Christian Lamparter
I used aes-cbc as a template for ofb, but sadly I forgot
to update the set_key method to crypto4xx_setkey_aes_ofb().

This was caught by the testmgr:
alg: skcipher: Test 1 failed (invalid result) on encr. for ofb-aes-ppc4xx
: 76 49 ab ac 81 19 b2 46 ce e9 8e 9b 12 e9 19 7d
0010: 50 86 cb 9b 50 72 19 ee 95 db 11 3a 91 76 78 b2
0020: 73 be d6 b8 e3 c1 74 3b 71 16 e6 9e 22 22 95 16
0030: 3f f1 ca a1 68 1f ac 09 12 0e ca 30 75 86 e1 a7

With the correct set_key method, the aes-ofb cipher passes the test.

name : ofb(aes)
driver   : ofb-aes-ppc4xx
module   : crypto4xx
priority : 300
refcnt   : 1
selftest : passed
internal : no
type : ablkcipher
async: yes
blocksize: 16
min keysize  : 16
max keysize  : 32
ivsize   : 16
geniv: 

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index 773e5faebc47..cb45365166ae 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -1148,7 +1148,7 @@ struct crypto4xx_alg_common crypto4xx_alg[] = {
.min_keysize= AES_MIN_KEY_SIZE,
.max_keysize= AES_MAX_KEY_SIZE,
.ivsize = AES_IV_SIZE,
-   .setkey = crypto4xx_setkey_aes_cbc,
+   .setkey = crypto4xx_setkey_aes_ofb,
.encrypt= crypto4xx_encrypt,
.decrypt= crypto4xx_decrypt,
}
-- 
2.14.2



[PATCH 06/13] crypto: crypto4xx: use the correct LE32 format for IV and key defs

2017-10-03 Thread Christian Lamparter
The hardware expects that the keys, IVs (and inner/outer hashes)
are in the le32 format.

This patch changes all hardware interface declarations to use
the correct LE32 data format for each field.

In order to pass __CHECK_ENDIAN__ checks, crypto4xx_memcpy_le
has to be honest about the endianness of its parameters.
The function was split and moved to the common crypto4xx_core.h
header. This allows the compiler to generate better code if the
size/len is a constant (various *_IV_LEN).

Please note that the hardware isn't consistent with the endianness
of the save_digest field in the state record struct, though.
The hashes produced by GHASH and CBC (for CCM) will be in LE32,
whereas md5 and sha{1,256,...} do not need any conversion.
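
As an illustration, a minimal sketch of the to-le32 copy idea (assuming a
word-aligned source and a length that is a multiple of 4; the real helpers
also handle trailing bytes):

    /* sketch only - cpu_to_le32() byte-swaps on big-endian PPC4xx,
     * and is a no-op on little-endian machines */
    static inline void example_memcpy_to_le32(__le32 *dst, const void *buf,
                                              size_t len)
    {
            const u32 *src = buf;

            for (; len >= 4; len -= 4)
                    *dst++ = cpu_to_le32(*src++);
    }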

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_alg.c  |  4 +--
 drivers/crypto/amcc/crypto4xx_core.c | 40 ++
 drivers/crypto/amcc/crypto4xx_core.h | 47 +---
 drivers/crypto/amcc/crypto4xx_sa.h   | 29 --
 4 files changed, 64 insertions(+), 56 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index 57b1dcef4cb4..0e1d110a6405 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -149,8 +149,8 @@ static int crypto4xx_setkey_aes(struct crypto_ablkcipher *cipher,
 SA_SEQ_MASK_OFF, SA_MC_ENABLE,
 SA_NOT_COPY_PAD, SA_NOT_COPY_PAYLOAD,
 SA_NOT_COPY_HDR);
-   crypto4xx_memcpy_le(get_dynamic_sa_key_field(sa),
-   key, keylen);
+   crypto4xx_memcpy_to_le32(get_dynamic_sa_key_field(sa),
+key, keylen);
sa->sa_contents.w = SA_AES_CONTENTS | (keylen << 2);
sa->sa_command_1.bf.key_len = keylen >> 3;
ctx->is_hash = 0;
diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index abdf1db9b0eb..c936d68f19ad 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -614,42 +614,6 @@ static u32 crypto4xx_pd_done(struct crypto4xx_device *dev, u32 idx)
return crypto4xx_ahash_done(dev, pd_uinfo);
 }
 
-/**
- * Note: Only use this function to copy items that is word aligned.
- */
-void crypto4xx_memcpy_le(unsigned int *dst,
-const unsigned char *buf,
-int len)
-{
-   u8 *tmp;
-   for (; len >= 4; buf += 4, len -= 4)
-   *dst++ = cpu_to_le32(*(unsigned int *) buf);
-
-   tmp = (u8 *)dst;
-   switch (len) {
-   case 3:
-   *tmp++ = 0;
-   *tmp++ = *(buf+2);
-   *tmp++ = *(buf+1);
-   *tmp++ = *buf;
-   break;
-   case 2:
-   *tmp++ = 0;
-   *tmp++ = 0;
-   *tmp++ = *(buf+1);
-   *tmp++ = *buf;
-   break;
-   case 1:
-   *tmp++ = 0;
-   *tmp++ = 0;
-   *tmp++ = 0;
-   *tmp++ = *buf;
-   break;
-   default:
-   break;
-   }
-}
-
 static void crypto4xx_stop_all(struct crypto4xx_core_device *core_dev)
 {
crypto4xx_destroy_pdr(core_dev->dev);
@@ -809,8 +773,8 @@ u32 crypto4xx_build_pd(struct crypto_async_request *req,
    &pd_uinfo->sr_pa, 4);
 
if (iv_len)
-   crypto4xx_memcpy_le(pd_uinfo->sr_va->save_iv,
-   iv, iv_len);
+   crypto4xx_memcpy_to_le32(pd_uinfo->sr_va->save_iv,
+iv, iv_len);
} else {
if (ctx->direction == DIR_INBOUND) {
pd->sa = ctx->sa_in_dma_addr;
diff --git a/drivers/crypto/amcc/crypto4xx_core.h b/drivers/crypto/amcc/crypto4xx_core.h
index dbe29043e0c5..2df6874edee1 100644
--- a/drivers/crypto/amcc/crypto4xx_core.h
+++ b/drivers/crypto/amcc/crypto4xx_core.h
@@ -166,9 +166,7 @@ int crypto4xx_alloc_sa(struct crypto4xx_ctx *ctx, u32 size);
 void crypto4xx_free_sa(struct crypto4xx_ctx *ctx);
 void crypto4xx_free_ctx(struct crypto4xx_ctx *ctx);
 u32 crypto4xx_alloc_state_record(struct crypto4xx_ctx *ctx);
-void crypto4xx_memcpy_le(unsigned int *dst,
-const unsigned char *buf, int len);
-u32 crypto4xx_build_pd(struct crypto_async_request *req,
+int crypto4xx_build_pd(struct crypto_async_request *req,
   struct crypto4xx_ctx *ctx,
   struct scatterlist *src,
   struct scatterlist *dst,
@@ -193,4 +191,47 @@ int crypto4xx_hash_digest(struct ahash_request *req);
 int crypto4xx_hash_final(struct ahash_request *req);
 int crypto4xx_hash_update(struct ahash_request *req);
 int crypto4xx_hash_init(struct ahash_request *req);
+
+/**
+ * Note: Only 

[PATCH 03/13] crypto: crypto4xx: fix type mismatch compiler error

2017-10-03 Thread Christian Lamparter
This patch fixes a type mismatch error that I accidentally
introduced when I moved and refactored the dynamic_contents
helpers.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_sa.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_sa.h b/drivers/crypto/amcc/crypto4xx_sa.h
index 7cc04f1ff8a0..8040c82dc354 100644
--- a/drivers/crypto/amcc/crypto4xx_sa.h
+++ b/drivers/crypto/amcc/crypto4xx_sa.h
@@ -266,9 +266,9 @@ get_dynamic_sa_offset_state_ptr_field(struct dynamic_sa_ctl *cts)
return sizeof(struct dynamic_sa_ctl) + offset * 4;
 }
 
-static inline u8 *get_dynamic_sa_key_field(struct dynamic_sa_ctl *cts)
+static inline u32 *get_dynamic_sa_key_field(struct dynamic_sa_ctl *cts)
 {
-   return (u8 *) ((unsigned long)cts + sizeof(struct dynamic_sa_ctl));
+   return (u32 *) ((unsigned long)cts + sizeof(struct dynamic_sa_ctl));
 }
 
 #endif
-- 
2.14.2



[PATCH 09/13] crypto: crypto4xx: fix stalls under heavy load

2017-10-03 Thread Christian Lamparter
If the crypto4xx device is continuously loaded by dm-crypt
and ipsec work, it will start to work intermittently after a
few (between 20 and 30) seconds, hurting throughput and latency.

This patch contains various stability improvements in order
to fix this issue. So far, the hardware has survived more
than a day without suffering any stalls under the continuous
load.
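
The central change (see the tasklet hunk below) is to sample the whole
pd_ctl word once with READ_ONCE() and test the "engine done" and "host
ready" bits together, instead of reading packed bitfields that the
compiler may split into multiple loads. In sketch form:

    u32 pd_ctl = READ_ONCE(pd->pd_ctl.w);

    /* only PE_DONE set, with HOST_READY already cleared by the
     * engine, means the descriptor has really completed */
    if ((pd_ctl & (PD_CTL_PE_DONE | PD_CTL_HOST_READY)) == PD_CTL_PE_DONE)
            /* ... complete the request and advance the tail ... */;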

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_core.c| 33 ++---
 drivers/crypto/amcc/crypto4xx_reg_def.h |  3 +++
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index 1752ea2125db..de9044201a23 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -280,17 +280,20 @@ static u32 crypto4xx_get_pd_from_pdr_nolock(struct crypto4xx_device *dev)
 static u32 crypto4xx_put_pd_to_pdr(struct crypto4xx_device *dev, u32 idx)
 {
struct pd_uinfo *pd_uinfo = &dev->pdr_uinfo[idx];
+   u32 tail;
unsigned long flags;
 
spin_lock_irqsave(&dev->core_dev->lock, flags);
+   pd_uinfo->state = PD_ENTRY_FREE;
+
if (dev->pdr_tail != PPC4XX_LAST_PD)
dev->pdr_tail++;
else
dev->pdr_tail = 0;
-   pd_uinfo->state = PD_ENTRY_FREE;
+   tail = dev->pdr_tail;
spin_unlock_irqrestore(&dev->core_dev->lock, flags);
 
-   return 0;
+   return tail;
 }
 
 /**
@@ -854,16 +857,16 @@ int crypto4xx_build_pd(struct crypto_async_request *req,
}
}
 
-   sa->sa_command_1.bf.hash_crypto_offset = 0;
-   pd->pd_ctl.w = 0;
-   pd->pd_ctl.bf.hash_final =
-   (crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AHASH);
-   pd->pd_ctl.bf.host_ready = 1;
+   pd->pd_ctl.w = PD_CTL_HOST_READY |
+   ((crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AHASH) |
+(crypto_tfm_alg_type(req->tfm) == CRYPTO_ALG_TYPE_AEAD) ?
+   PD_CTL_HASH_FINAL : 0);
pd->pd_ctl_len.w = 0x00400000 | datalen;
pd_uinfo->state = PD_ENTRY_INUSE | (is_busy ? PD_ENTRY_BUSY : 0);
 
wmb();
/* write any value to push engine to read a pd */
+   writel(0, dev->ce_base + CRYPTO4XX_INT_DESCR_RD);
writel(1, dev->ce_base + CRYPTO4XX_INT_DESCR_RD);
return is_busy ? -EBUSY : -EINPROGRESS;
 }
@@ -964,23 +967,23 @@ static void crypto4xx_bh_tasklet_cb(unsigned long data)
struct crypto4xx_core_device *core_dev = dev_get_drvdata(dev);
struct pd_uinfo *pd_uinfo;
struct ce_pd *pd;
-   u32 tail;
+   u32 tail = core_dev->dev->pdr_tail;
+   u32 head = core_dev->dev->pdr_head;
 
-   while (core_dev->dev->pdr_head != core_dev->dev->pdr_tail) {
-   tail = core_dev->dev->pdr_tail;
+   do {
pd_uinfo = &core_dev->dev->pdr_uinfo[tail];
pd = &core_dev->dev->pdr[tail];
if ((pd_uinfo->state & PD_ENTRY_INUSE) &&
-  pd->pd_ctl.bf.pe_done &&
-  !pd->pd_ctl.bf.host_ready) {
-   pd->pd_ctl.bf.pe_done = 0;
+((READ_ONCE(pd->pd_ctl.w) &
+  (PD_CTL_PE_DONE | PD_CTL_HOST_READY)) ==
+  PD_CTL_PE_DONE)) {
crypto4xx_pd_done(core_dev->dev, tail);
-   crypto4xx_put_pd_to_pdr(core_dev->dev, tail);
+   tail = crypto4xx_put_pd_to_pdr(core_dev->dev, tail);
} else {
/* if tail not done, break */
break;
}
-   }
+   } while (head != tail);
 }
 
 /**
diff --git a/drivers/crypto/amcc/crypto4xx_reg_def.h b/drivers/crypto/amcc/crypto4xx_reg_def.h
index 279b8725559f..0a22ec5d1a96 100644
--- a/drivers/crypto/amcc/crypto4xx_reg_def.h
+++ b/drivers/crypto/amcc/crypto4xx_reg_def.h
@@ -261,6 +261,9 @@ union ce_pd_ctl {
} bf;
u32 w;
 } __attribute__((packed));
+#define PD_CTL_HASH_FINAL  BIT(4)
+#define PD_CTL_PE_DONE BIT(1)
+#define PD_CTL_HOST_READY  BIT(0)
 
 union ce_pd_ctl_len {
struct {
-- 
2.14.2



[RFC 12/13] crypto: crypto4xx: add aes-ccm support

2017-10-03 Thread Christian Lamparter
This patch adds aes-ccm support.
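
For context, a sketch of how a caller would exercise this code once
"ccm(aes)" is registered - standard kernel AEAD API usage, not part of
this patch:

    struct crypto_aead *tfm = crypto_alloc_aead("ccm(aes)", 0, 0);

    if (!IS_ERR(tfm)) {
            crypto_aead_setkey(tfm, key, 16);
            crypto_aead_setauthsize(tfm, 8);
            /* aead_request_set_crypt() + crypto_aead_encrypt() ... */
            crypto_free_aead(tfm);
    }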

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_alg.c  | 185 +++
 drivers/crypto/amcc/crypto4xx_core.c |  23 +
 drivers/crypto/amcc/crypto4xx_core.h |   8 ++
 3 files changed, 216 insertions(+)

diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index dd4241a5bf56..b1c4783feab9 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -231,6 +231,191 @@ int crypto4xx_rfc3686_decrypt(struct ablkcipher_request *req)
  ctx->sa_out, ctx->sa_len, 0);
 }
 
+static inline bool crypto4xx_aead_need_fallback(struct aead_request *req,
+   bool is_ccm, bool decrypt)
+{
+   struct crypto_aead *aead = crypto_aead_reqtfm(req);
+
+   /* authsize has to be a multiple of 4 */
+   if (aead->authsize & 3)
+   return true;
+
+   /*
+* hardware does not handle cases where cryptlen
+* is less than a block
+*/
+   if (req->cryptlen < AES_BLOCK_SIZE)
+   return true;
+
+   /* assoc len needs to be a multiple of 4 */
+   if (req->assoclen & 0x3)
+   return true;
+
+   /* CCM supports only counter field length of 2 and 4 bytes */
+   if (is_ccm && !(req->iv[0] == 1 || req->iv[0] == 3))
+   return true;
+
+   /* CCM - fix CBC MAC mismatch in special case */
+   if (is_ccm && decrypt && !req->assoclen)
+   return true;
+
+   return false;
+}
+
+static int crypto4xx_aead_fallback(struct aead_request *req,
+   struct crypto4xx_ctx *ctx, bool do_decrypt)
+{
+   char aead_req_data[sizeof(struct aead_request) +
+  crypto_aead_reqsize(ctx->sw_cipher.aead)]
+   __aligned(__alignof__(struct aead_request));
+
+   struct aead_request *subreq = (void *) aead_req_data;
+
+   memset(subreq, 0, sizeof(aead_req_data));
+
+   aead_request_set_tfm(subreq, ctx->sw_cipher.aead);
+   aead_request_set_callback(subreq, req->base.flags,
+ req->base.complete, req->base.data);
+   aead_request_set_crypt(subreq, req->src, req->dst, req->cryptlen,
+  req->iv);
+   aead_request_set_ad(subreq, req->assoclen);
+   return do_decrypt ? crypto_aead_decrypt(subreq) :
+   crypto_aead_encrypt(subreq);
+}
+
+static int crypto4xx_setup_fallback(struct crypto4xx_ctx *ctx,
+   struct crypto_aead *cipher,
+   const u8 *key,
+   unsigned int keylen)
+{
+   int rc;
+
+   crypto_aead_clear_flags(ctx->sw_cipher.aead, CRYPTO_TFM_REQ_MASK);
+   crypto_aead_set_flags(ctx->sw_cipher.aead,
+   crypto_aead_get_flags(cipher) & CRYPTO_TFM_REQ_MASK);
+   rc = crypto_aead_setkey(ctx->sw_cipher.aead, key, keylen);
+   crypto_aead_clear_flags(cipher, CRYPTO_TFM_RES_MASK);
+   crypto_aead_set_flags(cipher,
+   crypto_aead_get_flags(ctx->sw_cipher.aead) &
+   CRYPTO_TFM_RES_MASK);
+
+   return rc;
+}
+
+/**
+ * AES-CCM Functions
+ */
+
+int crypto4xx_setkey_aes_ccm(struct crypto_aead *cipher, const u8 *key,
+unsigned int keylen)
+{
+   struct crypto_tfm *tfm = crypto_aead_tfm(cipher);
+   struct crypto4xx_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct dynamic_sa_ctl *sa;
+   int rc = 0;
+
+   rc = crypto4xx_setup_fallback(ctx, cipher, key, keylen);
+   if (rc)
+   return rc;
+
+   if (ctx->sa_in || ctx->sa_out)
+   crypto4xx_free_sa(ctx);
+
+   rc = crypto4xx_alloc_sa(ctx, SA_AES128_CCM_LEN + (keylen - 16) / 4);
+   if (rc)
+   return rc;
+
+   /* Setup SA */
+   sa = (struct dynamic_sa_ctl *) ctx->sa_in;
+   sa->sa_contents.w = SA_AES_CCM_CONTENTS | (keylen << 2);
+
+   set_dynamic_sa_command_0(sa, SA_NOT_SAVE_HASH, SA_NOT_SAVE_IV,
+SA_LOAD_HASH_FROM_SA, SA_LOAD_IV_FROM_STATE,
+SA_NO_HEADER_PROC, SA_HASH_ALG_CBC_MAC,
+SA_CIPHER_ALG_AES,
+SA_PAD_TYPE_ZERO, SA_OP_GROUP_BASIC,
+SA_OPCODE_HASH_DECRYPT, DIR_INBOUND);
+
+   set_dynamic_sa_command_1(sa, CRYPTO_MODE_CTR, SA_HASH_MODE_HASH,
+CRYPTO_FEEDBACK_MODE_NO_FB, SA_EXTENDED_SN_OFF,
+SA_SEQ_MASK_OFF, SA_MC_ENABLE,
+SA_NOT_COPY_PAD, SA_COPY_PAYLOAD,
+SA_NOT_COPY_HDR);
+
+   sa->sa_command_1.bf.key_len = keylen >> 3;
+
+   crypto4xx_memcpy_to_le32(get_dynamic_sa_key_field(sa), key, keylen);
+
+   memcpy(ctx->sa_out, ctx->sa_in, 

[RFC 11/13] crypto: crypto4xx: prepare for AEAD support

2017-10-03 Thread Christian Lamparter
This patch enhances existing interfaces and
functions to support AEAD ciphers in the next
patches.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/Kconfig   |   4 +
 drivers/crypto/amcc/crypto4xx_alg.c  |  19 +--
 drivers/crypto/amcc/crypto4xx_core.c | 217 +++
 drivers/crypto/amcc/crypto4xx_core.h |  22 ++--
 drivers/crypto/amcc/crypto4xx_sa.h   |  41 +++
 5 files changed, 230 insertions(+), 73 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index fe33c199fc1a..de825b354fdf 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -315,6 +315,10 @@ config CRYPTO_DEV_PPC4XX
tristate "Driver AMCC PPC4xx crypto accelerator"
depends on PPC && 4xx
select CRYPTO_HASH
+   select CRYPTO_AEAD
+   select CRYPTO_AES
+   select CRYPTO_CCM
+   select CRYPTO_GCM
select CRYPTO_BLKCIPHER
help
  This option allows you to have support for AMCC crypto acceleration.
diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index 22875ec2b2c8..dd4241a5bf56 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -83,7 +84,7 @@ int crypto4xx_encrypt(struct ablkcipher_request *req)
crypto4xx_memcpy_to_le32(iv, req->info, ivlen);
 
return crypto4xx_build_pd(&req->base, ctx, req->src, req->dst,
-   req->nbytes, iv, ivlen, ctx->sa_out, ctx->sa_len);
+   req->nbytes, iv, ivlen, ctx->sa_out, ctx->sa_len, 0);
 }
 
 int crypto4xx_decrypt(struct ablkcipher_request *req)
@@ -97,7 +98,7 @@ int crypto4xx_decrypt(struct ablkcipher_request *req)
crypto4xx_memcpy_to_le32(iv, req->info, ivlen);
 
return crypto4xx_build_pd(&req->base, ctx, req->src, req->dst,
-   req->nbytes, iv, ivlen, ctx->sa_in, ctx->sa_len);
+   req->nbytes, iv, ivlen, ctx->sa_in, ctx->sa_len, 0);
 }
 
 /**
@@ -213,7 +214,7 @@ int crypto4xx_rfc3686_encrypt(struct ablkcipher_request *req)
 
	return crypto4xx_build_pd(&req->base, ctx, req->src, req->dst,
  req->nbytes, iv, AES_IV_SIZE,
- ctx->sa_out, ctx->sa_len);
+ ctx->sa_out, ctx->sa_len, 0);
 }
 
 int crypto4xx_rfc3686_decrypt(struct ablkcipher_request *req)
@@ -227,7 +228,7 @@ int crypto4xx_rfc3686_decrypt(struct ablkcipher_request *req)
 
	return crypto4xx_build_pd(&req->base, ctx, req->src, req->dst,
  req->nbytes, iv, AES_IV_SIZE,
- ctx->sa_out, ctx->sa_len);
+ ctx->sa_out, ctx->sa_len, 0);
 }
 
 /**
@@ -239,11 +240,13 @@ static int crypto4xx_hash_alg_init(struct crypto_tfm *tfm,
   unsigned char hm)
 {
struct crypto_alg *alg = tfm->__crt_alg;
-   struct crypto4xx_alg *my_alg = crypto_alg_to_crypto4xx_alg(alg);
+   struct crypto4xx_alg *my_alg;
struct crypto4xx_ctx *ctx = crypto_tfm_ctx(tfm);
struct dynamic_sa_hash160 *sa;
int rc;
 
+   my_alg = container_of(__crypto_ahash_alg(alg), struct crypto4xx_alg,
+ alg.u.hash);
ctx->dev   = my_alg->dev;
 
/* Create SA */
@@ -300,7 +303,7 @@ int crypto4xx_hash_update(struct ahash_request *req)
 
return crypto4xx_build_pd(&req->base, ctx, req->src, &dst,
  req->nbytes, NULL, 0, ctx->sa_in,
- ctx->sa_len);
+ ctx->sa_len, 0);
 }
 
 int crypto4xx_hash_final(struct ahash_request *req)
@@ -319,7 +322,7 @@ int crypto4xx_hash_digest(struct ahash_request *req)
 
return crypto4xx_build_pd(&req->base, ctx, req->src, &dst,
  req->nbytes, NULL, 0, ctx->sa_in,
- ctx->sa_len);
+ ctx->sa_len, 0);
 }
 
 /**
@@ -330,5 +333,3 @@ int crypto4xx_sha1_alg_init(struct crypto_tfm *tfm)
return crypto4xx_hash_alg_init(tfm, SA_HASH160_LEN, SA_HASH_ALG_SHA1,
   SA_HASH_MODE_HASH);
 }
-
-
diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index 55a4dd8984c7..b5108259f1a6 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -35,10 +35,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "crypto4xx_reg_def.h"
 #include "crypto4xx_core.h"
@@ -518,7 +520,7 @@ static void crypto4xx_ret_sg_desc(struct crypto4xx_device *dev,
}
 }
 
-static u32 crypto4xx_ablkcipher_done(struct crypto4xx_device *dev,
+static void crypto4xx_ablkcipher_done(struct crypto4xx_device *dev,

[PATCH 05/13] crypto: crypto4xx: add backlog queue support

2017-10-03 Thread Christian Lamparter
Previously, if the crypto4xx driver used all available
security contexts, it would simply refuse new requests
with -EAGAIN; CRYPTO_TFM_REQ_MAY_BACKLOG was ignored.

In the case of dm-crypt.c's crypt_convert() function, this caused
the following errors to manifest if the system was pushed hard
enough:

| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| EXT4-fs warning (dm-1): ext4_end_bio:314: I/O error -5 writing to ino ..
| JBD2: Detected IO errors while flushing file data on dm-1-8
| Aborting journal on device dm-1-8.
| EXT4-fs error : ext4_journal_check_start:56: Detected aborted journal
| EXT4-fs (dm-1): Remounting filesystem read-only
| EXT4-fs : ext4_writepages: jbd2_start: 2048 pages, inode 498...; err -30

(This did cause corruptions due to failed writes)

To fix this mess, the crypto4xx driver needs to notify the
user to slow down. This can be achieved by returning -EBUSY
on requests once the crypto hardware falls behind.

Note: -EBUSY has two different meanings. If the flag
CRYPTO_TFM_REQ_MAY_BACKLOG is set, it implies that the request
was successfully queued by the crypto driver. To meet this
requirement, the implementation introduces a threshold check and
adds logic to the completion routines, in much the same way as
AMD's Cryptographic Coprocessor (CCP) driver does.

Note2: Tests showed that dm-crypt starved ipsec traffic.
Under load, ipsec links dropped to 0 Kbits/s. This is because
dm-crypt's callback would instantly queue the next request.
In order to not starve ipsec, the driver reserves a small
portion of the available crypto contexts for this purpose.
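
From the caller's side, the resulting contract looks roughly like this
(generic crypto API usage, not taken from the diff):

    int ret = crypto_ablkcipher_encrypt(req);

    if (ret == -EINPROGRESS) {
            /* accepted; the completion callback will fire */
    } else if (ret == -EBUSY) {
            /* with CRYPTO_TFM_REQ_MAY_BACKLOG: backlogged - the
             * callback first reports -EINPROGRESS, then the final
             * status. Without the flag: rejected, retry later. */
    }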

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_core.c | 47 ++--
 drivers/crypto/amcc/crypto4xx_core.h |  3 ++-
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index cb45365166ae..abdf1db9b0eb 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "crypto4xx_reg_def.h"
 #include "crypto4xx_core.h"
 #include "crypto4xx_sa.h"
@@ -573,8 +574,10 @@ static u32 crypto4xx_ablkcipher_done(struct crypto4xx_device *dev,
dst->offset, dst->length, DMA_FROM_DEVICE);
}
crypto4xx_ret_sg_desc(dev, pd_uinfo);
-   if (ablk_req->base.complete != NULL)
-   ablk_req->base.complete(&ablk_req->base, 0);
+
+   if (pd_uinfo->state & PD_ENTRY_BUSY)
+   ablkcipher_request_complete(ablk_req, -EINPROGRESS);
+   ablkcipher_request_complete(ablk_req, 0);
 
return 0;
 }
@@ -591,9 +594,10 @@ static u32 crypto4xx_ahash_done(struct crypto4xx_device *dev,
crypto4xx_copy_digest_to_dst(pd_uinfo,
 crypto_tfm_ctx(ahash_req->base.tfm));
crypto4xx_ret_sg_desc(dev, pd_uinfo);
-   /* call user provided callback function x */
-   if (ahash_req->base.complete != NULL)
-   ahash_req->base.complete(&ahash_req->base, 0);
+
+   if (pd_uinfo->state & PD_ENTRY_BUSY)
+   ahash_request_complete(ahash_req, -EINPROGRESS);
+   ahash_request_complete(ahash_req, 0);
 
return 0;
 }
@@ -704,6 +708,7 @@ u32 crypto4xx_build_pd(struct crypto_async_request *req,
struct pd_uinfo *pd_uinfo = NULL;
unsigned int nbytes = datalen, idx;
u32 gd_idx = 0;
+   bool is_busy;
 
/* figure how many gd is needed */
num_gd = sg_nents_for_len(src, datalen);
@@ -734,6 +739,31 @@ u32 crypto4xx_build_pd(struct crypto_async_request *req,
 * already got must be return the original place.
 */
spin_lock_irqsave(&dev->core_dev->lock, flags);
+   /*
+* Let the caller know to slow down, once more than 13/16ths = 81%
+* of the available data contexts are being used simultaneously.
+*
+* With PPC4XX_NUM_PD = 256, this will leave a "backlog queue" for
+* 31 more contexts before new requests have to be rejected.
+*/
+   if (req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG) {
+   is_busy = ((dev->pdr_head - dev->pdr_tail) % PPC4XX_NUM_PD) >=
+   ((PPC4XX_NUM_PD * 13) / 16);
+   } else {
+   /*
+* To fix contention issues between ipsec (no backlog) and
+* dm-crypt (backlog), reserve 32 entries for "no backlog"
+* data contexts.
+*/
+   is_busy = ((dev->pdr_head - dev->pdr_tail) % PPC4XX_NUM_PD) >=
+   ((PPC4XX_NUM_PD * 15) / 16);
+
+   if (is_busy) {
spin_unlock_irqrestore(&dev->core_dev->lock, flags);
+   return -EBUSY;
+   }
+   }
+
if (num_gd) {
  

[PATCH 01/13] crypto: crypto4xx: wire up hmac_mc to hmac_muting

2017-10-03 Thread Christian Lamparter
The hmac_mc parameter of set_dynamic_sa_command_1()
was defined but never used. On closer inspection, it
turns out it was never wired up.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_alg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index d08e4c94abed..57b1dcef4cb4 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -63,6 +63,7 @@ static void set_dynamic_sa_command_1(struct dynamic_sa_ctl *sa, u32 cm,
sa->sa_command_1.bf.crypto_mode9_8 = cm & 3;
sa->sa_command_1.bf.feedback_mode = cfb,
sa->sa_command_1.bf.sa_rev = 1;
+   sa->sa_command_1.bf.hmac_muting = hmac_mc;
sa->sa_command_1.bf.extended_seq_num = esn;
sa->sa_command_1.bf.seq_num_mask = sn_mask;
sa->sa_command_1.bf.mutable_bit_proc = mute;
-- 
2.14.2



[PATCH 10/13] crypto: crypto4xx: simplify sa and state context acquisition

2017-10-03 Thread Christian Lamparter
Thanks to the big overhaul of crypto4xx_build_pd(), the request-local
sa_in, sa_out and state_record allocation can be simplified.

There's no need to set up any DMA coherent memory anymore, and
much of the support code can be removed.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_alg.c  | 27 +--
 drivers/crypto/amcc/crypto4xx_core.c | 50 ++--
 drivers/crypto/amcc/crypto4xx_core.h |  6 +
 3 files changed, 15 insertions(+), 68 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index 195445310f0c..22875ec2b2c8 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -122,20 +122,13 @@ static int crypto4xx_setkey_aes(struct crypto_ablkcipher *cipher,
}
 
/* Create SA */
-   if (ctx->sa_in_dma_addr || ctx->sa_out_dma_addr)
+   if (ctx->sa_in || ctx->sa_out)
crypto4xx_free_sa(ctx);
 
rc = crypto4xx_alloc_sa(ctx, SA_AES128_LEN + (keylen-16) / 4);
if (rc)
return rc;
 
-   if (ctx->state_record_dma_addr == 0) {
-   rc = crypto4xx_alloc_state_record(ctx);
-   if (rc) {
-   crypto4xx_free_sa(ctx);
-   return rc;
-   }
-   }
/* Setup SA */
sa = ctx->sa_in;
 
@@ -203,8 +196,8 @@ int crypto4xx_setkey_rfc3686(struct crypto_ablkcipher *cipher,
if (rc)
return rc;
 
-   crypto4xx_memcpy_to_le32(ctx->state_record->save_iv,
-   key + keylen - CTR_RFC3686_NONCE_SIZE, CTR_RFC3686_NONCE_SIZE);
+   ctx->iv_nonce = cpu_to_le32p((u32 *)&key[keylen -
+CTR_RFC3686_NONCE_SIZE]);
 
return 0;
 }
@@ -213,7 +206,7 @@ int crypto4xx_rfc3686_encrypt(struct ablkcipher_request *req)
 {
struct crypto4xx_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
__le32 iv[AES_IV_SIZE / 4] = {
-   ctx->state_record->save_iv[0],
+   ctx->iv_nonce,
cpu_to_le32p((u32 *) req->info),
cpu_to_le32p((u32 *) (req->info + 4)),
cpu_to_le32(1) };
@@ -227,7 +220,7 @@ int crypto4xx_rfc3686_decrypt(struct ablkcipher_request *req)
 {
struct crypto4xx_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
__le32 iv[AES_IV_SIZE / 4] = {
-   ctx->state_record->save_iv[0],
+   ctx->iv_nonce,
cpu_to_le32p((u32 *) req->info),
cpu_to_le32p((u32 *) (req->info + 4)),
cpu_to_le32(1) };
@@ -254,21 +247,13 @@ static int crypto4xx_hash_alg_init(struct crypto_tfm *tfm,
ctx->dev   = my_alg->dev;
 
/* Create SA */
-   if (ctx->sa_in_dma_addr || ctx->sa_out_dma_addr)
+   if (ctx->sa_in || ctx->sa_out)
crypto4xx_free_sa(ctx);
 
rc = crypto4xx_alloc_sa(ctx, sa_len);
if (rc)
return rc;
 
-   if (ctx->state_record_dma_addr == 0) {
-   crypto4xx_alloc_state_record(ctx);
-   if (!ctx->state_record_dma_addr) {
-   crypto4xx_free_sa(ctx);
-   return -ENOMEM;
-   }
-   }
-
crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
 sizeof(struct crypto4xx_ctx));
sa = (struct dynamic_sa_hash160 *)ctx->sa_in;
diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index de9044201a23..55a4dd8984c7 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -130,21 +130,17 @@ static void crypto4xx_hw_init(struct crypto4xx_device *dev)
 
 int crypto4xx_alloc_sa(struct crypto4xx_ctx *ctx, u32 size)
 {
-   ctx->sa_in = dma_alloc_coherent(ctx->dev->core_dev->device, size * 4,
-   &ctx->sa_in_dma_addr, GFP_ATOMIC);
+   ctx->sa_in = kzalloc(size * 4, GFP_ATOMIC);
if (ctx->sa_in == NULL)
return -ENOMEM;
 
-   ctx->sa_out = dma_alloc_coherent(ctx->dev->core_dev->device, size * 4,
-   &ctx->sa_out_dma_addr, GFP_ATOMIC);
+   ctx->sa_out = kzalloc(size * 4, GFP_ATOMIC);
if (ctx->sa_out == NULL) {
-   dma_free_coherent(ctx->dev->core_dev->device, size * 4,
- ctx->sa_in, ctx->sa_in_dma_addr);
+   kfree(ctx->sa_in);
+   ctx->sa_in = NULL;
return -ENOMEM;
}
 
-   memset(ctx->sa_in, 0, size * 4);
-   memset(ctx->sa_out, 0, size * 4);
ctx->sa_len = size;
 
return 0;
@@ -152,40 +148,13 @@ int crypto4xx_alloc_sa(struct crypto4xx_ctx *ctx, u32 size)
 
 void crypto4xx_free_sa(struct crypto4xx_ctx *ctx)
 {
-   if (ctx->sa_in != NULL)
-   dma_free_coherent(ctx->dev->core_dev->device, ctx->sa_len * 4,
-

[PATCH 08/13] crypto: crypto4xx: fix various warnings

2017-10-03 Thread Christian Lamparter
crypto4xx_core.c:179:6: warning: symbol 'crypto4xx_free_state_record'
was not declared. Should it be static?
crypto4xx_core.c:331:5: warning: symbol 'crypto4xx_get_n_gd'
was not declared. Should it be static?
crypto4xx_core.c:652:6: warning: symbol 'crypto4xx_return_pd'
was not declared. Should it be static?

crypto4xx_return_pd() is not used by anything. Therefore it is removed.

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_core.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c
index 254dc61c91a6..1752ea2125db 100644
--- a/drivers/crypto/amcc/crypto4xx_core.c
+++ b/drivers/crypto/amcc/crypto4xx_core.c
@@ -176,7 +176,7 @@ u32 crypto4xx_alloc_state_record(struct crypto4xx_ctx *ctx)
return 0;
 }
 
-void crypto4xx_free_state_record(struct crypto4xx_ctx *ctx)
+static void crypto4xx_free_state_record(struct crypto4xx_ctx *ctx)
 {
if (ctx->state_record != NULL)
dma_free_coherent(ctx->dev->core_dev->device,
@@ -322,10 +322,11 @@ static inline void crypto4xx_destroy_gdr(struct crypto4xx_device *dev)
  * when this function is called.
  * preemption or interrupt must be disabled
  */
-u32 crypto4xx_get_n_gd(struct crypto4xx_device *dev, int n)
+static u32 crypto4xx_get_n_gd(struct crypto4xx_device *dev, int n)
 {
u32 retval;
u32 tmp;
+
if (n >= PPC4XX_NUM_GD)
return ERING_WAS_FULL;
 
@@ -616,17 +617,6 @@ static void crypto4xx_stop_all(struct crypto4xx_core_device *core_dev)
kfree(core_dev);
 }
 
-void crypto4xx_return_pd(struct crypto4xx_device *dev,
-u32 pd_entry, struct ce_pd *pd,
-struct pd_uinfo *pd_uinfo)
-{
-   /* irq should be already disabled */
-   dev->pdr_head = pd_entry;
-   pd->pd_ctl.w = 0;
-   pd->pd_ctl_len.w = 0;
-   pd_uinfo->state = PD_ENTRY_FREE;
-}
-
 static u32 get_next_gd(u32 current)
 {
if (current != PPC4XX_LAST_GD)
-- 
2.14.2



[RFC 13/13] crypto: crypto4xx: add aes-gcm support

2017-10-03 Thread Christian Lamparter
This patch adds aes-gcm support to crypto4xx.
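
For background: GCM's GHASH hash key is H = AES_K(0^128), i.e. the block
cipher applied to an all-zero block. The hardware loads H from the SA's
inner digest, which is why crypto4xx_setkey_aes_gcm() below derives it
once in software (via crypto4xx_compute_gcm_hash_key_sw()) at setkey time.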

Signed-off-by: Christian Lamparter 
---
 drivers/crypto/amcc/crypto4xx_alg.c  | 139 +++
 drivers/crypto/amcc/crypto4xx_core.c |  22 ++
 drivers/crypto/amcc/crypto4xx_core.h |   4 +
 3 files changed, 165 insertions(+)

diff --git a/drivers/crypto/amcc/crypto4xx_alg.c b/drivers/crypto/amcc/crypto4xx_alg.c
index b1c4783feab9..eeaf27859d80 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "crypto4xx_reg_def.h"
@@ -416,6 +417,144 @@ int crypto4xx_setauthsize_aead(struct crypto_aead *cipher,
return crypto_aead_setauthsize(ctx->sw_cipher.aead, authsize);
 }
 
+/**
+ * AES-GCM Functions
+ */
+
+static int crypto4xx_aes_gcm_validate_keylen(unsigned int keylen)
+{
+   switch (keylen) {
+   case 16:
+   case 24:
+   case 32:
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
+static int crypto4xx_compute_gcm_hash_key_sw(__le32 *hash_start, const u8 *key,
+unsigned int keylen)
+{
+   struct crypto_cipher *aes_tfm = NULL;
+   uint8_t src[16] = { 0 };
+   int rc = 0;
+
+   aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC |
+ CRYPTO_ALG_NEED_FALLBACK);
+   if (IS_ERR(aes_tfm)) {
+   rc = PTR_ERR(aes_tfm);
+   pr_warn("could not load aes cipher driver: %d\n", rc);
+   return rc;
+   }
+
+   rc = crypto_cipher_setkey(aes_tfm, key, keylen);
+   if (rc) {
+   pr_err("setkey() failed: %d\n", rc);
+   goto out;
+   }
+
+   crypto_cipher_encrypt_one(aes_tfm, src, src);
+   crypto4xx_memcpy_to_le32(hash_start, src, 16);
+out:
+   crypto_free_cipher(aes_tfm);
+   return rc;
+}
+
+int crypto4xx_setkey_aes_gcm(struct crypto_aead *cipher,
+const u8 *key, unsigned int keylen)
+{
+   struct crypto_tfm *tfm = crypto_aead_tfm(cipher);
+   struct crypto4xx_ctx *ctx = crypto_tfm_ctx(tfm);
+   struct dynamic_sa_ctl *sa;
int rc = 0;
+
+   if (crypto4xx_aes_gcm_validate_keylen(keylen) != 0) {
+   crypto_aead_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
+   return -EINVAL;
+   }
+
+   rc = crypto4xx_setup_fallback(ctx, cipher, key, keylen);
+   if (rc)
+   return rc;
+
+   if (ctx->sa_in || ctx->sa_out)
+   crypto4xx_free_sa(ctx);
+
+   rc = crypto4xx_alloc_sa(ctx, SA_AES128_GCM_LEN + (keylen - 16) / 4);
+   if (rc)
+   return rc;
+
+   sa  = (struct dynamic_sa_ctl *) ctx->sa_in;
+
+   sa->sa_contents.w = SA_AES_GCM_CONTENTS | (keylen << 2);
+   set_dynamic_sa_command_0(sa, SA_SAVE_HASH, SA_NOT_SAVE_IV,
+SA_LOAD_HASH_FROM_SA, SA_LOAD_IV_FROM_STATE,
+SA_NO_HEADER_PROC, SA_HASH_ALG_GHASH,
+SA_CIPHER_ALG_AES, SA_PAD_TYPE_ZERO,
+SA_OP_GROUP_BASIC, SA_OPCODE_HASH_DECRYPT,
+DIR_INBOUND);
+   set_dynamic_sa_command_1(sa, CRYPTO_MODE_CTR, SA_HASH_MODE_HASH,
+CRYPTO_FEEDBACK_MODE_NO_FB, SA_EXTENDED_SN_OFF,
+SA_SEQ_MASK_ON, SA_MC_DISABLE,
+SA_NOT_COPY_PAD, SA_COPY_PAYLOAD,
+SA_NOT_COPY_HDR);
+
+   sa->sa_command_1.bf.key_len = keylen >> 3;
+
+   crypto4xx_memcpy_to_le32(get_dynamic_sa_key_field(sa),
+key, keylen);
+
+   rc = crypto4xx_compute_gcm_hash_key_sw(get_dynamic_sa_inner_digest(sa),
+   key, keylen);
+   if (rc) {
+   pr_err("GCM hash key setting failed = %d\n", rc);
+   goto err;
+   }
+
+   memcpy(ctx->sa_out, ctx->sa_in, ctx->sa_len * 4);
+   sa = (struct dynamic_sa_ctl *) ctx->sa_out;
+   sa->sa_command_0.bf.dir = DIR_OUTBOUND;
+   sa->sa_command_0.bf.opcode = SA_OPCODE_ENCRYPT_HASH;
+
+   return 0;
+err:
+   crypto4xx_free_sa(ctx);
+   return rc;
+}
+
+static inline int crypto4xx_crypt_aes_gcm(struct aead_request *req,
+ bool decrypt)
+{
+   struct crypto4xx_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
+   unsigned int len = req->cryptlen;
+   __le32 iv[4];
+
+   if (crypto4xx_aead_need_fallback(req, false, decrypt))
+   return crypto4xx_aead_fallback(req, ctx, decrypt);
+
+   crypto4xx_memcpy_to_le32(iv, req->iv, GCM_AES_IV_SIZE);
+   iv[3] = cpu_to_le32(1);
+
+   if (decrypt)
+   len -= crypto_aead_authsize(crypto_aead_reqtfm(req));
+
return crypto4xx_build_pd(&req->base, ctx, req->src, req->dst,
+   

Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-03 Thread Marcelo Ricardo Leitner
On Tue, Oct 03, 2017 at 07:33:08PM -0300, Marcelo Ricardo Leitner wrote:
> On Tue, Oct 03, 2017 at 10:25:22AM +0800, Jia-Ju Bai wrote:
> > The SCTP program may sleep under a spinlock, and the function call path is:
> > sctp_generate_t3_rtx_event (acquire the spinlock)
> >   sctp_do_sm
> > sctp_side_effects
> >   sctp_cmd_interpreter
> > sctp_make_init_ack
> >   sctp_pack_cookie
> > crypto_shash_setkey
> >   shash_setkey_unaligned
> > kmalloc(GFP_KERNEL)
> 
> Are you sure this can happen?
> The host is not supposed to store any information when replying to an
> INIT packet (which generated the INIT_ACK listed above). That said,
> it's weird to see the timer function triggering so.
> 
> Checking now, that code is dead actually:
> $ git grep -A 2 SCTP_CMD_GEN_INIT_ACK
> sm_sideeffect.c:case SCTP_CMD_GEN_INIT_ACK:
> sm_sideeffect.c-/* Generate an INIT ACK chunk.  */
> sm_sideeffect.c-new_obj = sctp_make_init_ack(asoc, chunk, GFP_ATOMIC,
> 
> Nobody is triggering a call to sctp_cmd_interpreter with
> SCTP_CMD_GEN_INIT_ACK command, which would generate the callstack
> above.

Nevertheless, the issue is real through other call paths.

Thanks,
Marcelo


Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-03 Thread Marcelo Ricardo Leitner
On Tue, Oct 03, 2017 at 01:26:43PM +0800, Herbert Xu wrote:
> On Mon, Oct 02, 2017 at 09:18:24PM -0700, Andy Lutomirski wrote:
> > > On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai  wrote:
> > >
> > > The SCTP program may sleep under a spinlock, and the function call path 
> > > is:
> > > sctp_generate_t3_rtx_event (acquire the spinlock)
> > >  sctp_do_sm
> > >sctp_side_effects
> > >  sctp_cmd_interpreter
> > >sctp_make_init_ack
> > >  sctp_pack_cookie
> > >crypto_shash_setkey
> > >  shash_setkey_unaligned
> > >kmalloc(GFP_KERNEL)
> > 
> > I'm going to go out on a limb here: why on Earth is our crypto API so
> > full of indirection that we allocate memory at all here?
> 
> The crypto API operates on a one key per-tfm basis.  So normally
> tfm allocation and key setting is done once only and not done on
> the data path.
> 
> I have looked at the SCTP code and it appears to fit this paradigm.
> That is, we should be able to allocate the tfm and set the key when
> the key is actually generated via get_random_bytes, rather than every
> time the key is used which is not only a waste but as you see runs
> into API issues.

Fair point, but

> 
> Usually if you're invoking setkey from a non-sleeping code-path
> you're probably doing something wrong.

Usually but not always. There are 3 calls to that function in the SCTP
code:
- pack a cookie, which is sent on an INIT_ACK packet to the client
- unpack the cookie above, after it is sent back by the client on a
  COOKIE_ECHO packet
- send a chunk authenticated by a hash

the first two happen during softirq processing, while processing a
packet that was received.

As I explained in the other email, SCTP code is not supposed to store
any information about the peer between the 1st and the 2nd moments
above, to be less vulnerable to DoS attacks (it's planned so by the
RFC), which is why it uses the cookie.

The 3rd one we probably can improve, but I don't think we can do much
about the 2 first ones from the SCTP side.

Note on sctp_sf_do_5_1B_init() how sctp_make_init_ack() is explicitly
called with GFP_ATOMIC, and also on sctp_sf_do_unexpected_init().
Though we can't propagate that to crypto_shash_setkey.

Ideas?

Thanks,
Marcelo

> 
> As someone else noted recently, there is no single forum for
> reviewing code that uses the crypto API so buggy code like this
> is not surprising.
> 
> > We're synchronously computing a hash of a small amount of data using
> > either HMAC-SHA1 or HMAC-SHA256 (determined at runtime) if I read it
> > right.  There's a sane way to do this that doesn't need kmalloc,
> > alloca, or fancy indirection.  And then there's crypto_shash_xyz().
> 
> There are some legitimate cases where you want to use a different
> key for every hashing operation.  But so far these are uses have
> been very few so there has been no need to provide an API for them.
> 
> Cheers,
> -- 
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
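
A minimal sketch of the approach Herbert describes - allocate the tfm and
set the key once, when the key material is generated, rather than on the
data path (illustrative only; names are hypothetical):

    static struct crypto_shash *cookie_tfm;     /* hypothetical holder */

    static int example_cookie_key_init(void)
    {
            u8 key[32];
            int err;

            get_random_bytes(key, sizeof(key));

            cookie_tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);
            if (IS_ERR(cookie_tfm))
                    return PTR_ERR(cookie_tfm);

            /* setkey may sleep - fine here, in process context */
            err = crypto_shash_setkey(cookie_tfm, key, sizeof(key));
            if (err) {
                    crypto_free_shash(cookie_tfm);
                    cookie_tfm = NULL;
            }
            return err;
    }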


Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-03 Thread Marcelo Ricardo Leitner
On Tue, Oct 03, 2017 at 10:25:22AM +0800, Jia-Ju Bai wrote:
> The SCTP program may sleep under a spinlock, and the function call path is:
> sctp_generate_t3_rtx_event (acquire the spinlock)
>   sctp_do_sm
> sctp_side_effects
>   sctp_cmd_interpreter
> sctp_make_init_ack
>   sctp_pack_cookie
> crypto_shash_setkey
>   shash_setkey_unaligned
> kmalloc(GFP_KERNEL)

Are you sure this can happen?
The host is not supposed to store any information when replying to an
INIT packet (which generated the INIT_ACK listed above). That said,
it's weird to see the timer function triggering so.

Checking now, that code is dead actually:
$ git grep -A 2 SCTP_CMD_GEN_INIT_ACK
sm_sideeffect.c:case SCTP_CMD_GEN_INIT_ACK:
sm_sideeffect.c-/* Generate an INIT ACK chunk.  */
sm_sideeffect.c-new_obj = sctp_make_init_ack(asoc, chunk, GFP_ATOMIC,

Nobody is triggering a call to sctp_cmd_interpreter with
SCTP_CMD_GEN_INIT_ACK command, which would generate the callstack
above.

  Marcelo


Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-10-03 Thread Casey Leedom
| From: Harsh Jain 
| Sent: Tuesday, October 3, 2017 5:22 AM
|
| Hi Robin/Ashok,
|
| Find attached trace of DMA write error. I had a look on trace but didn't
| find anything suspicious.
|
| Let me know if you need more trace.

As a reminder, Harsh and Atul will be waking up in a few hours, so if there
are additional tests for which you'd like them to gather data, it would be
good to ask now so it's available to them to work on while we're all off
asleep ...

Casey


Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-10-03 Thread David Woodhouse
On Tue, 2017-10-03 at 19:05 +0100, Robin Murphy wrote:
> 
> Now, there are indeed plenty of drivers and subsystems which do work on
> lists of explicitly single pages - anything doing some variant of
> "addr = kmap_atomic(sg_page(sg)) + sg->offset;" is easy to spot - but I
> don't think DMA API implementations are in a position to make any kind
> of assumption; nearly all of them just shut up and handle sg->length
> bytes from sg_phys(sg) without questioning the caller, and I reckon
> that's exactly what they should be doing.

So what's the point in sg->page in the first place? If even the
*offset* can be greater than page size, it isn't even the *first* page
(as you called it). Why aren't we just using a physical address,
instead of an arbitrary page and an offset from that?

Can we have *negative* sg->offset too? :)




Re: [PATCH v3] crypto: s5p-sss: Add HASH support for Exynos

2017-10-03 Thread Krzysztof Kozlowski
On Tue, Oct 03, 2017 at 04:57:43PM +0200, Kamil Konieczny wrote:
 
> >> [...]
> >> +static struct ahash_alg algs_sha256[] = {
> >> +{
> >> +  .init   = s5p_hash_init,
> >> +  .update = s5p_hash_update,
> >> +  .final  = s5p_hash_final,
> >> +  .finup  = s5p_hash_finup,
> >> +  .digest = s5p_hash_digest,
> >> +  .halg.digestsize= SHA256_DIGEST_SIZE,
> >> +  .halg.base  = {
> >> +  .cra_name   = "sha256",
> >> +  .cra_driver_name= "exynos-sha256",
> >> +  .cra_priority   = 100,
> >> +  .cra_flags  = CRYPTO_ALG_TYPE_AHASH |
> >> +CRYPTO_ALG_KERN_DRIVER_ONLY |
> >> +CRYPTO_ALG_ASYNC |
> >> +CRYPTO_ALG_NEED_FALLBACK,
> >> +  .cra_blocksize  = HASH_BLOCK_SIZE,
> >> +  .cra_ctxsize= sizeof(struct s5p_hash_ctx),
> >> +  .cra_alignmask  = SSS_DMA_ALIGN_MASK,
> >> +  .cra_module = THIS_MODULE,
> >> +  .cra_init   = s5p_hash_cra_init,
> >> +  .cra_exit   = s5p_hash_cra_exit,
> >> +  }
> >> +}
> >> +
> >> +};
> >> +
> >> +static struct sss_hash_algs_info exynos_hash_algs_info[] = {
> > 
> > You have warnings in your code. Please be sure that all compiler,
> > Smatch, Sparse, checkpatch and coccicheck warnings are fixed.
> > 
> > ../drivers/crypto/s5p-sss.c:1896:34: warning: ‘exynos_hash_algs_info’ 
> > defined but not used [-Wunused-variable]
> >  static struct sss_hash_algs_info exynos_hash_algs_info[] = {
> > 
> > Probably this should be __maybe_unused.
> 
> You are right, I did not check this with EXYNOS_HASH disabled, I will
> rewrite it.
> 
> > Also this should be const. I do not understand why you have to add one
> > more static variable (which sticks the driver to only one instance...)
> > and then modify it during runtime. Everything should be stored in device
> > state container (s5p_aes_dev) - directly or through some other pointers.
> 
There is a .registered field which is incremented with each registered algo.
I can move the assignments to the .import, .export and .statesize fields
into the struct.
When I tried to add const, I got a compiler warning:
> drivers/crypto/s5p-sss.c: In function ‘s5p_aes_remove’:
> drivers/crypto/s5p-sss.c:2397:6: warning: passing argument 1 of 
> ‘crypto_unregister_ahash’ discards ‘const’ qualifier from pointer target type 
> [-Wdiscarded-qualifiers]
>   &hash_algs_i[i].algs_list[j]);
> so it was not designed to be const (in crypto framework).
> In AES code the alg struct is also static:
> static struct crypto_alg algs[] = {

The crypto_alg and ahash_alg must indeed stay non-const but
sss_hash_algs_info is different. You do not pass it to crypto-core.

> What you mean by 'stick the driver to only one instance' ? In Exynos 4412 
> there
> is only one SecSS block, in some other Exynos there is SlimSS, but it is not
> the same (it has lower capabilities and other io addresses), so there should 
> not
> be two s5p_aes_dev drivers loaded at the same time. 

The current driver matches the hardware one-to-one, so indeed there cannot
be two s5p_aes_dev devices. However, this might change, thus almost every
driver tries to follow the pattern of a state container passed to the device
(e.g. platform_set_drvdata()). With this approach the code is nicely
encapsulated and usually much easier to review. Globals (or file-scope
variables) usually makes code more difficult to maintain.

In this driver this is not entirely possible as some crypto-functions do
not allow passing driver-supplied opaque pointer. But except this case,
everywhere else the driver should follow common convention - do not use
static variables.
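
A generic sketch of that convention (illustrative, not from the patch):

    static int example_probe(struct platform_device *pdev)
    {
            struct s5p_aes_dev *dd;

            dd = devm_kzalloc(&pdev->dev, sizeof(*dd), GFP_KERNEL);
            if (!dd)
                    return -ENOMEM;

            /* state lives in the device, not in file-scope statics */
            platform_set_drvdata(pdev, dd);
            return 0;
    }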


Best regards,
Krzysztof



Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-10-03 Thread Robin Murphy
On 03/10/17 13:55, David Woodhouse wrote:
> On Thu, 2017-09-28 at 15:14 +0100, Robin Murphy wrote:
>> The intel-iommu DMA ops fail to correctly handle scatterlists where
>> sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
>> appropriately based on the page-aligned portion of the offset, but the
>> mapping is set up relative to sg->page, which means it fails to actually
>> cover the whole buffer (and in the worst case doesn't cover it at all):
>>
>>     (sg->dma_address + sg->dma_len) --------------------------+
>>     sg->dma_address ---------------+                          |
>>     iov_pfn------+                 |                          |
>>                  |                 |                          |
>>                  v                 v                          v
>> iova:            a        b        c        d        e        f
>>                  |--------|--------|--------|--------|--------|
>>                                    <........calculated........>
>>                  [__________mapped__________]
>> pfn:             0        1        2        3        4        5
>>                  |--------|--------|--------|--------|--------|
>>                  ^                 ^                          ^
>>                  |                 |                          |
>>     sg->page ----+                 |                          |
>>     sg->offset --------------------+                          |
>>     (sg->offset + sg->length) --------------------------------+
> 
> I'd still dearly love to see some clear documentation of what it means
> for sg->offset to be outside the page referenced by sg->page.

I think the key is that for each SG segment, sg->page doesn't
necessarily represent "a" page, but the first of one or more contiguous
pages. Disregarding offsets for the moment, Here's a typical example of
a 120KB buffer from the block layer as processed by iommu_dma_map_sg():

[   16.092649] == initial (4) ==
[   16.095591]  0: virt 81372000 phys 0x81372000 dma 0x
[   16.095591]     offset 0x   length 0xe000   dma_len 0x
[   16.109541]  1: virt 8138 phys 0x8138 dma 0x
[   16.109541]     offset 0x   length 0xd000   dma_len 0x
[   16.123491]  2: virt 8138e000 phys 0x8138e000 dma 0x
[   16.123491]     offset 0x   length 0x2000   dma_len 0x
[   16.137440]  3: virt 8139 phys 0x8139 dma 0x
[   16.137440]     offset 0x   length 0x1000   dma_len 0x
[   16.216167] == final   (2) ==
[   16.219106]  0: virt 81372000 phys 0x81372000 dma 0xffb6
[   16.219106]     offset 0x   length 0xe000   dma_len 0xe000
[   16.233056]  1: virt 8138 phys 0x8138 dma 0xffb7
[   16.233056]     offset 0x   length 0xd000   dma_len 0x0001

i.e. segments of 14 pages, 13 pages, 2 pages and 1 page respectively
(and we further merge the resulting DMA-contiguous segments on top of
that).

Now, there are indeed plenty of drivers and subsystems which do work on
lists of explicitly single pages - anything doing some variant of
"addr = kmap_atomic(sg_page(sg)) + sg->offset;" is easy to spot - but I
don't think DMA API implementations are in a position to make any kind
of assumption; nearly all of them just shut up and handle sg->length
bytes from sg_phys(sg) without questioning the caller, and I reckon
that's exactly what they should be doing.
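
The two idioms contrasted above, side by side (sketch):

    /* (a) page-based access - implicitly assumes sg->offset < PAGE_SIZE */
    void *addr = kmap_atomic(sg_page(sg)) + sg->offset;

    /* (b) offset-agnostic - valid for any offset, because
     * sg_phys(sg) == page_to_phys(sg_page(sg)) + sg->offset */
    phys_addr_t phys = sg_phys(sg);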

> Or is it really not "outside", and it's *only* valid for the offset to
> be > PAGE_OFFSET when it's a huge page, so we can check that with a
> BUG_ON() ? 
> 
> In particular, I'd like to know what is intended in the Xen PV case,
> where there isn't a straight correspondence between pfn and mfn. Is the
> out-of-range sg->offset intended to refer to the next *pfn* after
> sg->page, or to the next *mfn* after sg->page?

Logically, it should mean the same thing as whatever a length of more
than 1 page means to Xen - judging by blkif_queue_rw_req() at least,
that seems to be a BUG_ON() in both cases.

> I confess I've only followed this thread vaguely, but I haven't seen a
> *coherent* explanation except in the huge page case (in which case I
> want to see that BUG_ON in the patch) of why this isn't just totally
> bogus.

As I've said before, I'd certainly consider it a denormalised case, but
not a bogus one, and certainly not something that is the DMA API's job
to police. Having now audited every dma_map_ops::map_sg implementation I
could find, the only ones not using sg_phys()/sg_virt() or some other
construction immune to the absolute offset value (MIPS even explicitly
normalises it) are intel-iommu and arch/frv, and the latter is clearly
broken anyway as it ignores sg->length.

Robin.


Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned

2017-10-03 Thread Andy Lutomirski
On Mon, Oct 2, 2017 at 10:26 PM, Herbert Xu  wrote:
> On Mon, Oct 02, 2017 at 09:18:24PM -0700, Andy Lutomirski wrote:
>> > On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai  wrote:
>> >
>> > The SCTP program may sleep under a spinlock, and the function call path is:
>> > sctp_generate_t3_rtx_event (acquire the spinlock)
>> >  sctp_do_sm
>> >sctp_side_effects
>> >  sctp_cmd_interpreter
>> >sctp_make_init_ack
>> >  sctp_pack_cookie
>> >crypto_shash_setkey
>> >  shash_setkey_unaligned
>> >kmalloc(GFP_KERNEL)
>>
>> I'm going to go out on a limb here: why on Earth is our crypto API so
>> full of indirection that we allocate memory at all here?
>
> The crypto API operates on a one key per-tfm basis.  So normally
> tfm allocation and key setting is done once only and not done on
> the data path.
>
> I have looked at the SCTP code and it appears to fit this paradigm.
> That is, we should be able to allocate the tfm and set the key when
> the key is actually generated via get_random_bytes, rather than every
> time the key is used which is not only a waste but as you see runs
> into API issues.

It's a waste because it loses a pre-computation advantage.

The fact that it has memory allocation issues is crypto API's fault,
full stop.  There is no legit reason to need to allocate anything.


Re: [PATCH v2] staging: ccree: Convert to platform_{get,set}_drvdata()

2017-10-03 Thread Greg KH
On Thu, Sep 21, 2017 at 05:47:42PM +0530, suni...@techveda.org wrote:
> From: Suniel Mahesh 
> 
> Platform devices are expected to use wrapper functions,
> platform_{get,set}_drvdata() with platform_device as argument,
> for getting and setting the driver data. dev_{get,set}_drvdata()
> are using &plat_dev->dev.
> For wrapper functions we can directly pass a struct platform_device.
> 
> dev_set_drvdata() is redundant and therefore removed. The driver core
> clears the driver data to NULL after device_release or on probe failure.
> 
> Signed-off-by: Suniel Mahesh 
> ---
> Changes for v2:
> - Rebased on top of staging-testing.

Can you rebase again, this still does not apply :(


Re: [Part2 PATCH v4 05/29] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-10-03 Thread Brijesh Singh



On 10/03/2017 11:17 AM, Borislav Petkov wrote:
...
> No, please add my patch below to your set for the CRYPTO_DEV_CCP_DD
> dependency as it is a separate thing. Your patch should concentrate only
> on adding the PSP and its dependencies.

Sure, I will include your patch in my series. Thanks.



---
From: Borislav Petkov 
Date: Sat, 30 Sep 2017 10:06:27 +0200
Subject: [PATCH] crypto: ccp: Build the AMD secure processor driver only with
  AMD CPU support

This is AMD-specific hardware so present it in Kconfig only when AMD
CPU support is enabled or on ARM64 where it is also used.

Signed-off-by: Borislav Petkov 
Cc: Brijesh Singh 
Cc: Tom Lendacky 
Cc: Gary Hook 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-crypto@vger.kernel.org
---
  drivers/crypto/ccp/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 627f3e61dcac..f19f57162225 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -1,5 +1,6 @@
  config CRYPTO_DEV_CCP_DD
tristate "Secure Processor device driver"
+   depends on CPU_SUP_AMD || ARM64
default m
help
  Provides AMD Secure Processor device driver.



Re: [Part2 PATCH v4 05/29] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-10-03 Thread Borislav Petkov
On Sun, Oct 01, 2017 at 03:05:11PM -0500, Brijesh Singh wrote:
> I think theoretically a 32-bit host OS can invoke a PSP commands but
> currently PSP interface is exposing only the SEV FW command. And SEV

Let's cross that bridge when we get to it.

> feature is available when we are in 64-bit mode hence for now its okay
> to have depends on X86_64. I will add CRYPTO_DEV_CCP_DD depend on
> CPU_SUP_AMD || ARM64 and CRYPTO_DEV_SP_PSP depend on X86_64 and send you
> v4.2.

No, please add my patch below to your set for the CRYPTO_DEV_CCP_DD
dependency as it is a separate thing. Your patch should concentrate only
on adding the PSP and its dependencies.

Thx.

---
From: Borislav Petkov 
Date: Sat, 30 Sep 2017 10:06:27 +0200
Subject: [PATCH] crypto: ccp: Build the AMD secure processor driver only with
 AMD CPU support

This is AMD-specific hardware so present it in Kconfig only when AMD
CPU support is enabled or on ARM64 where it is also used.

Signed-off-by: Borislav Petkov 
Cc: Brijesh Singh 
Cc: Tom Lendacky 
Cc: Gary Hook 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-crypto@vger.kernel.org
---
 drivers/crypto/ccp/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 627f3e61dcac..f19f57162225 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -1,5 +1,6 @@
 config CRYPTO_DEV_CCP_DD
tristate "Secure Processor device driver"
+   depends on CPU_SUP_AMD || ARM64
default m
help
  Provides AMD Secure Processor device driver.
-- 
2.13.0

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH 1/3] crypto: dh_helper - return unsigned int for dh_data_size()

2017-10-03 Thread Tudor Ambarus

Hi, David,

On 10/03/2017 12:06 PM, David Howells wrote:

Tudor Ambarus  wrote:


-static inline int dh_data_size(const struct dh *p)
+static inline unsigned int dh_data_size(const struct dh *p)
  {
return p->key_size + p->p_size + p->g_size;
  }


If this is a problem, do you need to do range checking?


The algorithm does not impose any constraint in this direction, as far
as I'm aware.

It's unnatural to return a signed integer in a function which just sums
unsigned integers. No checking is needed; the function should return the
unsigned result.

Cheers,
ta


Re: [PATCH v3] crypto: s5p-sss: Add HASH support for Exynos

2017-10-03 Thread Kamil Konieczny
On 30.09.2017 21:50, Krzysztof Kozlowski wrote:
> On Wed, Sep 27, 2017 at 02:25:50PM +0200, Kamil Konieczny wrote:
>> Add support for MD5, SHA1, SHA256 hash algorithms for Exynos HW.
>> It uses the crypto framework asynchronous hash api.
>> It is based on omap-sham.c driver.
>> S5P has some HW differencies and is not implemented.
>>
>> Modifications in s5p-sss:
>>[...]

>> +/*
>> + * HASH bit numbers, used by device, setting in dev->hash_flags with
>> + * functions set_bit(), clear_bit() or tested with test_bit() or BIT(),
>> + * to keep HASH state BUSY or FREE, or to signal state from irq_handler
>> + * to hash_tasklet. SGS keep track of allocated memory for scatterlist
>> + */
>> +#define HASH_FLAGS_BUSY         0
>> +#define HASH_FLAGS_FINAL        1
>> +#define HASH_FLAGS_DMA_ACTIVE   2
>> +#define HASH_FLAGS_OUTPUT_READY 3
>> +#define HASH_FLAGS_DMA_READY    4
>> +#define HASH_FLAGS_SGS_COPIED   5
>> +#define HASH_FLAGS_SGS_ALLOCED  6
>> +
>> +/*
>> + * HASH bit numbers used in request context
>> + * FINUP mark last hash operation
>> + */
>> +#define HASH_FLAGS_FINUP        7
>> +#define HASH_FLAGS_ERROR        8
> 
> I spent some time on s5p_hash_finish_req() and other code around flags,
> confused by two different flags (ctx->flags, device->hash_flags) and
> different API used to play with them next to each other (once test_bit,
> line later just |=).
> 
> This is just confusing. AFAIU, you use only two bits in ctx->flags, so
> just convert it to two bools. This will remove the confusion:
> 1. between the defines before and here,
> 2. around mixing xxx_bit() and regular |= operations.
> 

Good point, I will rewrite them into two bool vars.
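
Something like this (a sketch; the final field names may differ):

	struct s5p_hash_reqctx {
		/* ... existing fields ... */
		bool		finup;	/* final hash operation requested */
		bool		error;	/* the request has failed */
	};

so the state is read as plain "if (ctx->finup)" instead of mixing
test_bit() with |= on one flags word.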

>> +
>> +/* HASH op codes */
>> +#define HASH_OP_UPDATE  1
>> +#define HASH_OP_FINAL   2
>> +
>> +/* HASH HW constants */
>> +#define BUFLEN  HASH_BLOCK_SIZE
>> +
>> +#define SSS_DMA_ALIGN   16
>> +#define SSS_ALIGNED __attribute__((aligned(SSS_DMA_ALIGN)))
>> +#define SSS_DMA_ALIGN_MASK  (SSS_DMA_ALIGN - 1)
> 
> No changes here... I asked for making this consistent with current code
> so please bring a patch which introduces new macro to existing code and
> then re-use it for new code.
> 
> Dropping inconsistent code and then promising "I will fix it up later"
> does not work.

This align stuff was again taken from the omap driver. The AES code does
not need any DMA_ALIGN; it sets ".cra_alignmask = 0x0f", but that is not
needed and can simply be zero, as AES operates on fixed-size blocks and
the SecSS DMA engine can operate on non-aligned addresses. I am not sure
how the HW will handle the HASH corner case at the last update, when the
input data points to the last byte of a page, the length is 1, and
FeedControl rounds the length up to 8, so the DMA may (?) read bytes
past the end of the page.
To sum it up, the AES code needs the ".cra_alignmask = 0" cleanup, but I
do not think it is so critical that it must happen before HASH.
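
(For reference: .cra_alignmask = 0x0f makes the crypto API hand the
driver buffers with (addr & 0x0f) == 0, i.e. 16-byte aligned, so the
cleanup is just:

	.cra_alignmask = 0x0f,	/* before: 16-byte-aligned buffers */
	.cra_alignmask = 0x00,	/* after: no alignment requirement */

given that the SecSS DMA engine handles unaligned addresses anyway.)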

You have a good point here, as the names are confusing, so I will remove
SSS_ALIGNED from HASH struct definitions, and change SSS_DMA_ALIGN into
SSS_HASH_DMA_LEN_ALIGN, and SSS_DMA_ALIGN_MASK into SSS_HASH_DMA_ALIGN_MASK.

>> [...]
>>   * @lock:   Lock for protecting both access to device hardware registers
>> - *  and fields related to current request (including the busy field).
>> + *  and fields related to current request (including the busy
>> + *  field).
> 
> Why wrapping this line?

Sorry, I will drop this cleanup (the line is 81 characters long, which is
why I wrapped it).

>> + * @res:Resources for hash.
>> + * @io_hash_base: Per-variant offset for HASH block IO memory.
>> + * @hash_lock:  Lock for protecting hash_req, hash_queue and hash_flags
>> + *  variable.
>> + * @hash_tasklet: New HASH request scheduling job.
>> + * @xmit_buf:   Buffer for current HASH request transfer into SSS block.
>> + * @hash_flags: Flags for current HASH op.
>> + * @hash_queue: Async hash queue.
>> + * @hash_req:   Current request sending to SSS HASH block.
>> + * @hash_sg_iter: Scatterlist transferred through DMA into SSS HASH block.
>> + * @hash_sg_cnt: Counter for hash_sg_iter.
>> + */
>> [...]
>> +/**
>> + * s5p_hash_rx - get next hash_sg_iter
>> + * @dev: device
>> + *
>> + * Return:
>> + * 2 if there is no more data and it is UPDATE op
>> + * 1 if new receiving (input) data is ready and can be written to
>> + *   device
> 
> Why wrapping so early?

OK, I will reformat the comments up to 80 characters per line, this one
and the following ones.

>>[...]
>> +
>> +if (final) {
>> +/* number of bytes for last part */
>> +low = length; high = 0;
> 
> No multiple assignments in one line.
> 
>> +/* total number of bits prev hashed */
>> +tmplen = ctx->digcnt * 8;
>> +prelow = (u32)tmplen;
>> +prehigh = (u32)(tmplen >> 32);
>> +} else {
>> +prelow = 0; prehigh = 0;
>> +low = 0; high = BIT(31);
> 
> No multiple assignments in one line.
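
OK, I will split them, something like:

		if (final) {
			/* number of bytes for last part */
			low = length;
			high = 0;
			/* total number of bits prev hashed */
			tmplen = ctx->digcnt * 8;
			prelow = (u32)tmplen;
			prehigh = (u32)(tmplen >> 32);
		} else {
			prelow = 0;
			prehigh = 0;
			low = 0;
			high = BIT(31);
		}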

Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-10-03 Thread David Woodhouse
On Thu, 2017-09-28 at 15:14 +0100, Robin Murphy wrote:
> The intel-iommu DMA ops fail to correctly handle scatterlists where
> sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
> appropriately based on the page-aligned portion of the offset, but the
> mapping is set up relative to sg->page, which means it fails to actually
> cover the whole buffer (and in the worst case doesn't cover it at all):
> 
> (sg->dma_address + sg->dma_len) ------------+
> sg->dma_address ----------+                 |
> iov_pfn----------+        |                 |
>                  |        |                 |
>                  v        v                 v
> iova:   a        b        c        d        e        f
>         |--------|--------|--------|--------|--------|
>                           <...calculated....>
>                  [_____mapped______]
> pfn:    0        1        2        3        4        5
>         |--------|--------|--------|--------|--------|
>         ^                 ^        ^
>         |                 |        |
> sg->page+                 |        |
> sg->offset ---------------+        |
> (sg->offset + sg->length) ---------+

I'd still dearly love to see some clear documentation of what it means
for sg->offset to be outside the page referenced by sg->page.

Or is it really not "outside", and it's *only* valid for the offset to
be > PAGE_SIZE when it's a huge page, so we can check that with a
BUG_ON()?

In particular, I'd like to know what is intended in the Xen PV case,
where there isn't a straight correspondence between pfn and mfn. Is the
out-of-range sg->offset intended to refer to the next *pfn* after
sg->page, or to the next *mfn* after sg->page?

I confess I've only followed this thread vaguely, but I haven't seen a
*coherent* explanation of why this isn't just totally bogus, except in
the huge page case (in which case I want to see that BUG_ON in the
patch).
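
If I've read the patch right, it derives the mapping from the full
physical address instead of sg->page, i.e. roughly (a sketch, not the
exact hunk):

	unsigned long pgoff = sg->offset & ~PAGE_MASK;

	sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + pgoff;
	pteval = (sg_phys(sg) - pgoff) | prot;

which is only obviously correct if an oversized sg->offset always means
"further pfns of the same physically-contiguous region".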




[PATCH v2 4/4] staging: ccree: simplify OOM handling

2017-10-03 Thread Gilad Ben-Yossef
Simplify the handling of memory allocation failures and remove
redundant log messages.

Signed-off-by: Gilad Ben-Yossef 
---
 drivers/staging/ccree/ssi_cipher.c   | 11 --
 drivers/staging/ccree/ssi_driver.c   |  1 -
 drivers/staging/ccree/ssi_hash.c | 42 +---
 drivers/staging/ccree/ssi_ivgen.c|  9 +++-
 drivers/staging/ccree/ssi_sram_mgr.c | 23 
 5 files changed, 26 insertions(+), 60 deletions(-)

diff --git a/drivers/staging/ccree/ssi_cipher.c 
b/drivers/staging/ccree/ssi_cipher.c
index d70d86a..78706f5 100644
--- a/drivers/staging/ccree/ssi_cipher.c
+++ b/drivers/staging/ccree/ssi_cipher.c
@@ -194,10 +194,9 @@ static int ssi_blkcipher_init(struct crypto_tfm *tfm)
 
/* Allocate key buffer, cache line aligned */
ctx_p->user.key = kmalloc(max_key_buf_size, GFP_KERNEL | GFP_DMA);
-   if (!ctx_p->user.key) {
-   dev_dbg(dev, "Allocating key buffer in context failed\n");
-   rc = -ENOMEM;
-   }
+   if (!ctx_p->user.key)
+   return -ENOMEM;
+
dev_dbg(dev, "Allocated key buffer in context. key=@%p\n",
ctx_p->user.key);
 
@@ -1245,10 +1244,8 @@ struct ssi_crypto_alg *ssi_ablkcipher_create_alg(struct 
ssi_alg_template
struct crypto_alg *alg;
 
t_alg = kzalloc(sizeof(*t_alg), GFP_KERNEL);
-   if (!t_alg) {
-   dev_dbg(dev, "failed to allocate t_alg\n");
+   if (!t_alg)
return ERR_PTR(-ENOMEM);
-   }
 
alg = &t_alg->crypto_alg;
 
diff --git a/drivers/staging/ccree/ssi_driver.c 
b/drivers/staging/ccree/ssi_driver.c
index c4b608b..795a087 100644
--- a/drivers/staging/ccree/ssi_driver.c
+++ b/drivers/staging/ccree/ssi_driver.c
@@ -212,7 +212,6 @@ static int init_cc_resources(struct platform_device 
*plat_dev)
 
new_drvdata = devm_kzalloc(dev, sizeof(*new_drvdata), GFP_KERNEL);
if (!new_drvdata) {
-   dev_dbg(dev, "Failed to allocate drvdata");
rc = -ENOMEM;
goto post_drvdata_err;
}
diff --git a/drivers/staging/ccree/ssi_hash.c b/drivers/staging/ccree/ssi_hash.c
index a27c988..d79090e 100644
--- a/drivers/staging/ccree/ssi_hash.c
+++ b/drivers/staging/ccree/ssi_hash.c
@@ -157,34 +157,28 @@ static int ssi_hash_map_request(struct device *dev,
int rc = -ENOMEM;
 
state->buff0 = kzalloc(SSI_MAX_HASH_BLCK_SIZE, GFP_KERNEL | GFP_DMA);
-   if (!state->buff0) {
-   dev_err(dev, "Allocating buff0 in context failed\n");
+   if (!state->buff0)
goto fail0;
-   }
+
state->buff1 = kzalloc(SSI_MAX_HASH_BLCK_SIZE, GFP_KERNEL | GFP_DMA);
-   if (!state->buff1) {
-   dev_err(dev, "Allocating buff1 in context failed\n");
+   if (!state->buff1)
goto fail_buff0;
-   }
+
state->digest_result_buff = kzalloc(SSI_MAX_HASH_DIGEST_SIZE, 
GFP_KERNEL | GFP_DMA);
-   if (!state->digest_result_buff) {
-   dev_err(dev, "Allocating digest_result_buff in context 
failed\n");
+   if (!state->digest_result_buff)
goto fail_buff1;
-   }
+
state->digest_buff = kzalloc(ctx->inter_digestsize, GFP_KERNEL | 
GFP_DMA);
-   if (!state->digest_buff) {
-   dev_err(dev, "Allocating digest-buffer in context failed\n");
+   if (!state->digest_buff)
goto fail_digest_result_buff;
-   }
 
dev_dbg(dev, "Allocated digest-buffer in context 
ctx->digest_buff=@%p\n",
state->digest_buff);
if (ctx->hw_mode != DRV_CIPHER_XCBC_MAC) {
state->digest_bytes_len = kzalloc(HASH_LEN_SIZE, GFP_KERNEL | 
GFP_DMA);
-   if (!state->digest_bytes_len) {
-   dev_err(dev, "Allocating digest-bytes-len in context 
failed\n");
+   if (!state->digest_bytes_len)
goto fail1;
-   }
+
dev_dbg(dev, "Allocated digest-bytes-len in context 
state->>digest_bytes_len=@%p\n",
state->digest_bytes_len);
} else {
@@ -192,10 +186,9 @@ static int ssi_hash_map_request(struct device *dev,
}
 
state->opad_digest_buff = kzalloc(ctx->inter_digestsize, GFP_KERNEL | 
GFP_DMA);
-   if (!state->opad_digest_buff) {
-   dev_err(dev, "Allocating opad-digest-buffer in context 
failed\n");
+   if (!state->opad_digest_buff)
goto fail2;
-   }
+
dev_dbg(dev, "Allocated opad-digest-buffer in context 
state->digest_bytes_len=@%p\n",
state->opad_digest_buff);
 
@@ -2057,10 +2050,9 @@ ssi_hash_create_alg(struct ssi_hash_template *template, 
struct device *dev,
struct ahash_alg *halg;
 
t_crypto_alg = kzalloc(sizeof(*t_crypto_alg), GFP_KERNEL);
-   if (!t_crypto_alg) {
-   dev_dbg(dev, "failed to allocate t_crypto_alg\n");
+   if (!t_crypto_alg)

Re: [PATCH 2/4] staging: ccree: simplify access to struct device

2017-10-03 Thread Gilad Ben-Yossef
On Mon, Oct 2, 2017 at 1:00 PM, Joe Perches  wrote:
> On Mon, 2017-10-02 at 10:03 +0100, Gilad Ben-Yossef wrote:
>> Introduce a DEV macro to retrieve struct device from private
>> data structure in preparation to replacing custom logging
>> macros with proper dev_dbg and friends which require struct
>> device.
> []
>> diff --git a/drivers/staging/ccree/ssi_driver.h 
>> b/drivers/staging/ccree/ssi_driver.h
> []
>> @@ -103,6 +103,8 @@
>>  #define SSI_LOG_DEBUG(format, ...) do {} while (0)
>>  #endif
>>
>> +#define DEV(drvdata) ((&(drvdata)->plat_dev->dev))
>
> The name seems not particularly descriptive.
> It seems a longer name would
> not be too bad.
>
> Perhaps
>
> static inline struct device *drvdata_to_dev(struct ssi_drvdata *drvdata)
> {
return &drvdata->plat_dev->dev;
> }
>

Good point.

Fixed in v2.
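
So a call site now reads, e.g.:

	dev_dbg(drvdata_to_dev(drvdata), "queued\n");	/* hypothetical message */

which is self-describing in a way DEV(drvdata) was not.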

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


[PATCH v2 0/4] staging: ccree: logging related coding style fixes

2017-10-03 Thread Gilad Ben-Yossef
The following patch set cleans up some code and builds upon this to replace
ccree custom logging macros with the generic device dev_* facilities,
handles the resulting fallout and further simplifies handling of
memory and allocation OOM error handling code path exposed by checkpatch
following the change.

Patch set based upon commit 1cd5929ab675 ("staging: greybus: light:
remove unnecessary error check") in the staging-next tree.

Signed-off-by: Gilad Ben-Yossef 

Changes from v1:
- Turn DEV macro into a drvdata_to_dev inline function as suggested
  by Joe Perches.
- Fix a compile warning about an unused variable seen after the
  application of the 2nd patch in the series and before the 3rd.
- Remove even more unneeded code in the memory allocation functions.

Gilad Ben-Yossef (4):
  staging: ccree: remove sysfs if of deleted code
  staging: ccree: simplify access to struct device
  staging: ccree: move to generic device log infra
  staging: ccree: simplify OOM handling

 drivers/staging/ccree/ssi_aead.c| 237 +++
 drivers/staging/ccree/ssi_buffer_mgr.c  | 408 +++-
 drivers/staging/ccree/ssi_buffer_mgr.h  |   5 +-
 drivers/staging/ccree/ssi_cipher.c  | 158 ++---
 drivers/staging/ccree/ssi_driver.c  | 163 ++---
 drivers/staging/ccree/ssi_driver.h  |  19 +-
 drivers/staging/ccree/ssi_fips.c|  12 +-
 drivers/staging/ccree/ssi_hash.c| 374 ++---
 drivers/staging/ccree/ssi_ivgen.c   |  18 +-
 drivers/staging/ccree/ssi_pm.c  |  30 +--
 drivers/staging/ccree/ssi_request_mgr.c | 107 +
 drivers/staging/ccree/ssi_sram_mgr.c|  33 +--
 drivers/staging/ccree/ssi_sysfs.c   | 269 +
 13 files changed, 773 insertions(+), 1060 deletions(-)

-- 
2.7.4



Re: [PATCH 1/3] crypto: dh_helper - return unsigned int for dh_data_size()

2017-10-03 Thread David Howells
Tudor Ambarus  wrote:

> -static inline int dh_data_size(const struct dh *p)
> +static inline unsigned int dh_data_size(const struct dh *p)
>  {
>   return p->key_size + p->p_size + p->g_size;
>  }

If this is a problem, do you need to do range checking?

David


Re: [PATCH v3 1/3] crypto: engine - permit to enqueue aead_request

2017-10-03 Thread Fabien DESSENNE
On 22/09/17 11:09, Herbert Xu wrote:
> On Fri, Aug 18, 2017 at 11:19:04AM +0200, Fabien Dessenne wrote:
>> The current crypto engine allows ablkcipher_request and ahash_request to
>> be enqueued. Extend this to aead_request.
>>
>> Signed-off-by: Fabien Dessenne 
> I'd like to see the crypto_engine interface cleaned up a little
> before we expand it further.  Please refer to
>
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1474434.html
>
> Thanks,
It looks like there is no more activity around this "crypto_engine 
interface clean up" task.
This unfortunately has been blocking the introduction of this new STM32 
crypto driver for 3 months now.
Would it make sense to have this driver reviewed first, and then
reworked (I expect only a minor update here) when the interface update
is ready?

BR

Fabien

[PATCH 1/7] crypto:chelsio: Remove unused parameter

2017-10-03 Thread Harsh Jain
Remove unused parameters sent to the latest fw.

Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chcr_algo.c | 43 +++---
 drivers/crypto/chelsio/chcr_algo.h | 12 +--
 2 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c 
b/drivers/crypto/chelsio/chcr_algo.c
index 0e81607..bdb1014 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -577,36 +577,27 @@ static int chcr_cipher_fallback(struct crypto_skcipher 
*cipher,
 static inline void create_wreq(struct chcr_context *ctx,
   struct chcr_wr *chcr_req,
   void *req, struct sk_buff *skb,
-  int kctx_len, int hash_sz,
-  int is_iv,
+  int hash_sz,
   unsigned int sc_len,
   unsigned int lcb)
 {
struct uld_ctx *u_ctx = ULD_CTX(ctx);
-   int iv_loc = IV_DSGL;
int qid = u_ctx->lldi.rxq_ids[ctx->rx_qidx];
-   unsigned int immdatalen = 0, nr_frags = 0;
+   unsigned int immdatalen = 0;
 
-   if (is_ofld_imm(skb)) {
+   if (is_ofld_imm(skb))
immdatalen = skb->data_len;
-   iv_loc = IV_IMMEDIATE;
-   } else {
-   nr_frags = skb_shinfo(skb)->nr_frags;
-   }
 
-   chcr_req->wreq.op_to_cctx_size = FILL_WR_OP_CCTX_SIZE(immdatalen,
-   ((sizeof(chcr_req->key_ctx) + kctx_len) >> 4));
+   chcr_req->wreq.op_to_cctx_size = FILL_WR_OP_CCTX_SIZE;
chcr_req->wreq.pld_size_hash_size =
-   htonl(FW_CRYPTO_LOOKASIDE_WR_PLD_SIZE_V(sgl_lengths[nr_frags]) |
- FW_CRYPTO_LOOKASIDE_WR_HASH_SIZE_V(hash_sz));
+   htonl(FW_CRYPTO_LOOKASIDE_WR_HASH_SIZE_V(hash_sz));
chcr_req->wreq.len16_pkd =
htonl(FW_CRYPTO_LOOKASIDE_WR_LEN16_V(DIV_ROUND_UP(
(calc_tx_flits_ofld(skb) * 8), 16)));
chcr_req->wreq.cookie = cpu_to_be64((uintptr_t)req);
chcr_req->wreq.rx_chid_to_rx_q_id =
FILL_WR_RX_Q_ID(ctx->dev->rx_channel_id, qid,
-   is_iv ? iv_loc : IV_NOP, !!lcb,
-   ctx->tx_qidx);
+   !!lcb, ctx->tx_qidx);
 
chcr_req->ulptx.cmd_dest = FILL_ULPTX_CMD_DEST(ctx->dev->tx_channel_id,
   qid);
@@ -616,7 +607,7 @@ static inline void create_wreq(struct chcr_context *ctx,
chcr_req->sc_imm.cmd_more = FILL_CMD_MORE(immdatalen);
chcr_req->sc_imm.len = cpu_to_be32(sizeof(struct cpl_tx_sec_pdu) +
   sizeof(chcr_req->key_ctx) +
-  kctx_len + sc_len + immdatalen);
+  sc_len + immdatalen);
 }
 
 /**
@@ -706,8 +697,8 @@ static struct sk_buff *create_cipher_wr(struct 
cipher_wr_param *wrparam)
write_buffer_to_skb(skb, &frags, reqctx->iv, ivsize);
write_sg_to_skb(skb, &frags, wrparam->srcsg, wrparam->bytes);
atomic_inc(&adap->chcr_stats.cipher_rqst);
-   create_wreq(ctx, chcr_req, &(wrparam->req->base), skb, kctx_len, 0, 1,
-   sizeof(struct cpl_rx_phys_dsgl) + phys_dsgl,
+   create_wreq(ctx, chcr_req, &(wrparam->req->base), skb, 0,
+   sizeof(struct cpl_rx_phys_dsgl) + phys_dsgl + kctx_len,
ablkctx->ciph_mode == CHCR_SCMD_CIPHER_MODE_AES_CBC);
reqctx->skb = skb;
skb_get(skb);
@@ -1417,8 +1408,8 @@ static struct sk_buff *create_hash_wr(struct 
ahash_request *req,
if (param->sg_len != 0)
write_sg_to_skb(skb, &frags, req->src, param->sg_len);
atomic_inc(&adap->chcr_stats.digest_rqst);
-   create_wreq(ctx, chcr_req, &req->base, skb, kctx_len,
-   hash_size_in_response, 0, DUMMY_BYTES, 0);
+   create_wreq(ctx, chcr_req, &req->base, skb, hash_size_in_response,
+   DUMMY_BYTES + kctx_len, 0);
req_ctx->skb = skb;
skb_get(skb);
return skb;
@@ -2080,8 +2071,8 @@ static struct sk_buff *create_authenc_wr(struct 
aead_request *req,
write_buffer_to_skb(skb, &frags, req->iv, ivsize);
write_sg_to_skb(skb, &frags, src, req->cryptlen);
atomic_inc(&adap->chcr_stats.cipher_rqst);
-   create_wreq(ctx, chcr_req, &req->base, skb, kctx_len, size, 1,
-  sizeof(struct cpl_rx_phys_dsgl) + dst_size, 0);
+   create_wreq(ctx, chcr_req, &req->base, skb, size,
+  sizeof(struct cpl_rx_phys_dsgl) + dst_size + kctx_len, 0);
reqctx->skb = skb;
skb_get(skb);
 
@@ -2396,8 +2387,8 @@ static struct sk_buff *create_aead_ccm_wr(struct 
aead_request *req,
skb_set_transport_header(skb, transhdr_len);
frags = fill_aead_req_fields(skb, req, src, ivsize, aeadctx);

[PATCH 4/7] crypto:chelsio:Use x8_ble gf multiplication to calculate IV.

2017-10-03 Thread Harsh Jain
gf128mul_x8_ble() reduces the number of GF multiplication iterations by
a factor of 8.

Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chcr_algo.c   | 11 +--
 drivers/crypto/chelsio/chcr_crypto.h |  1 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c 
b/drivers/crypto/chelsio/chcr_algo.c
index e4bf32d..e0ab34a 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -888,9 +888,11 @@ static int chcr_update_tweak(struct ablkcipher_request 
*req, u8 *iv)
int ret, i;
u8 *key;
unsigned int keylen;
+   int round = reqctx->last_req_len / AES_BLOCK_SIZE;
+   int round8 = round / 8;
 
cipher = ablkctx->aes_generic;
-   memcpy(iv, req->info, AES_BLOCK_SIZE);
+   memcpy(iv, reqctx->iv, AES_BLOCK_SIZE);
 
keylen = ablkctx->enckey_len / 2;
key = ablkctx->key + keylen;
@@ -899,7 +901,10 @@ static int chcr_update_tweak(struct ablkcipher_request 
*req, u8 *iv)
goto out;
 
crypto_cipher_encrypt_one(cipher, iv, iv);
-   for (i = 0; i < (reqctx->processed / AES_BLOCK_SIZE); i++)
+   for (i = 0; i < round8; i++)
+   gf128mul_x8_ble((le128 *)iv, (le128 *)iv);
+
+   for (i = 0; i < (round % 8); i++)
gf128mul_x_ble((le128 *)iv, (le128 *)iv);
 
crypto_cipher_decrypt_one(cipher, iv, iv);
@@ -1040,6 +1045,7 @@ static int chcr_handle_cipher_resp(struct 
ablkcipher_request *req,
CRYPTO_ALG_SUB_TYPE_CTR)
bytes = adjust_ctr_overflow(reqctx->iv, bytes);
reqctx->processed += bytes;
+   reqctx->last_req_len = bytes;
wrparam.qid = u_ctx->lldi.rxq_ids[ctx->rx_qidx];
wrparam.req = req;
wrparam.bytes = bytes;
@@ -1132,6 +1138,7 @@ static int process_cipher(struct ablkcipher_request *req,
goto error;
}
reqctx->processed = bytes;
+   reqctx->last_req_len = bytes;
reqctx->dst = reqctx->dstsg;
reqctx->op = op_type;
wrparam.qid = qid;
diff --git a/drivers/crypto/chelsio/chcr_crypto.h 
b/drivers/crypto/chelsio/chcr_crypto.h
index 30af1ee..b3722b3 100644
--- a/drivers/crypto/chelsio/chcr_crypto.h
+++ b/drivers/crypto/chelsio/chcr_crypto.h
@@ -247,6 +247,7 @@ struct chcr_blkcipher_req_ctx {
struct scatterlist *dst;
struct scatterlist *newdstsg;
unsigned int processed;
+   unsigned int last_req_len;
unsigned int op;
short int dst_nents;
u8 iv[CHCR_MAX_CRYPTO_IV_LEN];
-- 
2.1.4



[PATCH 3/7] crypto:gf128mul: The x8_ble multiplication functions

2017-10-03 Thread Harsh Jain
It multiplies GF(2^128) elements in the ble format. A multiplication by
x^8 becomes a one-byte shift plus a single lookup in the precomputed
reduction table, instead of eight single-bit shift-and-reduce steps.
It will be used by the chelsio driver to speed up GF multiplication.

Signed-off-by: Harsh Jain 
---
 crypto/gf128mul.c | 13 +
 include/crypto/gf128mul.h |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index dc01212..24e6019 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -156,6 +156,19 @@ static void gf128mul_x8_bbe(be128 *x)
x->b = cpu_to_be64((b << 8) ^ _tt);
 }
 
+void gf128mul_x8_ble(le128 *r, const le128 *x)
+{
+   u64 a = le64_to_cpu(x->a);
+   u64 b = le64_to_cpu(x->b);
+
+   /* equivalent to gf128mul_table_be[b >> 63] (see crypto/gf128mul.c): */
+   u64 _tt = gf128mul_table_be[a >> 56];
+
+   r->a = cpu_to_le64((a << 8) | (b >> 56));
+   r->b = cpu_to_le64((b << 8) ^ _tt);
+}
+EXPORT_SYMBOL(gf128mul_x8_ble);
+
 void gf128mul_lle(be128 *r, const be128 *b)
 {
be128 p[8];
diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
index 0977fb1..fa0a63d 100644
--- a/include/crypto/gf128mul.h
+++ b/include/crypto/gf128mul.h
@@ -227,7 +227,7 @@ struct gf128mul_4k *gf128mul_init_4k_lle(const be128 *g);
 struct gf128mul_4k *gf128mul_init_4k_bbe(const be128 *g);
 void gf128mul_4k_lle(be128 *a, const struct gf128mul_4k *t);
 void gf128mul_4k_bbe(be128 *a, const struct gf128mul_4k *t);
-
+void gf128mul_x8_ble(le128 *r, const le128 *x);
 static inline void gf128mul_free_4k(struct gf128mul_4k *t)
 {
kzfree(t);
-- 
2.1.4



[PATCH 0/7]crypto:chelsio: Bug fixes

2017-10-03 Thread Harsh Jain
It includes bug fixes and performance improvements.

Harsh Jain (7):
  crypto:gf128mul: The x8_ble multiplication functions
  crypto:chelsio:Use x8_ble gf multiplication to calculate IV.
  crypto:chelsio:Remove allocation of sg list to implement 2K limit of
dsgl header
  crypto:chelsio:Move DMA un/mapping to chcr from lld  cxgb4 driver
  crypto:chelsio: Fix memory leak
  crypto:chelsio: Remove unused parameter
  crypto:chelsio: Check error code with IS_ERR macro

 crypto/gf128mul.c|   13 +
 drivers/crypto/chelsio/chcr_algo.c   | 1784 +-
 drivers/crypto/chelsio/chcr_algo.h   |   57 +-
 drivers/crypto/chelsio/chcr_core.c   |8 +-
 drivers/crypto/chelsio/chcr_core.h   |2 +-
 drivers/crypto/chelsio/chcr_crypto.h |  121 +-
 drivers/net/ethernet/chelsio/cxgb4/sge.c |8 +-
 include/crypto/gf128mul.h|2 +-
 8 files changed, 1166 insertions(+), 829 deletions(-)

-- 
2.1.4



[PATCH 6/7] crypto:chelsio:Move DMA un/mapping to chcr from lld cxgb4 driver

2017-10-03 Thread Harsh Jain
Allow chcr to do the DMA mapping/unmapping itself instead of the lld
cxgb4 driver. It moves the "Copy AAD to dst buffer" requirement from
the driver to the firmware.

Signed-off-by: Ganesh Goudar 
Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chcr_algo.c   | 1645 ++
 drivers/crypto/chelsio/chcr_algo.h   |   44 +-
 drivers/crypto/chelsio/chcr_crypto.h |  114 ++-
 drivers/net/ethernet/chelsio/cxgb4/sge.c |8 +-
 4 files changed, 1116 insertions(+), 695 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c 
b/drivers/crypto/chelsio/chcr_algo.c
index b13991d..646dfff 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -70,6 +70,8 @@
 #include "chcr_algo.h"
 #include "chcr_crypto.h"
 
+#define IV AES_BLOCK_SIZE
+
 static inline  struct chcr_aead_ctx *AEAD_CTX(struct chcr_context *ctx)
 {
return ctx->crypto_ctx->aeadctx;
@@ -102,7 +104,7 @@ static inline struct uld_ctx *ULD_CTX(struct chcr_context 
*ctx)
 
 static inline int is_ofld_imm(const struct sk_buff *skb)
 {
-   return (skb->len <= CRYPTO_MAX_IMM_TX_PKT_LEN);
+   return (skb->len <= SGE_MAX_WR_LEN);
 }
 
 /*
@@ -117,21 +119,92 @@ static inline unsigned int sgl_len(unsigned int n)
return (3 * n) / 2 + (n & 1) + 2;
 }
 
-static int dstsg_2k(struct scatterlist *sgl, unsigned int reqlen)
+static int sg_nents_xlen(struct scatterlist *sg, unsigned int reqlen,
+unsigned int entlen,
+unsigned int skip)
 {
int nents = 0;
unsigned int less;
+   unsigned int skip_len = 0;
 
-   while (sgl && reqlen) {
-   less = min(reqlen, sgl->length);
-   nents += DIV_ROUND_UP(less, CHCR_SG_SIZE);
-   reqlen -= less;
-   sgl = sg_next(sgl);
+   while (sg && skip) {
+   if (sg_dma_len(sg) <= skip) {
+   skip -= sg_dma_len(sg);
+   skip_len = 0;
+   sg = sg_next(sg);
+   } else {
+   skip_len = skip;
+   skip = 0;
+   }
}
 
+   while (sg && reqlen) {
+   less = min(reqlen, sg_dma_len(sg) - skip_len);
+   nents += DIV_ROUND_UP(less, entlen);
+   reqlen -= less;
+   skip_len = 0;
+   sg = sg_next(sg);
+   }
return nents;
 }
 
+static inline void chcr_handle_ahash_resp(struct ahash_request *req,
+ unsigned char *input,
+ int err)
+{
+   struct chcr_ahash_req_ctx *reqctx = ahash_request_ctx(req);
+   int digestsize, updated_digestsize;
+   struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+   struct uld_ctx *u_ctx = ULD_CTX(h_ctx(tfm));
+
+   if (input == NULL)
+   goto out;
+   reqctx = ahash_request_ctx(req);
+   digestsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(req));
+   if (reqctx->is_sg_map)
+   chcr_hash_dma_unmap(&u_ctx->lldi.pdev->dev, req);
+   if (reqctx->dma_addr)
+   dma_unmap_single(&u_ctx->lldi.pdev->dev, reqctx->dma_addr,
+reqctx->dma_len, DMA_TO_DEVICE);
+   reqctx->dma_addr = 0;
+   updated_digestsize = digestsize;
+   if (digestsize == SHA224_DIGEST_SIZE)
+   updated_digestsize = SHA256_DIGEST_SIZE;
+   else if (digestsize == SHA384_DIGEST_SIZE)
+   updated_digestsize = SHA512_DIGEST_SIZE;
+   if (reqctx->result == 1) {
+   reqctx->result = 0;
+   memcpy(req->result, input + sizeof(struct cpl_fw6_pld),
+  digestsize);
+   } else {
+   memcpy(reqctx->partial_hash, input + sizeof(struct cpl_fw6_pld),
+  updated_digestsize);
+   }
+out:
+   req->base.complete(&req->base, err);
+
+   }
+
+static inline void chcr_handle_aead_resp(struct aead_request *req,
+unsigned char *input,
+int err)
+{
+   struct chcr_aead_reqctx *reqctx = aead_request_ctx(req);
+   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+   struct uld_ctx *u_ctx = ULD_CTX(a_ctx(tfm));
+
+
+   chcr_aead_dma_unmap(&u_ctx->lldi.pdev->dev, req, reqctx->op);
+   if (reqctx->b0_dma)
+   dma_unmap_single(&u_ctx->lldi.pdev->dev, reqctx->b0_dma,
+reqctx->b0_len, DMA_BIDIRECTIONAL);
+   if (reqctx->verify == VERIFY_SW) {
+   chcr_verify_tag(req, input, &err);
+   reqctx->verify = VERIFY_HW;
+}
+   req->base.complete(&req->base, err);
+
+}
 static void chcr_verify_tag(struct aead_request *req, u8 *input, int *err)
 {
u8 temp[SHA512_DIGEST_SIZE];
@@ -166,27 +239,11 @@ int chcr_handle_resp(struct crypto_async_request *req, 
unsigned char *input,
 {
struct 

[PATCH 7/7] crypto:chelsio: Fix memory leak

2017-10-03 Thread Harsh Jain
Fix a memory leak when the device does not support crypto.

Reported-by: Dan Carpenter 
Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chcr_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_core.c 
b/drivers/crypto/chelsio/chcr_core.c
index b6dd9cb..4f677b3 100644
--- a/drivers/crypto/chelsio/chcr_core.c
+++ b/drivers/crypto/chelsio/chcr_core.c
@@ -154,15 +154,15 @@ static void *chcr_uld_add(const struct cxgb4_lld_info 
*lld)
struct uld_ctx *u_ctx;
 
/* Create the device and add it in the device list */
+   if (!(lld->ulp_crypto & ULP_CRYPTO_LOOKASIDE))
+   return ERR_PTR(-EOPNOTSUPP);
+
+   /* Create the device and add it in the device list */
u_ctx = kzalloc(sizeof(*u_ctx), GFP_KERNEL);
if (!u_ctx) {
u_ctx = ERR_PTR(-ENOMEM);
goto out;
}
-   if (!(lld->ulp_crypto & ULP_CRYPTO_LOOKASIDE)) {
-   u_ctx = ERR_PTR(-ENOMEM);
-   goto out;
-   }
u_ctx->lldi = *lld;
 out:
return u_ctx;
-- 
2.1.4



[PATCH 2/7] crypto:chelsio: Check error code with IS_ERR macro

2017-10-03 Thread Harsh Jain
Check for errors with the IS_ERR macro and return the proper error code.

Signed-off-by: Harsh Jain 
---
 drivers/crypto/chelsio/chcr_algo.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c 
b/drivers/crypto/chelsio/chcr_algo.c
index bdb1014..e4bf32d 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -1455,8 +1455,8 @@ static int chcr_ahash_update(struct ahash_request *req)
req_ctx->result = 0;
req_ctx->data_len += params.sg_len + params.bfr_len;
skb = create_hash_wr(req, &params);
-   if (!skb)
-   return -ENOMEM;
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
 
if (remainder) {
u8 *temp;
@@ -1519,8 +1519,8 @@ static int chcr_ahash_final(struct ahash_request *req)
params.more = 0;
}
skb = create_hash_wr(req, &params);
-   if (!skb)
-   return -ENOMEM;
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
 
skb->dev = u_ctx->lldi.ports[0];
set_wr_txq(skb, CPL_PRIORITY_DATA, ctx->tx_qidx);
@@ -1570,8 +1570,8 @@ static int chcr_ahash_finup(struct ahash_request *req)
}
 
skb = create_hash_wr(req, &params);
-   if (!skb)
-   return -ENOMEM;
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
 
skb->dev = u_ctx->lldi.ports[0];
set_wr_txq(skb, CPL_PRIORITY_DATA, ctx->tx_qidx);
@@ -1621,8 +1621,8 @@ static int chcr_ahash_digest(struct ahash_request *req)
}
 
skb = create_hash_wr(req, &params);
-   if (!skb)
-   return -ENOMEM;
+   if (IS_ERR(skb))
+   return PTR_ERR(skb);
 
skb->dev = u_ctx->lldi.ports[0];
set_wr_txq(skb, CPL_PRIORITY_DATA, ctx->tx_qidx);
-- 
2.1.4



Re: [PATCH 2/2] MIPS: crypto: Add crc32 and crc32c hw accelerated module

2017-10-03 Thread Marcin Nowakowski

Hi Jonas, James,

On 02.10.2017 16:20, Jonas Gorski wrote:

On 29 September 2017 at 23:34, James Hogan  wrote:

Hi Marcin,

On Wed, Sep 27, 2017 at 02:18:36PM +0200, Marcin Nowakowski wrote:

This module registers crc32 and crc32c algorithms that use the
optional CRC32[bhwd] and CRC32C[bhwd] instructions in MIPSr6 cores.

Signed-off-by: Marcin Nowakowski 
Cc: linux-crypto@vger.kernel.org
Cc: Herbert Xu 
Cc: "David S. Miller" 

---
  arch/mips/Kconfig |   4 +
  arch/mips/Makefile|   3 +
  arch/mips/crypto/Makefile |   5 +
  arch/mips/crypto/crc32-mips.c | 361 ++
  crypto/Kconfig|   9 ++
  5 files changed, 382 insertions(+)
  create mode 100644 arch/mips/crypto/Makefile
  create mode 100644 arch/mips/crypto/crc32-mips.c

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index cb7fcc4..0f96812 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2036,6 +2036,7 @@ config CPU_MIPSR6
   select CPU_HAS_RIXI
   select HAVE_ARCH_BITREVERSE
   select MIPS_ASID_BITS_VARIABLE
+ select MIPS_CRC_SUPPORT
   select MIPS_SPRAM

  config EVA
@@ -2503,6 +2504,9 @@ config MIPS_ASID_BITS
  config MIPS_ASID_BITS_VARIABLE
   bool

+config MIPS_CRC_SUPPORT
+ bool
+
  #
  # - Highmem only makes sense for the 32-bit kernel.
  # - The current highmem code will only work properly on physically indexed
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index a96d97a..aa77536 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -216,6 +216,8 @@ cflags-$(toolchain-msa)   += 
-DTOOLCHAIN_SUPPORTS_MSA
  endif
  toolchain-virt   := $(call 
cc-option-yn,$(mips-cflags) -mvirt)
  cflags-$(toolchain-virt) += -DTOOLCHAIN_SUPPORTS_VIRT
+toolchain-crc:= $(call 
cc-option-yn,$(mips-cflags) -Wa$(comma)-mcrc)
+cflags-$(toolchain-crc)  += -DTOOLCHAIN_SUPPORTS_CRC

  #
  # Firmware support
@@ -324,6 +326,7 @@ libs-y+= arch/mips/math-emu/
  # See arch/mips/Kbuild for content of core part of the kernel
  core-y += arch/mips/

+drivers-$(CONFIG_MIPS_CRC_SUPPORT) += arch/mips/crypto/
  drivers-$(CONFIG_OPROFILE)   += arch/mips/oprofile/

  # suspend and hibernation support
diff --git a/arch/mips/crypto/Makefile b/arch/mips/crypto/Makefile
new file mode 100644
index 000..665c725
--- /dev/null
+++ b/arch/mips/crypto/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for MIPS crypto files..
+#
+
+obj-$(CONFIG_CRYPTO_CRC32_MIPS) += crc32-mips.o
diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c
new file mode 100644
index 000..dfa8bb1
--- /dev/null
+++ b/arch/mips/crypto/crc32-mips.c
@@ -0,0 +1,361 @@
+/*
+ * crc32-mips.c - CRC32 and CRC32C using optional MIPSr6 instructions
+ *
+ * Module based on arm64/crypto/crc32-arm.c
+ *
+ * Copyright (C) 2014 Linaro Ltd 
+ * Copyright (C) 2017 Imagination Technologies, Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+enum crc_op_size {
+ b, h, w, d,
+};
+
+enum crc_type {
+ crc32,
+ crc32c,
+};
+
+#ifdef TOOLCHAIN_SUPPORTS_CRC
+
+#define _CRC32(crc, value, size, type)   \
+do { \
+ __asm__ __volatile__(   \
+ ".set   push\n\t"   \
+ ".set   crc\n\t"\
+ #type #size "   %0, %1, %0\n\t" \
+ ".set   pop\n\t"\


Technically the \n\t on the last line is redundant.


+ : "+r" (crc)\
+ : "r" (value)   \
+);   \
+} while(0)
+
+#define CRC_REGISTER
+
+#else /* TOOLCHAIN_SUPPORTS_CRC */
+/*
+ * Crc argument is currently ignored and the assembly below assumes
+ * the crc is stored in $2. As the register number is encoded in the
+ * instruction we can't let the compiler choose the register it wants.
+ * An alternative is to change the code to do
+ * move $2, %0
+ * crc32
+ * move %0, $2
+ * but that adds unnecessary operations that the crc32 operation is
+ * designed to avoid. This issue can go away once the assembler
+ * is extended to support this operation and the compiler can make
+ * the right register choice automatically
+ */
+
+#define _CRC32(crc, value, size, type) 
  \
+do { \
+ __asm__ __volatile__(   \
+ ".set   push\n\t"