[cryptodev:master 81/86] htmldocs: include/linux/crypto.h:614: warning: Function parameter or member 'stats.aead' not described in 'crypto_alg'

2018-12-07 Thread kbuild test robot
tree:   
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
head:   88d905e20b11f7ad841e3afddaf1d59b6693c4a1
commit: 17c18f9e33282a170458cb5ea20759bfcb0da7d8 [81/86] crypto: user - Split 
stats in multiple structures
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick 
(https://www.imagemagick.org)
   kernel/resource.c:337: warning: Function parameter or member 'start' not 
described in 'find_next_iomem_res'
   kernel/resource.c:337: warning: Function parameter or member 'end' not 
described in 'find_next_iomem_res'
   kernel/resource.c:337: warning: Function parameter or member 'flags' not 
described in 'find_next_iomem_res'
   kernel/resource.c:337: warning: Function parameter or member 'desc' not 
described in 'find_next_iomem_res'
   kernel/resource.c:337: warning: Function parameter or member 'first_lvl' not 
described in 'find_next_iomem_res'
   kernel/resource.c:337: warning: Function parameter or member 'res' not 
described in 'find_next_iomem_res'
   kernel/resource.c:409: warning: Function parameter or member 'arg' not 
described in 'walk_iomem_res_desc'
   kernel/resource.c:409: warning: Function parameter or member 'func' not 
described in 'walk_iomem_res_desc'
   kernel/resource.c:409: warning: Function parameter or member 'arg' not 
described in 'walk_iomem_res_desc'
   kernel/resource.c:409: warning: Function parameter or member 'func' not 
described in 'walk_iomem_res_desc'
   include/linux/rcutree.h:1: warning: no structured comments found
   kernel/rcu/tree.c:684: warning: Excess function parameter 'irq' description 
in 'rcu_nmi_exit'
   include/linux/srcu.h:175: warning: Function parameter or member 'p' not 
described in 'srcu_dereference_notrace'
   include/linux/srcu.h:175: warning: Function parameter or member 'sp' not 
described in 'srcu_dereference_notrace'
   include/linux/gfp.h:1: warning: no structured comments found
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.aead' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.akcipher' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.cipher' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.compress' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.hash' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.rng' not described in 'crypto_alg'
>> include/linux/crypto.h:614: warning: Function parameter or member 
>> 'stats.kpp' not described in 'crypto_alg'
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:2838: warning: cannot understand function prototype: 
'struct cfg80211_ftm_responder_stats '
   include/net/cfg80211.h:4439: warning: Function parameter or member 
'wext.ibss' not described in 'wireless_dev'
   include/net/cfg80211.h:4439: warning: Function parameter or member 
'wext.connect' not described in 'wireless_dev'
   include/net/cfg80211.h:4439: warning: Function parameter or member 
'wext.keys' not described in 'wireless_dev'
   include/net/cfg80211.h:4439: warning: Function parameter or member 'wext.ie' 
not described in 'wireless_dev'
   include/net/cfg80211.h:4439: warning: Function parameter or member 
'wext.ie_len' not described in 'wireless_dev'
   include/net/cfg80211.h:4439: warning: Function 
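
   For context, the new include/linux/crypto.h warnings above come from the
   freshly split 'stats' members of struct crypto_alg not being documented.
   kernel-doc expects nested members to be described with dotted names, so a
   fix would look roughly like the following sketch (hypothetical wording,
   not the actual follow-up patch):

       /**
        * struct crypto_alg - definition of a cryptographic cipher algorithm
        * ...
        * @stats.aead:     statistics for the AEAD algorithm type
        * @stats.akcipher: statistics for the akcipher algorithm type
        * @stats.cipher:   statistics for the cipher algorithm type
        * ...
        */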

[PATCH] crypto: caam - fix setting IV after decrypt

2018-12-07 Thread Sascha Hauer
The crypto API wants the updated IV in req->info after decryption. The
updated IV used to be copied correctly to req->info after running the
decryption job. Since 115957bb3e59 this copy is done before running the
job, so only the unmodified input IV, rather than the updated IV, is given
back to the crypto API.

This was observed running the gcm(aes) selftest which internally uses
ctr(aes) implemented by the CAAM engine.
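
To illustrate why the output IV matters, here is a rough sketch of a caller
that chains requests and relies on req->iv being updated (hypothetical code,
not part of this patch; key, first_iv, buf, nchunks and CHUNK_SIZE are
placeholders and error handling is omitted):

	struct crypto_skcipher *tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
	struct skcipher_request *req = skcipher_request_alloc(tfm, GFP_KERNEL);
	DECLARE_CRYPTO_WAIT(wait);
	struct scatterlist sg;
	u8 iv[16];
	int i;

	crypto_skcipher_setkey(tfm, key, 32);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
				      CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	memcpy(iv, first_iv, sizeof(iv));

	for (i = 0; i < nchunks; i++) {
		/* decrypt one chunk in place */
		sg_init_one(&sg, buf + i * CHUNK_SIZE, CHUNK_SIZE);
		skcipher_request_set_crypt(req, &sg, &sg, CHUNK_SIZE, iv);
		crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
		/*
		 * The API guarantees that iv now holds the last ciphertext
		 * block of this chunk, so the next iteration chains
		 * correctly -- exactly the property that broke here.
		 */
	}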

Fixes: 115957bb3e59 ("crypto: caam - fix IV DMA mapping and updating")

Signed-off-by: Sascha Hauer 
Cc: sta...@vger.kernel.org
---
 drivers/crypto/caam/caamalg.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 869f092432de..c05c7938439c 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -917,10 +917,10 @@ static void skcipher_decrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 {
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
-#ifdef DEBUG
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
int ivsize = crypto_skcipher_ivsize(skcipher);
 
+#ifdef DEBUG
dev_err(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
 #endif
 
@@ -937,6 +937,14 @@ static void skcipher_decrypt_done(struct device *jrdev, 
u32 *desc, u32 err,
 edesc->dst_nents > 1 ? 100 : req->cryptlen, 1);
 
skcipher_unmap(jrdev, edesc, req);
+
+   /*
+* The crypto API expects us to set the IV (req->iv) to the last
+* ciphertext block.
+*/
+   scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen - ivsize,
+ivsize, 0);
+
kfree(edesc);
 
skcipher_request_complete(req, err);
@@ -1588,13 +1596,6 @@ static int skcipher_decrypt(struct skcipher_request *req)
if (IS_ERR(edesc))
return PTR_ERR(edesc);
 
-   /*
-* The crypto API expects us to set the IV (req->iv) to the last
-* ciphertext block.
-*/
-   scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen - ivsize,
-ivsize, 0);
-
/* Create and submit job descriptor*/
init_skcipher_job(req, edesc, false);
desc = edesc->hw_desc;
-- 
2.19.1



Re: [PATCH v5 00/11] crypto: crypto_user_stat: misc enhancement

2018-12-06 Thread Herbert Xu
On Thu, Nov 29, 2018 at 02:42:15PM +, Corentin Labbe wrote:
> Hello
> 
> This patchset fixes all the problems reported by Eric Biggers.
> 
> Regards
> 
> Changes since v4:
> - Inlined functions when !CRYPTO_STATS
> 
> Changes since v3:
> - Added a crypto_stats_init as asked by Neil Horman
> - Fixed some checkpatch complaints
> 
> Changes since v2:
> - moved all crypto_stats functions from header to algapi.c for using
>   crypto_alg_get/put
> 
> Changes since v1:
> - Better locking of crypto_alg via crypto_alg_get/crypto_alg_put
> - remove all intermediate variables in crypto/crypto_user_stat.c
> - split all internal stats variables into different structures
> 
> Corentin Labbe (11):
>   crypto: crypto_user_stat: made crypto_user_stat optional
>   crypto: CRYPTO_STATS should depend on CRYPTO_USER
>   crypto: crypto_user_stat: convert all stats from u32 to u64
>   crypto: crypto_user_stat: split user space crypto stat structures
>   crypto: tool: getstat: convert user space example to the new
> crypto_user_stat uapi
>   crypto: crypto_user_stat: fix use_after_free of struct xxx_request
>   crypto: crypto_user_stat: Fix invalid stat reporting
>   crypto: crypto_user_stat: remove intermediate variable
>   crypto: crypto_user_stat: Split stats in multiple structures
>   crypto: crypto_user_stat: rename err_cnt parameter
>   crypto: crypto_user_stat: Add crypto_stats_init
> 
>  crypto/Kconfig   |   1 +
>  crypto/Makefile  |   3 +-
>  crypto/ahash.c   |  17 +-
>  crypto/algapi.c  | 247 ++-
>  crypto/crypto_user_stat.c| 160 +--
>  crypto/rng.c |   4 +-
>  include/crypto/acompress.h   |  38 +---
>  include/crypto/aead.h|  38 +---
>  include/crypto/akcipher.h|  74 ++-
>  include/crypto/hash.h|  32 +--
>  include/crypto/internal/cryptouser.h |  17 ++
>  include/crypto/kpp.h |  48 +
>  include/crypto/rng.h |  27 +--
>  include/crypto/skcipher.h|  36 +---
>  include/linux/crypto.h   | 290 ++-
>  include/uapi/linux/cryptouser.h  | 102 ++
>  tools/crypto/getstat.c   |  72 +++
>  17 files changed, 676 insertions(+), 530 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [crypto chcr 1/2] small packet Tx stalls the queue

2018-12-06 Thread Herbert Xu
On Fri, Nov 30, 2018 at 02:31:48PM +0530, Atul Gupta wrote:
> Immediate packets sent to hardware should include the work
> request length when calculating the flits. The WR occupies one flit and,
> if not accounted for, results in an invalid request which stalls the HW
> queue.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Atul Gupta 
> ---
>  drivers/crypto/chelsio/chcr_ipsec.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] crypto: adiantum - adjust some comments to match latest paper

2018-12-06 Thread Eric Biggers
From: Eric Biggers 

The 2018-11-28 revision of the Adiantum paper has revised some notation:

- 'M' was replaced with 'L' (meaning "Left", for the left-hand part of
  the message) in the definition of Adiantum hashing, to avoid confusion
  with the full message
- ε-almost-∆-universal is now abbreviated as ε-∆U instead of εA∆U
- "block" is now used only to mean block cipher and Poly1305 blocks

Also, Adiantum hashing was moved from the appendix to the main paper.

To avoid confusion, update relevant comments in the code to match.
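
Schematically, the hash these comments describe is (paraphrasing the paper's
definitions; the NH term is really computed per 1024-byte chunk and the
addition is mod 2^128):

    Hash(T, L) = Poly1305_{K_T}(bin_128(|L|) || T)
               + Poly1305_{K_L}(NH_{K_N}(pad(L)))

where T is the tweak and L is the left-hand part (the "bulk") of the message;
the first term is the header_hash computed below and the second is the
NHPoly1305 pass over the bulk.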

Signed-off-by: Eric Biggers 
---
 crypto/adiantum.c   | 35 +++
 crypto/nhpoly1305.c |  8 
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/crypto/adiantum.c b/crypto/adiantum.c
index ca27e0dc2958c..e62e34f5e389b 100644
--- a/crypto/adiantum.c
+++ b/crypto/adiantum.c
@@ -9,7 +9,7 @@
  * Adiantum is a tweakable, length-preserving encryption mode designed for fast
  * and secure disk encryption, especially on CPUs without dedicated crypto
  * instructions.  Adiantum encrypts each sector using the XChaCha12 stream
- * cipher, two passes of an ε-almost-∆-universal (εA∆U) hash function based on
+ * cipher, two passes of an ε-almost-∆-universal (ε-∆U) hash function based on
  * NH and Poly1305, and an invocation of the AES-256 block cipher on a single
  * 16-byte block.  See the paper for details:
  *
@@ -21,12 +21,12 @@
  * - Stream cipher: XChaCha12 or XChaCha20
  * - Block cipher: any with a 128-bit block size and 256-bit key
  *
- * This implementation doesn't currently allow other εA∆U hash functions, i.e.
+ * This implementation doesn't currently allow other ε-∆U hash functions, i.e.
  * HPolyC is not supported.  This is because Adiantum is ~20% faster than 
HPolyC
- * but still provably as secure, and also the εA∆U hash function of HBSH is
+ * but still provably as secure, and also the ε-∆U hash function of HBSH is
  * formally defined to take two inputs (tweak, message) which makes it 
difficult
  * to wrap with the crypto_shash API.  Rather, some details need to be handled
- * here.  Nevertheless, if needed in the future, support for other εA∆U hash
+ * here.  Nevertheless, if needed in the future, support for other ε-∆U hash
  * functions could be added here.
  */
 
@@ -41,7 +41,7 @@
 #include "internal.h"
 
 /*
- * Size of right-hand block of input data, in bytes; also the size of the block
+ * Size of right-hand part of input data, in bytes; also the size of the block
  * cipher's block size and the hash function's output.
  */
 #define BLOCKCIPHER_BLOCK_SIZE 16
@@ -77,7 +77,7 @@ struct adiantum_tfm_ctx {
 struct adiantum_request_ctx {
 
/*
-* Buffer for right-hand block of data, i.e.
+* Buffer for right-hand part of data, i.e.
 *
 *P_L => P_M => C_M => C_R when encrypting, or
 *C_R => C_M => P_M => P_L when decrypting.
@@ -93,8 +93,8 @@ struct adiantum_request_ctx {
bool enc; /* true if encrypting, false if decrypting */
 
/*
-* The result of the Poly1305 εA∆U hash function applied to
-* (message length, tweak).
+* The result of the Poly1305 ε-∆U hash function applied to
+* (bulk length, tweak)
 */
le128 header_hash;
 
@@ -213,13 +213,16 @@ static inline void le128_sub(le128 *r, const le128 *v1, 
const le128 *v2)
 }
 
 /*
- * Apply the Poly1305 εA∆U hash function to (message length, tweak) and save 
the
- * result to rctx->header_hash.
+ * Apply the Poly1305 ε-∆U hash function to (bulk length, tweak) and save the
+ * result to rctx->header_hash.  This is the calculation
  *
- * This value is reused in both the first and second hash steps.  Specifically,
- * it's added to the result of an independently keyed εA∆U hash function (for
- * equal length inputs only) taken over the message.  This gives the overall
- * Adiantum hash of the (tweak, message) pair.
+ * H_T ← Poly1305_{K_T}(bin_{128}(|L|) || T)
+ *
+ * from the procedure in section 6.4 of the Adiantum paper.  The resulting 
value
+ * is reused in both the first and second hash steps.  Specifically, it's added
+ * to the result of an independently keyed ε-∆U hash function (for equal length
+ * inputs only) taken over the left-hand part (the "bulk") of the message, to
+ * give the overall Adiantum hash of the (tweak, left-hand part) pair.
  */
 static void adiantum_hash_header(struct skcipher_request *req)
 {
@@ -248,7 +251,7 @@ static void adiantum_hash_header(struct skcipher_request 
*req)
	poly1305_core_emit(&state, &rctx->header_hash);
 }
 
-/* Hash the left-hand block (the "bulk") of the message using NHPoly1305 */
+/* Hash the left-hand part (the "bulk") of the message using NHPoly1305 */
 static int adiantum_hash_message(struct skcipher_request *req,
 struct scatterlist *sgl, le128 *digest)
 {
@@ -550,7 +553,7 @@ static int adiantum_create(struct crypto_template *tmpl, 
struct rtattr **tb)

[PATCH] crypto: xchacha20 - fix comments for test vectors

2018-12-06 Thread Eric Biggers
From: Eric Biggers 

The kernel's ChaCha20 uses the RFC7539 convention of the nonce being 12
bytes rather than 8, so actually I only appended 12 random bytes (not
16) to its test vectors to form 24-byte nonces for the XChaCha20 test
vectors.  The other 4 bytes were just from zero-padding the stream
position to 8 bytes.  Fix the comments above the test vectors.
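
In terms of the testmgr .iv layouts (ChaCha20: 4-byte stream position followed
by the 12-byte RFC7539 nonce; XChaCha20: 24-byte nonce followed by the 8-byte
stream position), the conversion amounts to something like this hypothetical
helper (names and comments are illustrative, not part of the patch):

	static void chacha20_iv_to_xchacha20_iv(const u8 chacha_iv[16],
						const u8 extra_nonce[12],
						u8 xchacha_iv[32])
	{
		memcpy(xchacha_iv, chacha_iv + 4, 12);    /* original 96-bit nonce */
		memcpy(xchacha_iv + 12, extra_nonce, 12); /* 12 appended random bytes */
		memcpy(xchacha_iv + 24, chacha_iv, 4);    /* 32-bit stream position */
		memset(xchacha_iv + 28, 0, 4);            /* zero-padded to 64 bits */
	}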

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.h | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 357cf4cbcbb1c..e8f47d7b92cdd 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -32281,8 +32281,9 @@ static const struct cipher_testvec 
xchacha20_tv_template[] = {
  "\x57\x78\x8e\x6f\xae\x90\xfc\x31"
  "\x09\x7c\xfc",
.len= 91,
-   }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes
-   to nonce, and recomputed the ciphertext with libsodium */
+   }, { /* Taken from the ChaCha20 test vectors, appended 12 random bytes
+   to the nonce, zero-padded the stream position from 4 to 8 bytes,
+   and recomputed the ciphertext using libsodium's XChaCha20 */
.key= "\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00"
@@ -32309,8 +32310,7 @@ static const struct cipher_testvec 
xchacha20_tv_template[] = {
  "\x03\xdc\xf8\x2b\xc1\xe1\x75\x67"
  "\x23\x7b\xe6\xfc\xd4\x03\x86\x54",
.len= 64,
-   }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes
-   to nonce, and recomputed the ciphertext with libsodium */
+   }, { /* Derived from a ChaCha20 test vector, via the process above */
.key= "\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00"
@@ -32419,8 +32419,7 @@ static const struct cipher_testvec 
xchacha20_tv_template[] = {
.np = 3,
.tap= { 375 - 20, 4, 16 },
 
-   }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes
-   to nonce, and recomputed the ciphertext with libsodium */
+   }, { /* Derived from a ChaCha20 test vector, via the process above */
.key= "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a"
  "\xf3\x33\x88\x86\x04\xf6\xb5\xf0"
  "\x47\x39\x17\xc1\x40\x2b\x80\x09"
@@ -32463,8 +32462,7 @@ static const struct cipher_testvec 
xchacha20_tv_template[] = {
  "\x65\x03\xfa\x45\xf7\x9e\x53\x7a"
  "\x99\xf1\x82\x25\x4f\x8d\x07",
.len= 127,
-   }, { /* Taken from the ChaCha20 test vectors, appended 16 random bytes
-   to nonce, and recomputed the ciphertext with libsodium */
+   }, { /* Derived from a ChaCha20 test vector, via the process above */
.key= "\x1c\x92\x40\xa5\xeb\x55\xd3\x8a"
  "\xf3\x33\x88\x86\x04\xf6\xb5\xf0"
  "\x47\x39\x17\xc1\x40\x2b\x80\x09"
-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH] crypto: xchacha - add test vector from XChaCha20 draft RFC

2018-12-06 Thread Eric Biggers
From: Eric Biggers 

There is a draft specification for XChaCha20 being worked on.  Add the
XChaCha20 test vector from the appendix so that we can be extra sure the
kernel's implementation is compatible.

I also recomputed the ciphertext with XChaCha12 and added it there too,
to keep the tests for XChaCha20 and XChaCha12 in sync.

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.h | 178 ++-
 1 file changed, 176 insertions(+), 2 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index e7e56a8febbca..357cf4cbcbb1c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -32800,7 +32800,94 @@ static const struct cipher_testvec 
xchacha20_tv_template[] = {
.also_non_np = 1,
.np = 3,
.tap= { 1200, 1, 80 },
-   },
+   }, { /* test vector from 
https://tools.ietf.org/html/draft-arciszewski-xchacha-02#appendix-A.3.2 */
+   .key= "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f",
+   .klen   = 32,
+   .iv = "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x58"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .ptext  = "\x54\x68\x65\x20\x64\x68\x6f\x6c"
+ "\x65\x20\x28\x70\x72\x6f\x6e\x6f"
+ "\x75\x6e\x63\x65\x64\x20\x22\x64"
+ "\x6f\x6c\x65\x22\x29\x20\x69\x73"
+ "\x20\x61\x6c\x73\x6f\x20\x6b\x6e"
+ "\x6f\x77\x6e\x20\x61\x73\x20\x74"
+ "\x68\x65\x20\x41\x73\x69\x61\x74"
+ "\x69\x63\x20\x77\x69\x6c\x64\x20"
+ "\x64\x6f\x67\x2c\x20\x72\x65\x64"
+ "\x20\x64\x6f\x67\x2c\x20\x61\x6e"
+ "\x64\x20\x77\x68\x69\x73\x74\x6c"
+ "\x69\x6e\x67\x20\x64\x6f\x67\x2e"
+ "\x20\x49\x74\x20\x69\x73\x20\x61"
+ "\x62\x6f\x75\x74\x20\x74\x68\x65"
+ "\x20\x73\x69\x7a\x65\x20\x6f\x66"
+ "\x20\x61\x20\x47\x65\x72\x6d\x61"
+ "\x6e\x20\x73\x68\x65\x70\x68\x65"
+ "\x72\x64\x20\x62\x75\x74\x20\x6c"
+ "\x6f\x6f\x6b\x73\x20\x6d\x6f\x72"
+ "\x65\x20\x6c\x69\x6b\x65\x20\x61"
+ "\x20\x6c\x6f\x6e\x67\x2d\x6c\x65"
+ "\x67\x67\x65\x64\x20\x66\x6f\x78"
+ "\x2e\x20\x54\x68\x69\x73\x20\x68"
+ "\x69\x67\x68\x6c\x79\x20\x65\x6c"
+ "\x75\x73\x69\x76\x65\x20\x61\x6e"
+ "\x64\x20\x73\x6b\x69\x6c\x6c\x65"
+ "\x64\x20\x6a\x75\x6d\x70\x65\x72"
+ "\x20\x69\x73\x20\x63\x6c\x61\x73"
+ "\x73\x69\x66\x69\x65\x64\x20\x77"
+ "\x69\x74\x68\x20\x77\x6f\x6c\x76"
+ "\x65\x73\x2c\x20\x63\x6f\x79\x6f"
+ "\x74\x65\x73\x2c\x20\x6a\x61\x63"
+ "\x6b\x61\x6c\x73\x2c\x20\x61\x6e"
+ "\x64\x20\x66\x6f\x78\x65\x73\x20"
+ "\x69\x6e\x20\x74\x68\x65\x20\x74"
+ "\x61\x78\x6f\x6e\x6f\x6d\x69\x63"
+ "\x20\x66\x61\x6d\x69\x6c\x79\x20"
+ "\x43\x61\x6e\x69\x64\x61\x65\x2e",
+   .ctext  = "\x45\x59\xab\xba\x4e\x48\xc1\x61"
+ "\x02\xe8\xbb\x2c\x05\xe6\x94\x7f"
+ "\x50\xa7\x86\xde\x16\x2f\x9b\x0b"
+ "\x7e\x59\x2a\x9b\x53\xd0\xd4\xe9"
+ "\x8d\x8d\x64\x10\xd5\x40\xa1\xa6"
+ "\x37\x5b\x26\xd8\x0d\xac\xe4\xfa"
+ "\xb5\x23\x84\xc7\x31\xac\xbf\x16"
+ "\xa5\x92\x3c\x0c\x48\xd3\x57\x5d"
+ "\x4d\x0d\x2c\x67\x3b\x66\x6f\xaa"
+ "\x73\x10\x61\x27\x77\x01\x09\x3a"
+ "\x6b\xf7\xa1\x58\xa8\x86\x42\x92"
+ "\xa4\x1c\x48\xe3\xa9\xb4\xc0\xda"
+ "\xec\xe0\xf8\xd9\x8d\x0d\x7e\x05"
+ "\xb3\x7a\x30\x7b\xbb\x66\x33\x31"
+ "\x64\xec\x9e\x1b\x24\xea\x0d\x6c"
+ "\x3f\xfd\xdc\xec\x4f\x68\xe7\x44"
+ "\x30\x56\x19\x3a\x03\xc8\x10\xe1"
+ "\x13\x44\xca\x06\xd8\xed\x8a\x2b"
+ "\xfb\x1e\x8d\x48\xcf\xa6\xbc\x0e"
+ 

Using Advanced Vector eXtensions with hand-coded x64 algorithms (e.g /arch/x86/blowfish-x86_64-asm_64.S)

2018-12-04 Thread Shipof _
I was curious whether it might make implementing F() faster to use
instructions that are meant to operate on sets of data similar to what
would be processed.


[PATCH] crypto: adiantum - propagate CRYPTO_ALG_ASYNC flag to instance

2018-12-04 Thread Eric Biggers
From: Eric Biggers 

If the stream cipher implementation is asynchronous, then the Adiantum
instance must be flagged as asynchronous as well.  Otherwise someone
asking for a synchronous algorithm can get an asynchronous algorithm.

There are no asynchronous xchacha12 or xchacha20 implementations yet
which makes this largely a theoretical issue, but it should be fixed.
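
For illustration, the kind of request that could be mis-served without this
change looks roughly like the following (hypothetical caller, not from the
patch); passing CRYPTO_ALG_ASYNC in the mask with a zero type asks for a
synchronous transform only:

	/* caller wants a synchronous adiantum instance */
	struct crypto_skcipher *tfm =
		crypto_alloc_skcipher("adiantum(xchacha12,aes)", 0,
				      CRYPTO_ALG_ASYNC);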

Fixes: 059c2a4d8e16 ("crypto: adiantum - add Adiantum support")
Signed-off-by: Eric Biggers 
---
 crypto/adiantum.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/adiantum.c b/crypto/adiantum.c
index 2dfcf12fd4529..ca27e0dc2958c 100644
--- a/crypto/adiantum.c
+++ b/crypto/adiantum.c
@@ -590,6 +590,8 @@ static int adiantum_create(struct crypto_template *tmpl, 
struct rtattr **tb)
 hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto out_drop_hash;
 
+   inst->alg.base.cra_flags = streamcipher_alg->base.cra_flags &
+  CRYPTO_ALG_ASYNC;
inst->alg.base.cra_blocksize = BLOCKCIPHER_BLOCK_SIZE;
inst->alg.base.cra_ctxsize = sizeof(struct adiantum_tfm_ctx);
inst->alg.base.cra_alignmask = streamcipher_alg->base.cra_alignmask |
-- 
2.20.0.rc1.387.gf8505762e3-goog



Re: [PATCH] fscrypt: remove CRYPTO_CTR dependency

2018-12-04 Thread Eric Biggers
On Thu, Sep 06, 2018 at 12:43:41PM +0200, Ard Biesheuvel wrote:
> On 5 September 2018 at 21:24, Eric Biggers  wrote:
> > From: Eric Biggers 
> >
> > fscrypt doesn't use the CTR mode of operation for anything, so there's
> > no need to select CRYPTO_CTR.  It was added by commit 71dea01ea2ed
> > ("ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is
> > enabled").  But, I've been unable to identify the arm64 crypto bug it
> > was supposedly working around.
> >
> > I suspect the issue was seen only on some old Android device kernel
> > (circa 3.10?).  So if the fix wasn't mistaken, the real bug is probably
> > already fixed.  Or maybe it was actually a bug in a non-upstream crypto
> > driver.
> >
> > So, remove the dependency.  If it turns out there's actually still a
> > bug, we'll fix it properly.
> >
> > Signed-off-by: Eric Biggers 
> 
> Acked-by: Ard Biesheuvel 
> 
> This may be related to
> 
> 11e3b725cfc2 crypto: arm64/aes-blk - honour iv_out requirement in CBC
> and CTR modes
> 
> given that the commit in question mentions CTS. How it actually works
> around the issue is unclear to me, though.
> 
> 
> 
> 
> > ---
> >  fs/crypto/Kconfig | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> > index 02b7d91c92310..284b589b4774d 100644
> > --- a/fs/crypto/Kconfig
> > +++ b/fs/crypto/Kconfig
> > @@ -6,7 +6,6 @@ config FS_ENCRYPTION
> > select CRYPTO_ECB
> > select CRYPTO_XTS
> > select CRYPTO_CTS
> > -   select CRYPTO_CTR
> > select CRYPTO_SHA256
> > select KEYS
> > help
> > --
> > 2.19.0.rc2.392.g5ba43deb5a-goog
> >

Ping.  Ted, can you consider applying this to the fscrypt tree for 4.21?

Thanks,

- Eric


[PATCH v2 0/3] crypto: arm64/chacha - performance improvements

2018-12-04 Thread Ard Biesheuvel
Improve the performance of NEON based ChaCha:

Patch #1 adds a block size of 1472 to the tcrypt test template so we have
something that reflects the VPN case.

Patch #2 improves performance for arbitrary length inputs: on deep pipelines,
throughput increases ~30% when running on input blocks whose size is drawn
randomly from the interval [64, 1024)

Patch #3 adopts the OpenSSL approach to use the ALU in parallel with the
SIMD unit to process a fifth block while the SIMD is operating on 4 blocks.

Performance on Cortex-A57:

BEFORE:
===
testing speed of async chacha20 (chacha20-neon) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2528223 operations in 1 seconds 
(40451568 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2518155 operations in 1 seconds 
(161161920 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1207948 operations in 1 seconds 
(309234688 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 332194 operations in 1 seconds 
(340166656 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 185659 operations in 1 seconds 
(273290048 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 41829 operations in 1 seconds 
(342663168 bytes)

AFTER:
==
testing speed of async chacha20 (chacha20-neon) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2530018 operations in 1 seconds 
(40480288 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2518270 operations in 1 seconds 
(161169280 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1187760 operations in 1 seconds 
(304066560 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 361652 operations in 1 seconds 
(370331648 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 280971 operations in 1 seconds 
(413589312 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 53654 operations in 1 seconds 
(439533568 bytes)

Zinc:
=
testing speed of async chacha20 (chacha20-software) encryption
tcrypt: test 0 (256 bit key, 16 byte blocks): 2510300 operations in 1 seconds 
(40164800 bytes)
tcrypt: test 1 (256 bit key, 64 byte blocks): 2663794 operations in 1 seconds 
(170482816 bytes)
tcrypt: test 2 (256 bit key, 256 byte blocks): 1237617 operations in 1 seconds 
(316829952 bytes)
tcrypt: test 3 (256 bit key, 1024 byte blocks): 364645 operations in 1 seconds 
(373396480 bytes)
tcrypt: test 4 (256 bit key, 1472 byte blocks): 251548 operations in 1 seconds 
(370278656 bytes)
tcrypt: test 5 (256 bit key, 8192 byte blocks): 47650 operations in 1 seconds 
(390348800 bytes)

Cc: Eric Biggers 
Cc: Martin Willi 

Ard Biesheuvel (3):
  crypto: tcrypt - add block size of 1472 to skcipher template
  crypto: arm64/chacha - optimize for arbitrary length inputs
  crypto: arm64/chacha - use combined SIMD/ALU routine for more speed

 arch/arm64/crypto/chacha-neon-core.S | 396 +++-
 arch/arm64/crypto/chacha-neon-glue.c |  59 ++-
 crypto/tcrypt.c  |   2 +-
 3 files changed, 404 insertions(+), 53 deletions(-)

-- 
2.19.2



[PATCH v2 1/3] crypto: tcrypt - add block size of 1472 to skcipher template

2018-12-04 Thread Ard Biesheuvel
In order to have better coverage of algorithms operating on block
sizes that are in the ballpark of a VPN packet, add 1472 to the
block_sizes array.

Signed-off-by: Ard Biesheuvel 
---
 crypto/tcrypt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 0590a9204562..e7fb87e114a5 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -81,7 +81,7 @@ static char *check[] = {
NULL
 };
 
-static u32 block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
+static u32 block_sizes[] = { 16, 64, 256, 1024, 1472, 8192, 0 };
 static u32 aead_sizes[] = { 16, 64, 256, 512, 1024, 2048, 4096, 8192, 0 };
 
 #define XBUFSIZE 8
-- 
2.19.2



[PATCH v2 3/3] crypto: arm64/chacha - use combined SIMD/ALU routine for more speed

2018-12-04 Thread Ard Biesheuvel
To some degree, most known AArch64 micro-architectures appear to be
able to issue ALU instructions in parallel to SIMD instructions
without affecting the SIMD throughput. This means we can use the ALU
to process a fifth ChaCha block while the SIMD is processing four
blocks in parallel.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/chacha-neon-core.S | 235 ++--
 arch/arm64/crypto/chacha-neon-glue.c |  39 ++--
 2 files changed, 239 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/crypto/chacha-neon-core.S 
b/arch/arm64/crypto/chacha-neon-core.S
index 32086709e6b3..534e0a3fafa4 100644
--- a/arch/arm64/crypto/chacha-neon-core.S
+++ b/arch/arm64/crypto/chacha-neon-core.S
@@ -1,13 +1,13 @@
 /*
  * ChaCha/XChaCha NEON helper functions
  *
- * Copyright (C) 2016 Linaro, Ltd. 
+ * Copyright (C) 2016-2018 Linaro, Ltd. 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  *
- * Based on:
+ * Originally based on:
  * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
  *
  * Copyright (C) 2015 Martin Willi
@@ -160,8 +160,27 @@ ENTRY(hchacha_block_neon)
ret x9
 ENDPROC(hchacha_block_neon)
 
+   a0  .reqw12
+   a1  .reqw13
+   a2  .reqw14
+   a3  .reqw15
+   a4  .reqw16
+   a5  .reqw17
+   a6  .reqw19
+   a7  .reqw20
+   a8  .reqw21
+   a9  .reqw22
+   a10 .reqw23
+   a11 .reqw24
+   a12 .reqw25
+   a13 .reqw26
+   a14 .reqw27
+   a15 .reqw28
+
.align  6
 ENTRY(chacha_4block_xor_neon)
+   frame_push  10
+
// x0: Input state matrix, s
// x1: 4 data blocks output, o
// x2: 4 data blocks input, i
@@ -181,6 +200,9 @@ ENTRY(chacha_4block_xor_neon)
// matrix by interleaving 32- and then 64-bit words, which allows us to
// do XOR in NEON registers.
//
+   // At the same time, a fifth block is encrypted in parallel using
+   // scalar registers
+   //
adr_l   x9, CTRINC  // ... and ROT8
ld1 {v30.4s-v31.4s}, [x9]
 
@@ -191,7 +213,24 @@ ENTRY(chacha_4block_xor_neon)
ld4r{ v8.4s-v11.4s}, [x8], #16
ld4r{v12.4s-v15.4s}, [x8]
 
-   // x12 += counter values 0-3
+   mov a0, v0.s[0]
+   mov a1, v1.s[0]
+   mov a2, v2.s[0]
+   mov a3, v3.s[0]
+   mov a4, v4.s[0]
+   mov a5, v5.s[0]
+   mov a6, v6.s[0]
+   mov a7, v7.s[0]
+   mov a8, v8.s[0]
+   mov a9, v9.s[0]
+   mov a10, v10.s[0]
+   mov a11, v11.s[0]
+   mov a12, v12.s[0]
+   mov a13, v13.s[0]
+   mov a14, v14.s[0]
+   mov a15, v15.s[0]
+
+   // x12 += counter values 1-4
add v12.4s, v12.4s, v30.4s
 
 .Ldoubleround4:
@@ -200,33 +239,53 @@ ENTRY(chacha_4block_xor_neon)
// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
add v0.4s, v0.4s, v4.4s
+ add   a0, a0, a4
add v1.4s, v1.4s, v5.4s
+ add   a1, a1, a5
add v2.4s, v2.4s, v6.4s
+ add   a2, a2, a6
add v3.4s, v3.4s, v7.4s
+ add   a3, a3, a7
 
eor v12.16b, v12.16b, v0.16b
+ eor   a12, a12, a0
eor v13.16b, v13.16b, v1.16b
+ eor   a13, a13, a1
eor v14.16b, v14.16b, v2.16b
+ eor   a14, a14, a2
eor v15.16b, v15.16b, v3.16b
+ eor   a15, a15, a3
 
rev32   v12.8h, v12.8h
+ ror   a12, a12, #16
rev32   v13.8h, v13.8h
+ ror   a13, a13, #16
rev32   v14.8h, v14.8h
+ ror   a14, a14, #16
rev32   v15.8h, v15.8h
+ ror   a15, a15, #16
 
// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
add v8.4s, v8.4s, v12.4s
+ add   a8, a8, a12
add v9.4s, v9.4s, v13.4s
+ add   a9, a9, a13
add v10.4s, v10.4s, v14.4s
+ add   a10, a10, a14
add v11.4s, v11.4s, v15.4s
+ add   

[PATCH v2 2/3] crypto: arm64/chacha - optimize for arbitrary length inputs

2018-12-04 Thread Ard Biesheuvel
Update the 4-way NEON ChaCha routine so it can handle input of any
length >64 bytes in its entirety, rather than having to call into
the 1-way routine and/or memcpy()s via temp buffers to handle the
tail of a ChaCha invocation that is not a multiple of 256 bytes.

On inputs that are a multiple of 256 bytes (and thus in tcrypt
benchmarks), performance drops by around 1% on Cortex-A57, while
performance for inputs drawn randomly from the range [64, 1024)
increases by around 30%.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/chacha-neon-core.S | 183 ++--
 arch/arm64/crypto/chacha-neon-glue.c |  38 ++--
 2 files changed, 184 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/crypto/chacha-neon-core.S 
b/arch/arm64/crypto/chacha-neon-core.S
index 75b4e06cee79..32086709e6b3 100644
--- a/arch/arm64/crypto/chacha-neon-core.S
+++ b/arch/arm64/crypto/chacha-neon-core.S
@@ -19,6 +19,8 @@
  */
 
 #include 
+#include 
+#include 
 
.text
.align  6
@@ -36,7 +38,7 @@
  */
 chacha_permute:
 
-   adr x10, ROT8
+   adr_l   x10, ROT8
ld1 {v12.4s}, [x10]
 
 .Ldoubleround:
@@ -164,6 +166,12 @@ ENTRY(chacha_4block_xor_neon)
// x1: 4 data blocks output, o
// x2: 4 data blocks input, i
// w3: nrounds
+   // x4: byte count
+
+   adr_l   x10, .Lpermute
+   and x5, x4, #63
+   add x10, x10, x5
+   add x11, x10, #64
 
//
// This function encrypts four consecutive ChaCha blocks by loading
@@ -173,15 +181,15 @@ ENTRY(chacha_4block_xor_neon)
// matrix by interleaving 32- and then 64-bit words, which allows us to
// do XOR in NEON registers.
//
-   adr x9, CTRINC  // ... and ROT8
+   adr_l   x9, CTRINC  // ... and ROT8
ld1 {v30.4s-v31.4s}, [x9]
 
// x0..15[0-3] = s0..3[0..3]
-   mov x4, x0
-   ld4r{ v0.4s- v3.4s}, [x4], #16
-   ld4r{ v4.4s- v7.4s}, [x4], #16
-   ld4r{ v8.4s-v11.4s}, [x4], #16
-   ld4r{v12.4s-v15.4s}, [x4]
+   add x8, x0, #16
+   ld4r{ v0.4s- v3.4s}, [x0]
+   ld4r{ v4.4s- v7.4s}, [x8], #16
+   ld4r{ v8.4s-v11.4s}, [x8], #16
+   ld4r{v12.4s-v15.4s}, [x8]
 
// x12 += counter values 0-3
add v12.4s, v12.4s, v30.4s
@@ -425,24 +433,47 @@ ENTRY(chacha_4block_xor_neon)
zip1v30.4s, v14.4s, v15.4s
zip2v31.4s, v14.4s, v15.4s
 
+   mov x3, #64
+   subsx5, x4, #64
+   add x6, x5, x2
+   cselx3, x3, xzr, ge
+   cselx2, x2, x6, ge
+
// interleave 64-bit words in state n, n+2
zip1v0.2d, v16.2d, v18.2d
zip2v4.2d, v16.2d, v18.2d
zip1v8.2d, v17.2d, v19.2d
zip2v12.2d, v17.2d, v19.2d
-   ld1 {v16.16b-v19.16b}, [x2], #64
+   ld1 {v16.16b-v19.16b}, [x2], x3
+
+   subsx6, x4, #128
+   ccmpx3, xzr, #4, lt
+   add x7, x6, x2
+   cselx3, x3, xzr, eq
+   cselx2, x2, x7, eq
 
zip1v1.2d, v20.2d, v22.2d
zip2v5.2d, v20.2d, v22.2d
zip1v9.2d, v21.2d, v23.2d
zip2v13.2d, v21.2d, v23.2d
-   ld1 {v20.16b-v23.16b}, [x2], #64
+   ld1 {v20.16b-v23.16b}, [x2], x3
+
+   subsx7, x4, #192
+   ccmpx3, xzr, #4, lt
+   add x8, x7, x2
+   cselx3, x3, xzr, eq
+   cselx2, x2, x8, eq
 
zip1v2.2d, v24.2d, v26.2d
zip2v6.2d, v24.2d, v26.2d
zip1v10.2d, v25.2d, v27.2d
zip2v14.2d, v25.2d, v27.2d
-   ld1 {v24.16b-v27.16b}, [x2], #64
+   ld1 {v24.16b-v27.16b}, [x2], x3
+
+   subsx8, x4, #256
+   ccmpx3, xzr, #4, lt
+   add x9, x8, x2
+   cselx2, x2, x9, eq
 
zip1v3.2d, v28.2d, v30.2d
zip2v7.2d, v28.2d, v30.2d
@@ -451,29 +482,155 @@ ENTRY(chacha_4block_xor_neon)
ld1 {v28.16b-v31.16b}, [x2]
 
// xor with corresponding input, write to output
+   tbnzx5, #63, 0f
eor v16.16b, v16.16b, v0.16b
eor v17.16b, v17.16b, v1.16b
eor v18.16b, v18.16b, v2.16b
eor v19.16b, v19.16b, v3.16b
+   st1 {v16.16b-v19.16b}, [x1], #64
+
+   tbnzx6, #63, 1f
eor v20.16b, v20.16b, v4.16b
eor v21.16b, v21.16b, v5.16b

[crypto chcr 2/2] ESN for Inline IPSec Tx

2018-11-30 Thread Atul Gupta
Send the SPI, 64-bit sequence numbers and 64-bit IV with aadiv drop for
inline crypto. This information is added to the outgoing packet after the
CPL TX PKT XT and is removed by the hardware. The aad, auth and cipher
offsets are then adjusted for an ESN-enabled tunnel.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_core.h  |   9 ++
 drivers/crypto/chelsio/chcr_ipsec.c | 175 
 2 files changed, 148 insertions(+), 36 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_core.h 
b/drivers/crypto/chelsio/chcr_core.h
index de3a9c0..4616663 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -159,8 +159,17 @@ struct chcr_ipsec_wr {
struct chcr_ipsec_req req;
 };
 
+#define ESN_IV_INSERT_OFFSET 12
+struct chcr_ipsec_aadiv {
+   __be32 spi;
+   u8 seq_no[8];
+   u8 iv[8];
+};
+
 struct ipsec_sa_entry {
int hmac_ctrl;
+   u16 esn;
+   u16 imm;
unsigned int enckey_len;
unsigned int kctx_len;
unsigned int authsize;
diff --git a/drivers/crypto/chelsio/chcr_ipsec.c 
b/drivers/crypto/chelsio/chcr_ipsec.c
index 1ff8738..9321d2b 100644
--- a/drivers/crypto/chelsio/chcr_ipsec.c
+++ b/drivers/crypto/chelsio/chcr_ipsec.c
@@ -76,12 +76,14 @@
 static void chcr_xfrm_del_state(struct xfrm_state *x);
 static void chcr_xfrm_free_state(struct xfrm_state *x);
 static bool chcr_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
+static void chcr_advance_esn_state(struct xfrm_state *x);
 
 static const struct xfrmdev_ops chcr_xfrmdev_ops = {
.xdo_dev_state_add  = chcr_xfrm_add_state,
.xdo_dev_state_delete   = chcr_xfrm_del_state,
.xdo_dev_state_free = chcr_xfrm_free_state,
.xdo_dev_offload_ok = chcr_ipsec_offload_ok,
+   .xdo_dev_state_advance_esn = chcr_advance_esn_state,
 };
 
 /* Add offload xfrms to Chelsio Interface */
@@ -210,10 +212,6 @@ static int chcr_xfrm_add_state(struct xfrm_state *x)
pr_debug("CHCR: Cannot offload compressed xfrm states\n");
return -EINVAL;
}
-   if (x->props.flags & XFRM_STATE_ESN) {
-   pr_debug("CHCR: Cannot offload ESN xfrm states\n");
-   return -EINVAL;
-   }
if (x->props.family != AF_INET &&
x->props.family != AF_INET6) {
pr_debug("CHCR: Only IPv4/6 xfrm state offloaded\n");
@@ -266,6 +264,8 @@ static int chcr_xfrm_add_state(struct xfrm_state *x)
}
 
sa_entry->hmac_ctrl = chcr_ipsec_setauthsize(x, sa_entry);
+   if (x->props.flags & XFRM_STATE_ESN)
+   sa_entry->esn = 1;
chcr_ipsec_setkey(x, sa_entry);
x->xso.offload_handle = (unsigned long)sa_entry;
try_module_get(THIS_MODULE);
@@ -294,31 +294,57 @@ static void chcr_xfrm_free_state(struct xfrm_state *x)
 
 static bool chcr_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 {
-   /* Offload with IP options is not supported yet */
-   if (ip_hdr(skb)->ihl > 5)
-   return false;
-
+   if (x->props.family == AF_INET) {
+   /* Offload with IP options is not supported yet */
+   if (ip_hdr(skb)->ihl > 5)
+   return false;
+   } else {
+   /* Offload with IPv6 extension headers is not supported yet */
+   if (ipv6_ext_hdr(ipv6_hdr(skb)->nexthdr))
+   return false;
+   }
return true;
 }
 
-static inline int is_eth_imm(const struct sk_buff *skb, unsigned int kctx_len)
+static void chcr_advance_esn_state(struct xfrm_state *x)
+{
+   /* do nothing */
+   if (!x->xso.offload_handle)
+   return;
+}
+
+static inline int is_eth_imm(const struct sk_buff *skb,
+struct ipsec_sa_entry *sa_entry)
 {
+   unsigned int kctx_len;
int hdrlen;
 
+   kctx_len = sa_entry->kctx_len;
hdrlen = sizeof(struct fw_ulptx_wr) +
 sizeof(struct chcr_ipsec_req) + kctx_len;
 
hdrlen += sizeof(struct cpl_tx_pkt);
+   if (sa_entry->esn)
+   hdrlen += (DIV_ROUND_UP(sizeof(struct chcr_ipsec_aadiv), 16)
+  << 4);
if (skb->len <= MAX_IMM_TX_PKT_LEN - hdrlen)
return hdrlen;
return 0;
 }
 
 static inline unsigned int calc_tx_sec_flits(const struct sk_buff *skb,
-unsigned int kctx_len)
+struct ipsec_sa_entry *sa_entry)
 {
+   unsigned int kctx_len;
unsigned int flits;
-   int hdrlen = is_eth_imm(skb, kctx_len);
+   int aadivlen;
+   int hdrlen;
+
+   kctx_len = sa_entry->kctx_len;
+   hdrlen = is_eth_imm(skb, sa_entry);
+   aadivlen = sa_entry->esn ? DIV_ROUND_UP(sizeof(struct chcr_ipsec_aadiv),
+   16) : 0;
+   aadivlen <<= 4;
 
/* If the skb is small enough, we can pump it out as a work 

[crypto chcr 1/2] small packet Tx stalls the queue

2018-11-30 Thread Atul Gupta
Immediate packets sent to hardware should include the work
request length when calculating the flits. The WR occupies one flit and,
if not accounted for, results in an invalid request which stalls the HW
queue.

Cc: sta...@vger.kernel.org
Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_ipsec.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_ipsec.c 
b/drivers/crypto/chelsio/chcr_ipsec.c
index 461b97e..1ff8738 100644
--- a/drivers/crypto/chelsio/chcr_ipsec.c
+++ b/drivers/crypto/chelsio/chcr_ipsec.c
@@ -303,7 +303,10 @@ static bool chcr_ipsec_offload_ok(struct sk_buff *skb, 
struct xfrm_state *x)
 
 static inline int is_eth_imm(const struct sk_buff *skb, unsigned int kctx_len)
 {
-   int hdrlen = sizeof(struct chcr_ipsec_req) + kctx_len;
+   int hdrlen;
+
+   hdrlen = sizeof(struct fw_ulptx_wr) +
+sizeof(struct chcr_ipsec_req) + kctx_len;
 
hdrlen += sizeof(struct cpl_tx_pkt);
if (skb->len <= MAX_IMM_TX_PKT_LEN - hdrlen)
-- 
1.8.3.1



Re: [PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions

2018-11-29 Thread Herbert Xu
On Tue, Nov 20, 2018 at 05:30:47PM +0100, Martin Willi wrote:
> In the quest for pushing the limits of chacha20 encryption for both IPsec
> and Wireguard, this small series adds AVX-512VL block functions. The VL
> variant works on 256-bit ymm registers, but compared to AVX2 can benefit
> from the new instructions.
> 
> Compared to the AVX2 version, these block functions bring an overall
> speed improvement across encryption lengths of ~20%. Below the tcrypt
> results for additional block sizes in kOps/s, for the current AVX2
> code path, the new AVX-512VL code path and the comparison to Zinc in
> AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).
> 
> These numbers result in a very nice chart, available at:
>   https://download.strongswan.org/misc/chacha-avx-512vl.svg
> 
>  zinc   zinc
>  len   avx2  512vl   avx2  512vl
>8   5719   5672   5468   5612
>   16   5675   5627   5355   5621
>   24   5687   5601   5322   5633
>   32   5667   5622   5244   5564
>   40   5603   5582   5337   5578
>   48   5638   5539   5400   5556
>   56   5624   5566   5375   5482
>   64   5590   5573   5352   5531
>   72   4841   5467   3365   3457
>   80   5316   5761   3310   3381
>   88   4798   5470   3239   3343
>   96   5324   5723   3197   3281
>  104   4819   5460   3155   3232
>  112   5266   5749   3020   3195
>  120   4776   5391   2959   3145
>  128   5291   5723   3398   3489
>  136   4122   4837   3321   3423
>  144   4507   5057   3247   3389
>  152   4139   4815   3233   3329
>  160   4482   5043   3159   3256
>  168   4142   4766   3131   3224
>  176   4506   5028   3073   3162
>  184   4119   4772   3010   3109
>  192   4499   5016   3402   3502
>  200   4127   4766   3329   3448
>  208   4452   5012   3276   3371
>  216   4128   4744   3243   3334
>  224   4484   5008   3203   3298
>  232   4103   4772   3141   3237
>  240   4458   4963   3115   3217
>  248   4121   4751   3085   3177
>  256   4461   4987   3364   4046
>  264   3406   4282   3270   4006
>  272   3408   4287   3207   3961
>  280   3371   4271   3203   3825
>  288   3625   4301   3129   3751
>  296   3402   4283   3093   3688
>  304   3401   4247   3062   3637
>  312   3382   4282   2995   3614
>  320   3611   4279   3305   4070
>  328   3386   4260   3276   3968
>  336   3369   4288   3171   3929
>  344   3389   4289   3134   3847
>  352   3609   4266   3127   3720
>  360   3355   4252   3076   3692
>  368   3387   4264   3048   3650
>  376   3387   4238   2967   3553
>  384   3568   4265   3277   4035
>  392   3369   4262   3299   3973
>  400   3362   4235   3239   3899
>  408   3352   4269   3196   3843
>  416   3585   4243   3127   3736
>  424   3364   4216   3092   3672
>  432   3341   4246   3067   3628
>  440   3353   4235   3018   3593
>  448   3538   4245   3327   4035
>  456   3322   4244   3275   3900
>  464   3340   4237   3212   3880
>  472   3330   4242   3054   3802
>  480   3530   4234   3078   3707
>  488   3337   4228   3094   3664
>  496   3330   4223   3015   3591
>  504   3317   4214   3002   3517
>  512   3531   4197   3339   4016
>  520   2511   3101   2030   2682
>  528   2627   3087   2027   2641
>  536   2508   3102   2001   2601
>  544   2638   3090   1964   2564
>  552   2494   3077   1962   2516
>  560   2625   3064   1941   2515
>  568   2500   3086   1922   2493
>  576   2611   3074   2050   2689
>  584   2482   3062   2041   2680
>  592   2595   3074   2026   2644
>  600   2470   3060   1985   2595
>  608   2581   3039   1961   2555
>  616   2478   3062   1956   2521
>  624   2587   3066   1930   2493
>  632   2457   3053   1923   2486
>  640   2581   3050   2059   2712
>  648   2296   2839   2024   2655
>  656   2389   2845   2019   2642
>  664   2292   2842   2002   2610
>  672   2404   2838   1959   2537
>  680   2273   2827   1956   2527
>  688   2389   2840   1938   2510
>  696   2280   2837   1911   2463
>  704   2370   2819   2055   2702
>  712   2277   2834   2029   2663
>  720   2369   2829   2020   2625
>  728   2255   2820   2001   2600
>  736   2373   2819   1958   2543
>  744   2269   2827   1956   2524
>  752   2364   2817   1937   2492
>  760   2270   2805   1909   2483
>  768   2378   2820   2050   2696
>  776   2053   2700   2002   2643
>  784   2066   2693   1922   2640
>  792   2065   2703   1928   2602
>  800   2138   2706   1962   2535
>  808   2065   2679   1938   2528
>  816   2063   2699   1929   2500
>  824   2053   2676   1915   2468
>  832   2149   2692   2036   2693
>  840   2055   2689   2024   2659
>  848   2049   2689   2006   2610
>  856   2057   2702   1979   2585
>  864   2144   2703   1960   2547
>  872   2047   2685   1945   2501
>  880   2055   2683   1902   2497
>  888   2060   2689   1897   2478
>  896   2139   2693   2023   2663
>  904   2049   2686   1970   2644
>  912   2055   2688   1925   2621
>  920   2047   2685   1911   2572
>  928   2114   2695   1907   2545
>  936   2055   2681   1927   2492
>  944   2055   2693   1930   2478

Re: [Help] Null pointer exception in scatterwalk_start() in kernel-4.9

2018-11-28 Thread Herbert Xu
On Tue, Nov 20, 2018 at 07:09:53AM +, gongchen (E) wrote:
> Hi Dear Herbert,
> 
> Sorry to bother you , but we’ve met a problem in crypto module, 
> would you please kindly help us look into it ? Thank you very much.
> 
>  In the below function chain, scatterwalk_start() doesn't check 
> the result of sg_next(), so the kernel will crash if sg_next() returns a null 
> pointer, which is our case. (The full stack is at the end of letter)
>  
> blkcipher_walk_done()->scatterwalk_done()->scatterwalk_pagedone()->scatterwalk_start(walk,
>  sg_next(walk->sg));
> 
> Should we add a null-pointer check in scatterwalk_start()? Or is 
> there any process that can ensure there is a valid sg pointer if the 
> condition (walk->offset >= walk->sg->offset + walk->sg->length) is true?
>   
> We are really looking forward to your reply; any information will 
> be appreciated, thanks again.

Did you apply the following patch?

commit 0868def3e4100591e7a1fdbf3eed1439cc8f7ca3
Author: Eric Biggers 
Date:   Mon Jul 23 10:54:57 2018 -0700

crypto: blkcipher - fix crash flushing dcache in error path
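
For reference, the guard being asked about would look roughly like this
(illustrative sketch only; the commit above fixes the caller's error path
instead of adding a check here):

	static inline void scatterwalk_start(struct scatter_walk *walk,
					     struct scatterlist *sg)
	{
		if (!sg)	/* hypothetical guard, not the upstream approach */
			return;
		walk->sg = sg;
		walk->offset = sg->offset;
	}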

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



[PATCH 1/1] cavium: Update firmware for CNN55XX crypto driver

2018-11-22 Thread Nagadheeraj, Rottela
Firmware upgraded to v10

Signed-off-by: Nagadheeraj Rottela 
---
 WHENCE   |   2 +-
 cavium/cnn55xx_se.fw | Bin 27698 -> 35010 bytes
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/WHENCE b/WHENCE
index a188c0d..ed10d5b 100644
--- a/WHENCE
+++ b/WHENCE
@@ -3586,7 +3586,7 @@ Licence: Redistributable. See LICENCE.cavium_liquidio for 
details
 Driver: nitrox -- Cavium CNN55XX crypto driver
 
 File: cavium/cnn55xx_se.fw
-Version: v07
+Version: v10
 
 Licence: Redistributable. See LICENCE.cavium for details
 
diff --git a/cavium/cnn55xx_se.fw b/cavium/cnn55xx_se.fw
index 
076e270d383488e7ae67e8f4224a519c4173bedc..bc3c4d070625794b0b5aa61df48e186502a75e1d
 100644
GIT binary patch
literal 35010

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Jason A. Donenfeld
Hi Martin,

On Tue, Nov 20, 2018 at 5:29 PM Martin Willi  wrote:
> Thanks for the offer, no need at this time. But I certainly would
> welcome if you could do some (Wireguard) benching with that code to see
> if it works for you.

I certainly will test it in a few different network circumstances,
especially since real testing like this is sometimes more telling than
busy-loop benchmarks.

> > Actually, similarly here, a 10nm Cannon Lake machine should be
> > arriving at my house this week, which should make for some
> > interesting testing ground for non-throttled zmm, if you'd like to
> > play with it.
>
> Maybe in a future iteration, thanks. In fact would it be interesting to
> know if Cannon Lake can handle that throttling better.

Everything I've read on the Internet seems to indicate that's the
case, so one of the first things I'll be doing is seeing if that's
true. There are also the AVX512 IFMA instructions to play with!

Jason


[PATCH 3/3] crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version by scheduling the
operations for two block pairs in parallel. It benefits from the AVX-512VL
rotate instructions and the more efficient partial block handling using
"vmovdqu8", resulting in a speedup of the raw block function of ~20%.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 272 +
 arch/x86/crypto/chacha20_glue.c|   7 +
 2 files changed, 279 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S 
b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index 261097578715..55d34de29e3e 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -12,6 +12,11 @@
 CTR2BL:.octa 0x
.octa 0x0001
 
+.section   .rodata.cst32.CTR4BL, "aM", @progbits, 32
+.align 32
+CTR4BL:.octa 0x0002
+   .octa 0x0003
+
 .section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
 CTR8BL:.octa 0x000300020001
@@ -185,6 +190,273 @@ ENTRY(chacha20_2block_xor_avx512vl)
 
 ENDPROC(chacha20_2block_xor_avx512vl)
 
+ENTRY(chacha20_4block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 4 data blocks output, o
+   # %rdx: up to 4 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts four ChaCha20 blocks by loading the state
+   # matrix four times across eight AVX registers. It performs matrix
+   # operations on four words in two matrices in parallel, sequentially
+   # to the operations on the four words of the other two matrices. The
+   # required word shuffling has a rather high latency, we can do the
+   # arithmetic on two matrix-pairs without much slowdown.
+
+   vzeroupper
+
+   # x0..3[0-4] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vmovdqa %ymm0,%ymm4
+   vmovdqa %ymm1,%ymm5
+   vmovdqa %ymm2,%ymm6
+   vmovdqa %ymm3,%ymm7
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+   vpaddd  CTR4BL(%rip),%ymm7,%ymm7
+
+   vmovdqa %ymm0,%ymm11
+   vmovdqa %ymm1,%ymm12
+   vmovdqa %ymm2,%ymm13
+   vmovdqa %ymm3,%ymm14
+   vmovdqa %ymm7,%ymm15
+
+   mov $10,%rax
+
+.Ldoubleround4:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $16,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $12,%ymm5,%ymm5
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $8,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $7,%ymm5,%ymm5
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   vpshufd $0x39,%ymm5,%ymm5
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   vpshufd $0x4e,%ymm6,%ymm6
+   # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm3,%ymm3
+   vpshufd $0x93,%ymm7,%ymm7
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxord  %ymm4,%ymm7,%ymm7
+   vprold  $16,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxord  %ymm6,%ymm5,%ymm5
+   vprold  $12,%ymm5,%ymm5
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   vpaddd  

[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions

2018-11-20 Thread Martin Willi
In the quest for pushing the limits of chacha20 encryption for both IPsec
and Wireguard, this small series adds AVX-512VL block functions. The VL
variant works on 256-bit ymm registers, but compared to AVX2 can benefit
from the new instructions.

Compared to the AVX2 version, these block functions bring an overall
speed improvement across encryption lengths of ~20%. Below the tcrypt
results for additional block sizes in kOps/s, for the current AVX2
code path, the new AVX-512VL code path and the comparison to Zinc in
AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).

These numbers result in a very nice chart, available at:
  https://download.strongswan.org/misc/chacha-avx-512vl.svg

 zinc   zinc
 len   avx2  512vl   avx2  512vl
   8   5719   5672   5468   5612
  16   5675   5627   5355   5621
  24   5687   5601   5322   5633
  32   5667   5622   5244   5564
  40   5603   5582   5337   5578
  48   5638   5539   5400   5556
  56   5624   5566   5375   5482
  64   5590   5573   5352   5531
  72   4841   5467   3365   3457
  80   5316   5761   3310   3381
  88   4798   5470   3239   3343
  96   5324   5723   3197   3281
 104   4819   5460   3155   3232
 112   5266   5749   3020   3195
 120   4776   5391   2959   3145
 128   5291   5723   3398   3489
 136   4122   4837   3321   3423
 144   4507   5057   3247   3389
 152   4139   4815   3233   3329
 160   4482   5043   3159   3256
 168   4142   4766   3131   3224
 176   4506   5028   3073   3162
 184   4119   4772   3010   3109
 192   4499   5016   3402   3502
 200   4127   4766   3329   3448
 208   4452   5012   3276   3371
 216   4128   4744   3243   3334
 224   4484   5008   3203   3298
 232   4103   4772   3141   3237
 240   4458   4963   3115   3217
 248   4121   4751   3085   3177
 256   4461   4987   3364   4046
 264   3406   4282   3270   4006
 272   3408   4287   3207   3961
 280   3371   4271   3203   3825
 288   3625   4301   3129   3751
 296   3402   4283   3093   3688
 304   3401   4247   3062   3637
 312   3382   4282   2995   3614
 320   3611   4279   3305   4070
 328   3386   4260   3276   3968
 336   3369   4288   3171   3929
 344   3389   4289   3134   3847
 352   3609   4266   3127   3720
 360   3355   4252   3076   3692
 368   3387   4264   3048   3650
 376   3387   4238   2967   3553
 384   3568   4265   3277   4035
 392   3369   4262   3299   3973
 400   3362   4235   3239   3899
 408   3352   4269   3196   3843
 416   3585   4243   3127   3736
 424   3364   4216   3092   3672
 432   3341   4246   3067   3628
 440   3353   4235   3018   3593
 448   3538   4245   3327   4035
 456   3322   4244   3275   3900
 464   3340   4237   3212   3880
 472   3330   4242   3054   3802
 480   3530   4234   3078   3707
 488   3337   4228   3094   3664
 496   3330   4223   3015   3591
 504   3317   4214   3002   3517
 512   3531   4197   3339   4016
 520   2511   3101   2030   2682
 528   2627   3087   2027   2641
 536   2508   3102   2001   2601
 544   2638   3090   1964   2564
 552   2494   3077   1962   2516
 560   2625   3064   1941   2515
 568   2500   3086   1922   2493
 576   2611   3074   2050   2689
 584   2482   3062   2041   2680
 592   2595   3074   2026   2644
 600   2470   3060   1985   2595
 608   2581   3039   1961   2555
 616   2478   3062   1956   2521
 624   2587   3066   1930   2493
 632   2457   3053   1923   2486
 640   2581   3050   2059   2712
 648   2296   2839   2024   2655
 656   2389   2845   2019   2642
 664   2292   2842   2002   2610
 672   2404   2838   1959   2537
 680   2273   2827   1956   2527
 688   2389   2840   1938   2510
 696   2280   2837   1911   2463
 704   2370   2819   2055   2702
 712   2277   2834   2029   2663
 720   2369   2829   2020   2625
 728   2255   2820   2001   2600
 736   2373   2819   1958   2543
 744   2269   2827   1956   2524
 752   2364   2817   1937   2492
 760   2270   2805   1909   2483
 768   2378   2820   2050   2696
 776   2053   2700   2002   2643
 784   2066   2693   1922   2640
 792   2065   2703   1928   2602
 800   2138   2706   1962   2535
 808   2065   2679   1938   2528
 816   2063   2699   1929   2500
 824   2053   2676   1915   2468
 832   2149   2692   2036   2693
 840   2055   2689   2024   2659
 848   2049   2689   2006   2610
 856   2057   2702   1979   2585
 864   2144   2703   1960   2547
 872   2047   2685   1945   2501
 880   2055   2683   1902   2497
 888   2060   2689   1897   2478
 896   2139   2693   2023   2663
 904   2049   2686   1970   2644
 912   2055   2688   1925   2621
 920   2047   2685   1911   2572
 928   2114   2695   1907   2545
 936   2055   2681   1927   2492
 944   2055   2693   1930   2478
 952   2042   2688   1909   2471
 960   2136   2682   2014   2672
 968   2054   2687   1999   2626
 976   2040   2682   1982   2598
 984   2055   2687   1943   2569
 992   2138   2694   1884   2522
1000   2036   2681   1929   2506
1008   2052   2676   1926   2475
1016   2050   2686   1889   2430
1024   2125   2670   2039   2656

[PATCH 2/3] crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single-block SSSE3 version
even when processing just a single block, so we engage this function for
(partial) single-block lengths as well.
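As a rough illustration of what the masked byte moves buy for the tail of a
message, here is a userspace intrinsics sketch (hypothetical helper name, not
the code in this patch; needs -mavx512vl -mavx512bw) that XORs a partial block
without reading or writing past the requested length:

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* XOR up to 16 bytes of keystream into dst; len must be <= 16.  The mask
 * makes the vmovdqu8-style load/store touch only the first len bytes. */
static void xor_partial_block(uint8_t *dst, const uint8_t *src,
			      __m128i keystream, size_t len)
{
	__mmask16 k = (__mmask16)((1u << len) - 1);	/* low len bits set */
	__m128i in = _mm_maskz_loadu_epi8(k, src);	/* masked load      */

	_mm_mask_storeu_epi8(dst, k, _mm_xor_si128(in, keystream));
}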

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 171 +
 arch/x86/crypto/chacha20_glue.c|   7 +
 2 files changed, 178 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S 
b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index e1877afcaa73..261097578715 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -7,6 +7,11 @@
 
 #include 
 
+.section   .rodata.cst32.CTR2BL, "aM", @progbits, 32
+.align 32
+CTR2BL:	.octa 0x00000000000000000000000000000000
+	.octa 0x00000000000000000000000000000001
+
 .section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
CTR8BL:	.octa 0x00000003000000020000000100000000
@@ -14,6 +19,172 @@ CTR8BL: .octa 0x000300020001
 
 .text
 
+ENTRY(chacha20_2block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 2 data blocks output, o
+   # %rdx: up to 2 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts two ChaCha20 blocks by loading the state
+   # matrix twice across four AVX registers. It performs matrix operations
+   # on four words in each matrix in parallel, but requires shuffling to
+   # rearrange the words after each round.
+
+   vzeroupper
+
+   # x0..3[0-2] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+
+   vmovdqa %ymm0,%ymm8
+   vmovdqa %ymm1,%ymm9
+   vmovdqa %ymm2,%ymm10
+   vmovdqa %ymm3,%ymm11
+
+   mov $10,%rax
+
+.Ldoubleround:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm3,%ymm3
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $16,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $12,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxord  %ymm0,%ymm3,%ymm3
+   vprold  $8,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxord  %ymm2,%ymm1,%ymm1
+   vprold  $7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm3,%ymm3
+
+   dec %rax
+   jnz .Ldoubleround
+
+   # o0 = i0 ^ (x0 + s0)
+   vpaddd  %ymm8,%ymm0,%ymm7
+   cmp $0x10,%rcx
+   jl  .Lxorpart2
+   vpxord  0x00(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x00(%rsi)
+   vextracti128  $1,%ymm7,%xmm0
+   # o1 = i1 ^ (x1 + s1)
+   vpaddd  %ymm9,%ymm1,%ymm7
+   cmp $0x20,%rcx
+   jl  .Lxorpart2
+   vpxord  0x10(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x10(%rsi)
+   vextracti128  $1,%ymm7,%xmm1
+   # o2 = i2 ^ (x2 + s2)
+   vpaddd  %ymm10,%ymm2,%ymm7
+   cmp $0x30,%rcx
+   jl  .Lxorpart2
+   vpxord  0x20(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x20(%rsi)
+   vextracti128  $1,%ymm7,%xmm2
+   # o3 = i3 ^ (x3 + 

[PATCH 1/3] crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This variant is similar to the AVX2 version, but benefits from the AVX-512
rotate instructions and the additional registers, so it can operate without
keeping any data on the stack. It uses ymm registers only, to avoid the
massive core throttling on Skylake-X platforms. Nonetheless it brings a ~30%
speed improvement compared to the AVX2 variant for random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the stack.
With AVX-512, the new "vmovdqu8" can do this much more efficiently. The
associated "kmov" instructions for working with dynamic masks are not part
of the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given
that the major AVX-512VL architectures also provide AVX-512BW and this
extension does not affect core clocking, this seems to be no problem, at
least for now.
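For illustration, a runtime check in the glue code would presumably have to
require both extensions before wiring the new block functions up; a minimal
sketch using the kernel's CPU feature API (hypothetical helper name, not the
exact code in chacha20_glue.c):

#include <asm/cpufeature.h>
#include <linux/types.h>

/* Sketch: the 512VL block functions also rely on vmovdqu8/kmov, so both
 * AVX-512VL and AVX-512BW must be present at runtime; the assembler side
 * is gated separately by the new avx512_supported check in the Makefile. */
static bool chacha20_avx512vl_usable(void)
{
	return boot_cpu_has(X86_FEATURE_AVX512VL) &&
	       boot_cpu_has(X86_FEATURE_AVX512BW);
}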

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/Makefile   |   5 +
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 396 +
 arch/x86/crypto/chacha20_glue.c|  26 ++
 3 files changed, 427 insertions(+)
 create mode 100644 arch/x86/crypto/chacha20-avx512vl-x86_64.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a4b0007a54e1..ce4e43642984 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -8,6 +8,7 @@ OBJECT_FILES_NON_STANDARD := y
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
$(comma)4)$(comma)%ymm2,yes,no)
+avx512_supported :=$(call as-instr,vpmovm2b %k1$(comma)%zmm5,yes,no)
 sha1_ni_supported :=$(call as-instr,sha1msg1 %xmm0$(comma)%xmm1,yes,no)
 sha256_ni_supported :=$(call as-instr,sha256msg1 %xmm0$(comma)%xmm1,yes,no)
 
@@ -103,6 +104,10 @@ ifeq ($(avx2_supported),yes)
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
 endif
 
+ifeq ($(avx512_supported),yes)
+   chacha20-x86_64-y += chacha20-avx512vl-x86_64.o
+endif
+
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S 
b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
new file mode 100644
index ..e1877afcaa73
--- /dev/null
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX-512VL functions
+ *
+ * Copyright (C) 2018 Martin Willi
+ */
+
+#include 
+
+.section   .rodata.cst32.CTR8BL, "aM", @progbits, 32
+.align 32
+CTR8BL:	.octa 0x00000003000000020000000100000000
+	.octa 0x00000007000000060000000500000004
+
+.text
+
+ENTRY(chacha20_8block_xor_avx512vl)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 8 data blocks output, o
+   # %rdx: up to 8 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts eight consecutive ChaCha20 blocks by loading
+   # the state matrix in AVX registers eight times. Compared to AVX2, this
+   # mostly benefits from the new rotate instructions in VL and the
+   # additional registers.
+
+   vzeroupper
+
+   # x0..15[0-7] = s[0..15]
+   vpbroadcastd  0x00(%rdi),%ymm0
+   vpbroadcastd  0x04(%rdi),%ymm1
+   vpbroadcastd  0x08(%rdi),%ymm2
+   vpbroadcastd  0x0c(%rdi),%ymm3
+   vpbroadcastd  0x10(%rdi),%ymm4
+   vpbroadcastd  0x14(%rdi),%ymm5
+   vpbroadcastd  0x18(%rdi),%ymm6
+   vpbroadcastd  0x1c(%rdi),%ymm7
+   vpbroadcastd  0x20(%rdi),%ymm8
+   vpbroadcastd  0x24(%rdi),%ymm9
+   vpbroadcastd  0x28(%rdi),%ymm10
+   vpbroadcastd  0x2c(%rdi),%ymm11
+   vpbroadcastd  0x30(%rdi),%ymm12
+   vpbroadcastd  0x34(%rdi),%ymm13
+   vpbroadcastd  0x38(%rdi),%ymm14
+   vpbroadcastd  0x3c(%rdi),%ymm15
+
+   # x12 += counter values 0-3
+   vpaddd  CTR8BL(%rip),%ymm12,%ymm12
+
+   vmovdqa64   %ymm0,%ymm16
+   vmovdqa64   %ymm1,%ymm17
+   vmovdqa64   %ymm2,%ymm18
+   vmovdqa64   %ymm3,%ymm19
+   vmovdqa64   %ymm4,%ymm20
+   vmovdqa64   %ymm5,%ymm21
+   vmovdqa64   %ymm6,%ymm22
+   vmovdqa64   %ymm7,%ymm23
+   vmovdqa64   %ymm8,%ymm24
+   vmovdqa64   %ymm9,%ymm25
+   vmovdqa64   %ymm10,%ymm26
+   vmovdqa64   %ymm11,%ymm27
+   vmovdqa64   %ymm12,%ymm28
+   vmovdqa64   %ymm13,%ymm29
+   vmovdqa64   %ymm14,%ymm30
+   vmovdqa64   %ymm15,%ymm31
+
+   mov $10,%eax
+
+.Ldoubleround8:
+   # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+   vpaddd  %ymm0,%ymm4,%ymm0
+   vpxord  %ymm0,%ymm12,%ymm12
+   vprold  $16,%ymm12,%ymm12
+   # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+   

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Martin Willi
Hi Jason,

> [...] I have a massive Xeon Gold 5120 machine that I can give you
> access to if you'd like to do some testing and benching.

Thanks for the offer, no need at this time. But I would certainly
welcome it if you could do some (WireGuard) benchmarking with that code
to see if it works for you.

> Actually, similarly here, a 10nm Cannon Lake machine should be
> arriving at my house this week, which should make for some
> interesting testing ground for non-throttled zmm, if you'd like to
> play with it.

Maybe in a future iteration, thanks. In fact it would be interesting to
know whether Cannon Lake can handle that throttling better.

Regards
Martin



[Help] Null pointer exception in scatterwalk_start() in kernel-4.9

2018-11-19 Thread gongchen (E)
Hi Dear Herbert,

Sorry to bother you, but we've met a problem in the crypto module;
would you please kindly help us look into it? Thank you very much.

In the function chain below, scatterwalk_start() doesn't check the
result of sg_next(), so the kernel will crash if sg_next() returns a null
pointer, which is our case. (The full stack is at the end of this letter.)
 
blkcipher_walk_done()->scatterwalk_done()->scatterwalk_pagedone()->scatterwalk_start(walk,
 sg_next(walk->sg));

Should we add a null-pointer check in scatterwalk_start()? Or is
there any mechanism that ensures a valid sg pointer whenever the
condition (walk->offset >= walk->sg->offset + walk->sg->length) is true?
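For illustration, the kind of check being asked about might look roughly like
this (a sketch with a hypothetical helper name, not a tested patch):

#include <crypto/scatterwalk.h>
#include <linux/errno.h>

/* Advance the walk to the next scatterlist entry, but fail softly if the
 * list ends early instead of letting scatterwalk_start() dereference a
 * NULL sg pointer. */
static int scatterwalk_start_next_checked(struct scatter_walk *walk)
{
	struct scatterlist *next = sg_next(walk->sg);

	if (!next)
		return -EINVAL;	/* scatterlist shorter than the request */

	scatterwalk_start(walk, next);
	return 0;
}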
  
We are really looking forward to your reply; any information will
be appreciated. Thanks again.
  


  Best regards


   Chen Gong


2018.11.20

---
Full Stack:
<1>[395491.178009s][pid:29501,cpu4,Binder:708_A]Unable to handle kernel NULL 
pointer dereference at virtual address 0008
<1>[395491.178039s][pid:29501,cpu4,Binder:708_A]pgd = ffc112c27000
<1>[395491.178039s][pid:29501,cpu4,Binder:708_A][0008] 
*pgd=, *pud=
<0>[395491.178070s][pid:29501,cpu4,Binder:708_A]Internal error: Oops: 9605 
[#1] PREEMPT SMP
<4>[395491.178070s][pid:29501,cpu4,Binder:708_A]Modules linked in: hisi_dummy_ko
<4>[395491.178100s][pid:29501,cpu4,Binder:708_A]CPU: 4 PID: 29501 Comm: 
Binder:708_A VIP: 00 Tainted: GW   4.9.111 #1
<4>[395491.178100s][pid:29501,cpu4,Binder:708_A]TGID: 708 Comm: Binder:708_2
<4>[395491.178100s][pid:29501,cpu4,Binder:708_A]Hardware name: hi3660 (DT)
<4>[395491.178100s][pid:29501,cpu4,Binder:708_A]task: ffc1d43ec880 
task.stack: ffc3007e
<4>[395491.178100s][pid:29501,cpu4,Binder:708_A]PC is at 
blkcipher_walk_done+0x210/0x354
<4>[395491.178131s][pid:29501,cpu4,Binder:708_A]LR is at 
blkcipher_walk_done+0x20c/0x354
<4>[395491.178131s][pid:29501,cpu4,Binder:708_A]pc : [] lr : 
[] pstate: 6145
<4>[395491.178131s][pid:29501,cpu4,Binder:708_A]sp : ffc3007e3950
<4>[395491.178131s][pid:29501,cpu4,Binder:708_A]x29: ffc3007e3950 x28: 
 
<4>[395491.178161s][pid:29501,cpu4,Binder:708_A]x27: ffc1c6ef501e x26: 
0100 
<4>[395491.178161s][pid:29501,cpu4,Binder:708_A]x25: ffc3007e3b40 x24: 
ffc3007e3be8 
<4>[395491.178161s][pid:29501,cpu4,Binder:708_A]x23: 0001 x22: 
0500 
<4>[395491.178161s][pid:29501,cpu4,Binder:708_A]x21: ffc3007e3a90 x20: 
ffc3007e3a10 
<4>[395491.178192s][pid:29501,cpu4,Binder:708_A]x19: ffc3007e39d8 x18: 
0001 
<4>[395491.178192s][pid:29501,cpu4,Binder:708_A]x17: 0075aca06934 x16: 
ff9c1b032d10 
<4>[395491.178192s][pid:29501,cpu4,Binder:708_A]x15: 0075aaffe5b8 x14: 
 
<4>[395491.178222s][pid:29501,cpu4,Binder:708_A]x13: 0075ac08642d x12: 
0001 
<4>[395491.178222s][pid:29501,cpu4,Binder:708_A]x11:  x10: 
ffc3175e1680 
<4>[395491.178222s][pid:29501,cpu4,Binder:708_A]x9 : ff9c1d408000 x8 : 
 
<4>[395491.178253s][pid:29501,cpu4,Binder:708_A]x7 : ff9c1c28 x6 : 
0001 
<4>[395491.178253s][pid:29501,cpu4,Binder:708_A]x5 : ffc3007e3be8 x4 : 
 
<4>[395491.178253s][pid:29501,cpu4,Binder:708_A]x3 : 0100 x2 : 
0500 
<4>[395491.178253s][pid:29501,cpu4,Binder:708_A]x1 : ffc31aa934c2 x0 : 
 
<4>[395491.180725s][pid:29501,cpu4,Binder:708_A][] 
blkcipher_walk_done+0x210/0x354
<4>[395491.180755s][pid:29501,cpu4,Binder:708_A][] 
cbc_decrypt+0xa0/0xe8
<4>[395491.180755s][pid:29501,cpu4,Binder:708_A][] 
ablk_decrypt+0x78/0xf4
<4>[395491.180755s][pid:29501,cpu4,Binder:708_A][] 
skcipher_decrypt_ablkcipher+0x70/0x80
<4>[395491.180786s][pid:29501,cpu4,Binder:708_A][] 
crypto_cts_decrypt+0xf0/0x184
<4>[395491.180786s][pid:29501,cpu4,Binder:708_A][] 
fname_decrypt.isra.1+0x110/0x1d8
<4>[395491.180786s][pid:29501,cpu4,Binder:708_A][] 
fscrypt_fname_disk_to_usr+0x1d8/0x264
<4>[395491.180816s][pid:29501,cpu4,Binder:708_A][] 
f2fs_fill_dentries+0x13c/0x1d4

Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations

2018-11-19 Thread Herbert Xu
On Wed, Nov 14, 2018 at 12:21:11PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> 'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC
> in the mask to crypto_alloc_shash() has no effect.  Many users therefore
> already don't pass it, but some still do.  This inconsistency can cause
> confusion, especially since the way the 'mask' argument works is
> somewhat counterintuitive.
> 
> Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.
> 
> This patch shouldn't change any actual behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  drivers/block/drbd/drbd_receiver.c  | 2 +-
>  drivers/md/dm-integrity.c   | 2 +-
>  drivers/net/wireless/intersil/orinoco/mic.c | 6 ++
>  fs/ubifs/auth.c | 5 ++---
>  net/bluetooth/smp.c | 2 +-
>  security/apparmor/crypto.c  | 2 +-
>  security/integrity/evm/evm_crypto.c | 3 +--
>  security/keys/encrypted-keys/encrypted.c| 4 ++--
>  security/keys/trusted.c | 4 ++--
>  9 files changed, 13 insertions(+), 17 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations

2018-11-19 Thread Herbert Xu
On Wed, Nov 14, 2018 at 12:19:39PM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> 'cipher' algorithms (single block ciphers) are always synchronous, so
> passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
> effect.  Many users therefore already don't pass it, but some still do.
> This inconsistency can cause confusion, especially since the way the
> 'mask' argument works is somewhat counterintuitive.
> 
> Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.
> 
> This patch shouldn't change any actual behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  arch/s390/crypto/aes_s390.c   | 2 +-
>  drivers/crypto/amcc/crypto4xx_alg.c   | 3 +--
>  drivers/crypto/ccp/ccp-crypto-aes-cmac.c  | 4 +---
>  drivers/crypto/geode-aes.c| 2 +-
>  drivers/md/dm-crypt.c | 2 +-
>  drivers/net/wireless/cisco/airo.c | 2 +-
>  drivers/staging/rtl8192e/rtllib_crypt_ccmp.c  | 2 +-
>  drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_ccmp.c | 2 +-
>  drivers/usb/wusbcore/crypto.c | 2 +-
>  net/bluetooth/smp.c   | 6 +++---
>  net/mac80211/wep.c| 4 ++--
>  net/wireless/lib80211_crypt_ccmp.c| 2 +-
>  net/wireless/lib80211_crypt_tkip.c| 4 ++--
>  net/wireless/lib80211_crypt_wep.c | 4 ++--
>  14 files changed, 19 insertions(+), 22 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: remove useless initializations of cra_list

2018-11-19 Thread Herbert Xu
On Wed, Nov 14, 2018 at 11:35:48AM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Some algorithms initialize their .cra_list prior to registration.
> But this is unnecessary since crypto_register_alg() will overwrite
> .cra_list when adding the algorithm to the 'crypto_alg_list'.
> Apparently the useless assignment has just been copy+pasted around.
> 
> So, remove the useless assignments.
> 
> Exception: paes_s390.c uses cra_list to check whether the algorithm is
> registered or not, so I left that as-is for now.
> 
> This patch shouldn't change any actual behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  arch/sparc/crypto/aes_glue.c  | 5 -
>  arch/sparc/crypto/camellia_glue.c | 5 -
>  arch/sparc/crypto/des_glue.c  | 5 -
>  crypto/lz4.c  | 1 -
>  crypto/lz4hc.c| 1 -
>  drivers/crypto/bcm/cipher.c   | 2 --
>  drivers/crypto/omap-aes.c | 2 --
>  drivers/crypto/omap-des.c | 1 -
>  drivers/crypto/qce/ablkcipher.c   | 1 -
>  drivers/crypto/qce/sha.c  | 1 -
>  drivers/crypto/sahara.c   | 1 -
>  11 files changed, 25 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: inside-secure - remove useless setting of type flags

2018-11-19 Thread Herbert Xu
On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
> Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
> flags") took care of this everywhere else, but a few more instances made
> it into the tree at about the same time.  Squash them before they get
> copy+pasted around again.
> 
> This patch shouldn't change any actual behavior.
> 
> Signed-off-by: Eric Biggers 
> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Spende

2018-11-19 Thread daniel
Hello,

You have won a charitable donation of 4,800,000.00 EUR. I won the America
lottery worth $560 million and I am donating a part of it to five lucky
people and retirement homes. Contact me for this God-given opportunity by
e-mail: jane.d...@zoho.com

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.



Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-19 Thread Jason A. Donenfeld
Hi Martin,

On Mon, Nov 19, 2018 at 8:52 AM Martin Willi  wrote:
>
> Adding AVX-512VL support is relatively simple. I have a patchset mostly
> ready that is more than competitive with the code from Zinc. I'll clean
> that up and do more testing before posting it later this week.

Terrific. Depending on how it turns out, it'll be nice to try
integrating this into Zinc. I have a massive Xeon Gold 5120 machine
that I can give you access to if you'd like to do some testing and
benching. Poke me on IRC -- I'm zx2c4.

> I don't think that having AVX-512F is that important until it is really
> usable on CPUs in the market.

Actually, similarly here, a 10nm Cannon Lake machine should be
arriving at my house this week, which should make for some interesting
testing ground for non-throttled zmm, if you'd like to play with it.

Jason


Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-19 Thread Leon Romanovsky
On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > Date: Mon, 19 Nov 2018 17:14:05 +0800
> > From: Kenneth Lee 
> > To: Leon Romanovsky 
> > CC: Tim Sell , linux-...@vger.kernel.org,
> >  Alexander Shishkin , Zaibo Xu
> >  , zhangfei@foxmail.com, linux...@huawei.com,
> >  haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang
> >  , Gavin Schenk , RDMA mailing
> >  list , Vinod Koul , Jason
> >  Gunthorpe , Doug Ledford , Uwe
> >  Kleine-König , David Kershner
> >  , Kenneth Lee , Johan
> >  Hovold , Cyrille Pitchen
> >  , Sagar Dharia
> >  , Jens Axboe ,
> >  guodong...@linaro.org, linux-netdev , Randy Dunlap
> >  , linux-ker...@vger.kernel.org, Zhou Wang
> >  , linux-crypto@vger.kernel.org, Philippe
> >  Ombredanne , Sanyog Kale ,
> >  "David S. Miller" ,
> >  linux-accelerat...@lists.ozlabs.org
> > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > User-Agent: Mutt/1.5.21 (2010-09-15)
> > Message-ID: <20181119091405.GE157308@Turing-Arch-b>
> >
> > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > Date: Thu, 15 Nov 2018 16:54:55 +0200
> > > From: Leon Romanovsky 
> > > To: Kenneth Lee 
> > > CC: Kenneth Lee , Tim Sell ,
> > >  linux-...@vger.kernel.org, Alexander Shishkin
> > >  , Zaibo Xu ,
> > >  zhangfei@foxmail.com, linux...@huawei.com, haojian.zhu...@linaro.org,
> > >  Christoph Lameter , Hao Fang , 
> > > Gavin
> > >  Schenk , RDMA mailing list
> > >  , Zhou Wang , Jason
> > >  Gunthorpe , Doug Ledford , Uwe
> > >  Kleine-König , David Kershner
> > >  , Johan Hovold , Cyrille
> > >  Pitchen , Sagar Dharia
> > >  , Jens Axboe ,
> > >  guodong...@linaro.org, linux-netdev , Randy 
> > > Dunlap
> > >  , linux-ker...@vger.kernel.org, Vinod Koul
> > >  , linux-crypto@vger.kernel.org, Philippe Ombredanne
> > >  , Sanyog Kale , "David S.
> > >  Miller" , linux-accelerat...@lists.ozlabs.org
> > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > > User-Agent: Mutt/1.10.1 (2018-07-13)
> > > Message-ID: <20181115145455.gn3...@mtr-leonro.mtl.com>
> > >
> > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > Date: Wed, 14 Nov 2018 18:00:17 +0200
> > > > > From: Leon Romanovsky 
> > > > > To: Kenneth Lee 
> > > > > CC: Tim Sell , linux-...@vger.kernel.org,
> > > > >  Alexander Shishkin , Zaibo Xu
> > > > >  , zhangfei@foxmail.com, linux...@huawei.com,
> > > > >  haojian.zhu...@linaro.org, Christoph Lameter , Hao 
> > > > > Fang
> > > > >  , Gavin Schenk , RDMA 
> > > > > mailing
> > > > >  list , Zhou Wang 
> > > > > ,
> > > > >  Jason Gunthorpe , Doug Ledford , 
> > > > > Uwe
> > > > >  Kleine-König , David Kershner
> > > > >  , Johan Hovold , Cyrille
> > > > >  Pitchen , Sagar Dharia
> > > > >  , Jens Axboe ,
> > > > >  guodong...@linaro.org, linux-netdev , Randy 
> > > > > Dunlap
> > > > >  , linux-ker...@vger.kernel.org, Vinod Koul
> > > > >  , linux-crypto@vger.kernel.org, Philippe Ombredanne
> > > > >  , Sanyog Kale , 
> > > > > Kenneth Lee
> > > > >  , "David S. Miller" ,
> > > > >  linux-accelerat...@lists.ozlabs.org
> > > > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for 
> > > > > WarpDrive/uacce
> > > > > User-Agent: Mutt/1.10.1 (2018-07-13)
> > > > > Message-ID: <20181114160017.gi3...@mtr-leonro.mtl.com>
> > > > >
> > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > >
> > > > > > 在 2018/11/13 上午8:23, Leon Romanovsky 写道:
> > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > From: Kenneth Lee 
> > > > > > > >
> > > > > > > > WarpDrive is a general accelerator framework for the user 
> > > > > > > > application to
> > > > > > > > access the hardware without going through the kernel in data 
> > > > > > > > path.
> > > > > > > >
> > > > > > > > The kernel component to provide kernel facility to driver for 
> > > > > > > > expose the
> > > > > > > > user interface is called uacce. It a short name for
> > > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > > >
> > > > > > > > This patch add document to explain how it works.
> > > > > > > + RDMA and netdev folks
> > > > > > >
> > > > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > > > model. I have hard time to see the differences in the proposed
> > > > > > > framework to already implemented in drivers/infiniband/* for the 
> > > > > > > kernel
> > > > > > > space and for the https://github.com/linux-rdma/rdma-core/ for 
> > > > > > > the user
> > > > > > > space parts.
> > > > > >
> > > > > > Thanks Leon,
> > > > > >
> > > > > > Yes, we tried to solve similar problem in RDMA. We also learned a 
> > > > > > lot from
> > > > > > the exist code of RDMA. But we we have to make a new one 

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-19 Thread Kenneth Lee
On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> Date: Mon, 19 Nov 2018 17:14:05 +0800
> From: Kenneth Lee 
> To: Leon Romanovsky 
> CC: Tim Sell , linux-...@vger.kernel.org,
>  Alexander Shishkin , Zaibo Xu
>  , zhangfei@foxmail.com, linux...@huawei.com,
>  haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang
>  , Gavin Schenk , RDMA mailing
>  list , Vinod Koul , Jason
>  Gunthorpe , Doug Ledford , Uwe
>  Kleine-König , David Kershner
>  , Kenneth Lee , Johan
>  Hovold , Cyrille Pitchen
>  , Sagar Dharia
>  , Jens Axboe ,
>  guodong...@linaro.org, linux-netdev , Randy Dunlap
>  , linux-ker...@vger.kernel.org, Zhou Wang
>  , linux-crypto@vger.kernel.org, Philippe
>  Ombredanne , Sanyog Kale ,
>  "David S. Miller" ,
>  linux-accelerat...@lists.ozlabs.org
> Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> User-Agent: Mutt/1.5.21 (2010-09-15)
> Message-ID: <20181119091405.GE157308@Turing-Arch-b>
> 
> On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > Date: Thu, 15 Nov 2018 16:54:55 +0200
> > From: Leon Romanovsky 
> > To: Kenneth Lee 
> > CC: Kenneth Lee , Tim Sell ,
> >  linux-...@vger.kernel.org, Alexander Shishkin
> >  , Zaibo Xu ,
> >  zhangfei@foxmail.com, linux...@huawei.com, haojian.zhu...@linaro.org,
> >  Christoph Lameter , Hao Fang , Gavin
> >  Schenk , RDMA mailing list
> >  , Zhou Wang , Jason
> >  Gunthorpe , Doug Ledford , Uwe
> >  Kleine-König , David Kershner
> >  , Johan Hovold , Cyrille
> >  Pitchen , Sagar Dharia
> >  , Jens Axboe ,
> >  guodong...@linaro.org, linux-netdev , Randy Dunlap
> >  , linux-ker...@vger.kernel.org, Vinod Koul
> >  , linux-crypto@vger.kernel.org, Philippe Ombredanne
> >  , Sanyog Kale , "David S.
> >  Miller" , linux-accelerat...@lists.ozlabs.org
> > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > User-Agent: Mutt/1.10.1 (2018-07-13)
> > Message-ID: <20181115145455.gn3...@mtr-leonro.mtl.com>
> > 
> > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > Date: Wed, 14 Nov 2018 18:00:17 +0200
> > > > From: Leon Romanovsky 
> > > > To: Kenneth Lee 
> > > > CC: Tim Sell , linux-...@vger.kernel.org,
> > > >  Alexander Shishkin , Zaibo Xu
> > > >  , zhangfei@foxmail.com, linux...@huawei.com,
> > > >  haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang
> > > >  , Gavin Schenk , RDMA 
> > > > mailing
> > > >  list , Zhou Wang ,
> > > >  Jason Gunthorpe , Doug Ledford , 
> > > > Uwe
> > > >  Kleine-König , David Kershner
> > > >  , Johan Hovold , Cyrille
> > > >  Pitchen , Sagar Dharia
> > > >  , Jens Axboe ,
> > > >  guodong...@linaro.org, linux-netdev , Randy 
> > > > Dunlap
> > > >  , linux-ker...@vger.kernel.org, Vinod Koul
> > > >  , linux-crypto@vger.kernel.org, Philippe Ombredanne
> > > >  , Sanyog Kale , Kenneth 
> > > > Lee
> > > >  , "David S. Miller" ,
> > > >  linux-accelerat...@lists.ozlabs.org
> > > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > > > User-Agent: Mutt/1.10.1 (2018-07-13)
> > > > Message-ID: <20181114160017.gi3...@mtr-leonro.mtl.com>
> > > >
> > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > >
> > > > > 在 2018/11/13 上午8:23, Leon Romanovsky 写道:
> > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee 
> > > > > > >
> > > > > > > WarpDrive is a general accelerator framework for the user 
> > > > > > > application to
> > > > > > > access the hardware without going through the kernel in data path.
> > > > > > >
> > > > > > > The kernel component to provide kernel facility to driver for 
> > > > > > > expose the
> > > > > > > user interface is called uacce. It a short name for
> > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > >
> > > > > > > This patch add document to explain how it works.
> > > > > > + RDMA and netdev folks
> > > > > >
> > > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > > model. I have hard time to see the differences in the proposed
> > > > > > framework to already implemented in drivers/infiniband/* for the 
> > > > > > kernel
> > > > > > space and for the https://github.com/linux-rdma/rdma-core/ for the 
> > > > > > user
> > > > > > space parts.
> > > > >
> > > > > Thanks Leon,
> > > > >
> > > > > Yes, we tried to solve similar problem in RDMA. We also learned a lot 
> > > > > from
> > > > > the exist code of RDMA. But we we have to make a new one because we 
> > > > > cannot
> > > > > register accelerators such as AI operation, encryption or compression 
> > > > > to the
> > > > > RDMA framework:)
> > > >
> > > > Assuming that you did everything right and still failed to use RDMA
> > > > framework, you was supposed to fix it and not to reinvent new exactly
> > > > same one. It is how we 

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-19 Thread Kenneth Lee
On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> Date: Thu, 15 Nov 2018 16:54:55 +0200
> From: Leon Romanovsky 
> To: Kenneth Lee 
> CC: Kenneth Lee , Tim Sell ,
>  linux-...@vger.kernel.org, Alexander Shishkin
>  , Zaibo Xu ,
>  zhangfei@foxmail.com, linux...@huawei.com, haojian.zhu...@linaro.org,
>  Christoph Lameter , Hao Fang , Gavin
>  Schenk , RDMA mailing list
>  , Zhou Wang , Jason
>  Gunthorpe , Doug Ledford , Uwe
>  Kleine-König , David Kershner
>  , Johan Hovold , Cyrille
>  Pitchen , Sagar Dharia
>  , Jens Axboe ,
>  guodong...@linaro.org, linux-netdev , Randy Dunlap
>  , linux-ker...@vger.kernel.org, Vinod Koul
>  , linux-crypto@vger.kernel.org, Philippe Ombredanne
>  , Sanyog Kale , "David S.
>  Miller" , linux-accelerat...@lists.ozlabs.org
> Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> User-Agent: Mutt/1.10.1 (2018-07-13)
> Message-ID: <20181115145455.gn3...@mtr-leonro.mtl.com>
> 
> On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > Date: Wed, 14 Nov 2018 18:00:17 +0200
> > > From: Leon Romanovsky 
> > > To: Kenneth Lee 
> > > CC: Tim Sell , linux-...@vger.kernel.org,
> > >  Alexander Shishkin , Zaibo Xu
> > >  , zhangfei@foxmail.com, linux...@huawei.com,
> > >  haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang
> > >  , Gavin Schenk , RDMA 
> > > mailing
> > >  list , Zhou Wang ,
> > >  Jason Gunthorpe , Doug Ledford , Uwe
> > >  Kleine-König , David Kershner
> > >  , Johan Hovold , Cyrille
> > >  Pitchen , Sagar Dharia
> > >  , Jens Axboe ,
> > >  guodong...@linaro.org, linux-netdev , Randy 
> > > Dunlap
> > >  , linux-ker...@vger.kernel.org, Vinod Koul
> > >  , linux-crypto@vger.kernel.org, Philippe Ombredanne
> > >  , Sanyog Kale , Kenneth 
> > > Lee
> > >  , "David S. Miller" ,
> > >  linux-accelerat...@lists.ozlabs.org
> > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
> > > User-Agent: Mutt/1.10.1 (2018-07-13)
> > > Message-ID: <20181114160017.gi3...@mtr-leonro.mtl.com>
> > >
> > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > >
> > > > 在 2018/11/13 上午8:23, Leon Romanovsky 写道:
> > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > From: Kenneth Lee 
> > > > > >
> > > > > > WarpDrive is a general accelerator framework for the user 
> > > > > > application to
> > > > > > access the hardware without going through the kernel in data path.
> > > > > >
> > > > > > The kernel component to provide kernel facility to driver for 
> > > > > > expose the
> > > > > > user interface is called uacce. It a short name for
> > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > >
> > > > > > This patch add document to explain how it works.
> > > > > + RDMA and netdev folks
> > > > >
> > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > model. I have hard time to see the differences in the proposed
> > > > > framework to already implemented in drivers/infiniband/* for the 
> > > > > kernel
> > > > > space and for the https://github.com/linux-rdma/rdma-core/ for the 
> > > > > user
> > > > > space parts.
> > > >
> > > > Thanks Leon,
> > > >
> > > > Yes, we tried to solve similar problem in RDMA. We also learned a lot 
> > > > from
> > > > the exist code of RDMA. But we we have to make a new one because we 
> > > > cannot
> > > > register accelerators such as AI operation, encryption or compression 
> > > > to the
> > > > RDMA framework:)
> > >
> > > Assuming that you did everything right and still failed to use RDMA
> > > framework, you was supposed to fix it and not to reinvent new exactly
> > > same one. It is how we develop kernel, by reusing existing code.
> >
> > Yes, but we don't force other system such as NIC or GPU into RDMA, do we?
> 
> You don't introduce new NIC or GPU, but proposing another interface to
> directly access HW memory and bypass kernel for the data path. This is
> whole idea of RDMA and this is why it is already present in the kernel.
> 
> Various hardware devices are supported in our stack allow a ton of crazy
> stuff, including GPUs interconnections and NIC functionalities.

Yes. We don't want to reinvent the wheel. That is why we did it behind VFIO in RFC
v1 and v2. But finally we were persuaded by Mr. Jerome Glisse that VFIO was not
a good place to solve the problem.

And currently, as you can see, IB is bound to devices doing RDMA. The
registration function, ib_register_device(), hints that it is a netdev (the
get_netdev() callback); it knows about gid, pkey, and Memory Windows. IB is
not simply an address space management framework, and verbs to IB are not
transparent. If we start to add compression/decompression, AI (RNN, CNN)
operations, and encryption/decryption to the verbs set, it will become very
complex. Or maybe I 

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-18 Thread Martin Willi
Hi Jason,

> I'd be inclined to roll with your implementation if it can eventually
> become competitive with Andy Polyakov's, [...]

I think for the SSSE3/AVX2 code paths it is competitive; especially for
small sizes it is faster, which is not that unimportant when
implementing layer 3 VPNs.

> there are still no AVX-512 paths, which means it's considerably
> slower on all newer generation Intel chips. Andy's has the AVX-512VL
> implementation for Skylake (using ymm, so as not to hit throttling)
> and AVX-512F for Cannon Lake and beyond (using zmm).

I don't think that having AVX-512F is that important until it is really
usable on CPUs in the market.

Adding AVX-512VL support is relatively simple. I have a patchset mostly
ready that is more than competitive with the code from Zinc. I'll clean
that up and do more testing before posting it later this week.

Best regards
Martin



Important

2018-11-18 Thread Reem Al-Hashimi
Hello,

My name is ms. Reem Al-Hashimi. The UAE minister of state for international 
cooparation. I got your contact from a certain email database from your country 
while i was looking for someone to handle a huge financial transaction for me 
in confidence. Can you receive and invest on behalf of my only son. Please 
reply to reem2...@daum.net, for more details if you are interested.

Regards,

Ms. Reem Al-Hashimy


Re: [PATCH 0/5] crypto: caam - add support for Era 10

2018-11-15 Thread Herbert Xu
On Thu, Nov 08, 2018 at 03:36:26PM +0200, Horia Geantă wrote:
> This patch set adds support for CAAM Era 10, currently used in LX2160A SoC:
> -new register mapping: some registers/fields are deprecated and moved
> to different locations, mainly version registers
> -algorithms
> chacha20 (over DPSECI - Data Path SEC Interface on fsl-mc bus)
> rfc7539(chacha20,poly1305) (over both DPSECI and Job Ring Interface)
> rfc7539esp(chacha20,poly1305) (over both DPSECI and Job Ring Interface)
> 
> Note: the patch set is generated on top of cryptodev-2.6, however testing
> was performed based on linux-next (tag: next-20181108) - which includes
> LX2160A platform support + manually updating LX2160A dts with:
> -fsl-mc bus DT node
> -missing dma-ranges property in soc DT node
> 
> Cristian Stoica (1):
>   crypto: export CHACHAPOLY_IV_SIZE
> 
> Horia Geantă (4):
>   crypto: caam - add register map changes cf. Era 10
>   crypto: caam/qi2 - add support for ChaCha20
>   crypto: caam/jr - add support for Chacha20 + Poly1305
>   crypto: caam/qi2 - add support for Chacha20 + Poly1305
> 
>  crypto/chacha20poly1305.c  |   2 -
>  drivers/crypto/caam/caamalg.c  | 266 
> ++---
>  drivers/crypto/caam/caamalg_desc.c | 139 ++-
>  drivers/crypto/caam/caamalg_desc.h |   5 +
>  drivers/crypto/caam/caamalg_qi.c   |  37 --
>  drivers/crypto/caam/caamalg_qi2.c  | 156 +-
>  drivers/crypto/caam/caamhash.c |  20 ++-
>  drivers/crypto/caam/caampkc.c  |  10 +-
>  drivers/crypto/caam/caamrng.c  |  10 +-
>  drivers/crypto/caam/compat.h   |   2 +
>  drivers/crypto/caam/ctrl.c |  28 +++-
>  drivers/crypto/caam/desc.h |  28 
>  drivers/crypto/caam/desc_constr.h  |   7 +-
>  drivers/crypto/caam/regs.h |  74 +--
>  include/crypto/chacha20.h  |   1 +
>  15 files changed, 724 insertions(+), 61 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-15 Thread Herbert Xu
On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> This patchset improves performance of the ChaCha20 SIMD implementations
> for x86_64. For some specific encryption lengths, performance is more
> than doubled. Two mechanisms are used to achieve this:
> 
> * Instead of calculating the minimal number of required blocks for a
>   given encryption length, functions producing more blocks are used
>   more aggressively. Calculating a 4-block function can be faster than
>   calculating a 2-block and a 1-block function, even if only three
>   blocks are actually required.
> 
> * In addition to the 8-block AVX2 function, a 4-block and a 2-block
>   function are introduced.
> 
> Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> 8-block functions. Patch 4 makes use of that by engaging the next higher
> level block functions more aggressively. Patch 5 and 6 add the new AVX2
> functions for 2 and 4 blocks. Patches are based on cryptodev and would
> need adjustments to apply on top of the Adiantum patchset.
> 
> Note that the more aggressive use of larger block functions calculate
> blocks that may get discarded. This may have a negative impact on energy
> usage or the processors thermal budget. However, with the new block
> functions we can avoid this over-calculation for many lengths, so the
> performance win can be considered more important.
> 
> Below are performance numbers measured with tcrypt using additional
> encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> existing, new the implementation with this patchset. As comparison
> the numbers for zinc in v6:
> 
>  len  old  new zinc
>8 5908 5818 5818
>   16 5917 5828 5726
>   24 5916 5869 5757
>   32 5920 5789 5813
>   40 5868 5799 5710
>   48 5877 5761 5761
>   56 5869 5797 5742
>   64 5897 5862 5685
>   72 3381 4979 3520
>   80 3364 5541 3475
>   88 3350 4977 3424
>   96 3342 5530 3371
>  104 3328 4923 3313
>  112 3317 5528 3207
>  120 3313 4970 3150
>  128 3492 5535 3568
>  136 2487 4570 3690
>  144 2481 5047 3599
>  152 2473 4565 3566
>  160 2459 5022 3515
>  168 2461 4550 3437
>  176 2454 5020 3325
>  184 2449 4535 3279
>  192 2538 5011 3762
>  200 1962 4537 3702
>  208 1962 4971 3622
>  216 1954 4487 3518
>  224 1949 4936 3445
>  232 1948 4497 3422
>  240 1941 4947 3317
>  248 1940 4481 3279
>  256 3798 4964 3723
>  264 2638 3577 3639
>  272 2637 3567 3597
>  280 2628 3563 3565
>  288 2630 3795 3484
>  296 2621 3580 3422
>  304 2612 3569 3352
>  312 2602 3599 3308
>  320 2694 3821 3694
>  328 2060 3538 3681
>  336 2054 3565 3599
>  344 2054 3553 3523
>  352 2049 3809 3419
>  360 2045 3575 3403
>  368 2035 3560 3334
>  376 2036 3555 3257
>  384 2092 3785 3715
>  392 1691 3505 3612
>  400 1684 3527 3553
>  408 1686 3527 3496
>  416 1684 3804 3430
>  424 1681 3555 3402
>  432 1675 3559 3311
>  440 1672 3558 3275
>  448 1710 3780 3689
>  456 1431 3541 3618
>  464 1428 3538 3576
>  472 1430 3527 3509
>  480 1426 3788 3405
>  488 1423 3502 3397
>  496 1423 3519 3298
>  504 1418 3519 3277
>  512 3694 3736 3735
>  520 2601 2571 2209
>  528 2601 2677 2148
>  536 2587 2534 2164
>  544 2578 2659 2138
>  552 2570 2552 2126
>  560 2566 2661 2035
>  568 2567 2542 2041
>  576 2639 2674 2199
>  584 2031 2531 2183
>  592 2027 2660 2145
>  600 2016 2513 2155
>  608 2009 2638 2133
>  616 2006 2522 2115
>  624 2000 2649 2064
>  632 1996 2518 2045
>  640 2053 2651 2188
>  648 1666 2402 2182
>  656 1663 2517 2158
>  664 1659 2397 2147
>  672 1657 2510 2139
>  680 1656 2394 2114
>  688 1653 2497 2077
>  696 1646 2393 2043
>  704 1678 2510 2208
>  712 1414 2391 2189
>  720 1412 2506 2169
>  728 1411 2384 2145
>  736 1408 2494 2142
>  744 1408 2379 2081
>  752 1405 2485 2064
>  760 1403 2376 2043
>  768 2189 2498 2211
>  776 1756 2137 2192
>  784 1746 2145 2146
>  792 1744 2141 2141
>  800 1743  2094
>  808 1742 2140 2100
>  816 1735 2134 2061
>  824 1731 2135 2045
>  832 1778  2223
>  840 1480 2132 2184
>  848 1480 2134 2173
>  856 1476 2124 2145
>  864 1474 2210 2126
>  872 1472 2127 2105
>  880 1463 2123 2056
>  888 1468 2123 2043
>  896 1494 2208 2219
>  904 1278 2120 2192
>  912 1277 2121 2170
>  920 1273 2118 2149
>  928 1272 2207 2125
>  936 1267 2125 2098
>  944 1265 2127 2060
>  952 1267 2126 2049
>  960 1289 2213 2204
>  968 1125 2123 2187
>  976 1122 2127 2166
>  984 1120 2123 2136
>  992 1118 2207 2119
> 1000 1118 2120 2101
> 1008 1117 2122 2042
> 1016 1115 2121 2048
> 1024 2174 2191 2195
> 1032 1748 1724 1565
> 1040 1745 1782 1544
> 1048 1736 1737 1554
> 1056 1738 1802 1541
> 1064 1735 1728 1523
> 1072 1730 1780 1507
> 1080 1729 1724 1497
> 1088 1757 1783 1592
> 1096 1475 1723 1575
> 1104 1474 1778 1563
> 1112 1472 1708 1544
> 1120 1468 1774 1521
> 1128 1466 1718 1521
> 1136 1462 1780 1501
> 1144 1460 1719 1491
> 1152 1481 1782 1575
> 1160 1271 1647 1558
> 1168 1271 1706 1554
> 1176 1268 1645 1545
> 1184 1265 1711 1538
> 1192 1265 1648 1530
> 1200 1264 1705 1493
> 1208 1262 1647 1498
> 1216 1277 1695 1581

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-15 Thread Jason A. Donenfeld
Hi Martin,

This is nice work, and given that it's quite clean -- and that it's
usually hard to screw up chacha in subtle ways when test vectors pass
(unlike, say, poly1305 or curve25519) -- I'd be inclined to roll with
your implementation if it can eventually become competitive with Andy
Polyakov's, which I'm currently working on for Zinc (which no longer
has pre-generated code, addressing the biggest hurdle; v9 will be sent
shortly). Specifically, I'm not quite sure the improvements here apply
to all AVX2 microarchitectures, and most importantly, there are still
no AVX-512 paths, which means it's considerably slower on all newer
generation Intel chips. Andy's has the AVX-512VL implementation for
Skylake (using ymm, so as not to hit throttling) and AVX-512F for
Cannon Lake and beyond (using zmm). I've attached some measurements
below showing how stark the difference is.

The takeaway is that while Andy's implementation is still ahead in
terms of performance today, I'd certainly encourage your efforts to
gain parity with it, and I'd be happy to have that when the performance
and fuzzing time is right for it. So please do keep chipping away at
it; I think it's a potentially useful effort.

Regards,
Jason

size old zinc
  
0 64 54
16 386 372
32 388 396
48 388 420
64 366 350
80 708 666
96 708 692
112 706 736
128 692 648
144 1036 682
160 1036 708
176 1036 730
192 1016 658
208 1360 684
224 1362 708
240 1360 732
256 644 500
272 990 526
288 988 556
304 988 576
320 972 500
336 1314 532
352 1316 558
368 1318 578
384 1308 506
400 1644 532
416 1644 556
432 1644 594
448 1624 508
464 1970 534
480 1970 556
496 1968 582
512 660 624
528 1016 682
544 1016 702
560 1018 728
576 998 654
592 1344 680
608 1344 708
624 1344 730
640 1326 654
656 1670 686
672 1670 708
688 1670 732
704 1652 658
720 1998 682
736 1998 710
752 1996 734
768 1256 662
784 1606 688
800 1606 714
816 1606 736
832 1584 660
848 1948 688
864 1950 714
880 1948 736
896 1912 688
912 2258 718
928 2258 744
944 2256 768
960 2238 692
976 2584 718
992 2584 744
1008 2584 770



On Thu, Nov 15, 2018 at 6:21 PM Herbert Xu  wrote:
>
> On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> > This patchset improves performance of the ChaCha20 SIMD implementations
> > for x86_64. For some specific encryption lengths, performance is more
> > than doubled. Two mechanisms are used to achieve this:
> >
> > * Instead of calculating the minimal number of required blocks for a
> >   given encryption length, functions producing more blocks are used
> >   more aggressively. Calculating a 4-block function can be faster than
> >   calculating a 2-block and a 1-block function, even if only three
> >   blocks are actually required.
> >
> > * In addition to the 8-block AVX2 function, a 4-block and a 2-block
> >   function are introduced.
> >
> > Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> > 8-block functions. Patch 4 makes use of that by engaging the next higher
> > level block functions more aggressively. Patch 5 and 6 add the new AVX2
> > functions for 2 and 4 blocks. Patches are based on cryptodev and would
> > need adjustments to apply on top of the Adiantum patchset.
> >
> > Note that the more aggressive use of larger block functions calculate
> > blocks that may get discarded. This may have a negative impact on energy
> > usage or the processors thermal budget. However, with the new block
> > functions we can avoid this over-calculation for many lengths, so the
> > performance win can be considered more important.
> >
> > Below are performance numbers measured with tcrypt using additional
> > encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> > existing, new the implementation with this patchset. As comparison
> > the numbers for zinc in v6:
> >
> >  len  old  new zinc
> >8 5908 5818 5818
> >   16 5917 5828 5726
> >   24 5916 5869 5757
> >   32 5920 5789 5813
> >   40 5868 5799 5710
> >   48 5877 5761 5761
> >   56 5869 5797 5742
> >   64 5897 5862 5685
> >   72 3381 4979 3520
> >   80 3364 5541 3475
> >   88 3350 4977 3424
> >   96 3342 5530 3371
> >  104 3328 4923 3313
> >  112 3317 5528 3207
> >  120 3313 4970 3150
> >  128 3492 5535 3568
> >  136 2487 4570 3690
> >  144 2481 5047 3599
> >  152 2473 4565 3566
> >  160 2459 5022 3515
> >  168 2461 4550 3437
> >  176 2454 5020 3325
> >  184 2449 4535 3279
> >  192 2538 5011 3762
> >  200 1962 4537 3702
> >  208 1962 4971 3622
> >  216 1954 4487 3518
> >  224 1949 4936 3445
> >  232 1948 4497 3422
> >  240 1941 4947 3317
> >  248 1940 4481 3279
> >  256 3798 4964 3723
> >  264 2638 3577 3639
> >  272 2637 3567 3597
> >  280 2628 3563 3565
> >  288 2630 3795 3484
> >  296 2621 3580 3422
> >  304 2612 3569 3352
> >  312 2602 3599 3308
> >  320 2694 3821 3694
> >  328 2060 3538 3681
> >  336 2054 3565 3599
> >  344 2054 3553 3523
> >  352 2049 3809 3419
> >  360 2045 3575 3403
> >  368 2035 3560 3334
> >  376 2036 3555 3257
> >  384 2092 3785 3715
> > 

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-15 Thread Herbert Xu
On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> This patchset improves performance of the ChaCha20 SIMD implementations
> for x86_64. For some specific encryption lengths, performance is more
> than doubled. Two mechanisms are used to achieve this:
> 
> * Instead of calculating the minimal number of required blocks for a
>   given encryption length, functions producing more blocks are used
>   more aggressively. Calculating a 4-block function can be faster than
>   calculating a 2-block and a 1-block function, even if only three
>   blocks are actually required.
> 
> * In addition to the 8-block AVX2 function, a 4-block and a 2-block
>   function are introduced.
> 
> Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> 8-block functions. Patch 4 makes use of that by engaging the next higher
> level block functions more aggressively. Patch 5 and 6 add the new AVX2
> functions for 2 and 4 blocks. Patches are based on cryptodev and would
> need adjustments to apply on top of the Adiantum patchset.
> 
> Note that the more aggressive use of larger block functions calculate
> blocks that may get discarded. This may have a negative impact on energy
> usage or the processors thermal budget. However, with the new block
> functions we can avoid this over-calculation for many lengths, so the
> performance win can be considered more important.
> 
> Below are performance numbers measured with tcrypt using additional
> encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> existing, new the implementation with this patchset. As comparison
> the numbers for zinc in v6:
> 
>  len  old  new zinc
>8 5908 5818 5818
>   16 5917 5828 5726
>   24 5916 5869 5757
>   32 5920 5789 5813
>   40 5868 5799 5710
>   48 5877 5761 5761
>   56 5869 5797 5742
>   64 5897 5862 5685
>   72 3381 4979 3520
>   80 3364 5541 3475
>   88 3350 4977 3424
>   96 3342 5530 3371
>  104 3328 4923 3313
>  112 3317 5528 3207
>  120 3313 4970 3150
>  128 3492 5535 3568
>  136 2487 4570 3690
>  144 2481 5047 3599
>  152 2473 4565 3566
>  160 2459 5022 3515
>  168 2461 4550 3437
>  176 2454 5020 3325
>  184 2449 4535 3279
>  192 2538 5011 3762
>  200 1962 4537 3702
>  208 1962 4971 3622
>  216 1954 4487 3518
>  224 1949 4936 3445
>  232 1948 4497 3422
>  240 1941 4947 3317
>  248 1940 4481 3279
>  256 3798 4964 3723
>  264 2638 3577 3639
>  272 2637 3567 3597
>  280 2628 3563 3565
>  288 2630 3795 3484
>  296 2621 3580 3422
>  304 2612 3569 3352
>  312 2602 3599 3308
>  320 2694 3821 3694
>  328 2060 3538 3681
>  336 2054 3565 3599
>  344 2054 3553 3523
>  352 2049 3809 3419
>  360 2045 3575 3403
>  368 2035 3560 3334
>  376 2036 3555 3257
>  384 2092 3785 3715
>  392 1691 3505 3612
>  400 1684 3527 3553
>  408 1686 3527 3496
>  416 1684 3804 3430
>  424 1681 3555 3402
>  432 1675 3559 3311
>  440 1672 3558 3275
>  448 1710 3780 3689
>  456 1431 3541 3618
>  464 1428 3538 3576
>  472 1430 3527 3509
>  480 1426 3788 3405
>  488 1423 3502 3397
>  496 1423 3519 3298
>  504 1418 3519 3277
>  512 3694 3736 3735
>  520 2601 2571 2209
>  528 2601 2677 2148
>  536 2587 2534 2164
>  544 2578 2659 2138
>  552 2570 2552 2126
>  560 2566 2661 2035
>  568 2567 2542 2041
>  576 2639 2674 2199
>  584 2031 2531 2183
>  592 2027 2660 2145
>  600 2016 2513 2155
>  608 2009 2638 2133
>  616 2006 2522 2115
>  624 2000 2649 2064
>  632 1996 2518 2045
>  640 2053 2651 2188
>  648 1666 2402 2182
>  656 1663 2517 2158
>  664 1659 2397 2147
>  672 1657 2510 2139
>  680 1656 2394 2114
>  688 1653 2497 2077
>  696 1646 2393 2043
>  704 1678 2510 2208
>  712 1414 2391 2189
>  720 1412 2506 2169
>  728 1411 2384 2145
>  736 1408 2494 2142
>  744 1408 2379 2081
>  752 1405 2485 2064
>  760 1403 2376 2043
>  768 2189 2498 2211
>  776 1756 2137 2192
>  784 1746 2145 2146
>  792 1744 2141 2141
>  800 1743  2094
>  808 1742 2140 2100
>  816 1735 2134 2061
>  824 1731 2135 2045
>  832 1778  2223
>  840 1480 2132 2184
>  848 1480 2134 2173
>  856 1476 2124 2145
>  864 1474 2210 2126
>  872 1472 2127 2105
>  880 1463 2123 2056
>  888 1468 2123 2043
>  896 1494 2208 2219
>  904 1278 2120 2192
>  912 1277 2121 2170
>  920 1273 2118 2149
>  928 1272 2207 2125
>  936 1267 2125 2098
>  944 1265 2127 2060
>  952 1267 2126 2049
>  960 1289 2213 2204
>  968 1125 2123 2187
>  976 1122 2127 2166
>  984 1120 2123 2136
>  992 1118 2207 2119
> 1000 1118 2120 2101
> 1008 1117 2122 2042
> 1016 1115 2121 2048
> 1024 2174 2191 2195
> 1032 1748 1724 1565
> 1040 1745 1782 1544
> 1048 1736 1737 1554
> 1056 1738 1802 1541
> 1064 1735 1728 1523
> 1072 1730 1780 1507
> 1080 1729 1724 1497
> 1088 1757 1783 1592
> 1096 1475 1723 1575
> 1104 1474 1778 1563
> 1112 1472 1708 1544
> 1120 1468 1774 1521
> 1128 1466 1718 1521
> 1136 1462 1780 1501
> 1144 1460 1719 1491
> 1152 1481 1782 1575
> 1160 1271 1647 1558
> 1168 1271 1706 1554
> 1176 1268 1645 1545
> 1184 1265 1711 1538
> 1192 1265 1648 1530
> 1200 1264 1705 1493
> 1208 1262 1647 1498
> 1216 1277 1695 1581

Re: [PATCH] crypto: inside-secure - remove useless setting of type flags

2018-11-14 Thread Antoine Tenart
Hi Eric,

On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
> Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
> flags") took care of this everywhere else, but a few more instances made
> it into the tree at about the same time.  Squash them before they get
> copy+pasted around again.
> 
> This patch shouldn't change any actual behavior.
> 
> Signed-off-by: Eric Biggers 

Acked-by: Antoine Tenart 

Thanks!
Antoine

> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c 
> b/drivers/crypto/inside-secure/safexcel_cipher.c
> index 3aef1d43e4351..d531c14020dcb 100644
> --- a/drivers/crypto/inside-secure/safexcel_cipher.c
> +++ b/drivers/crypto/inside-secure/safexcel_cipher.c
> @@ -970,7 +970,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des = {
>   .cra_name = "cbc(des)",
>   .cra_driver_name = "safexcel-cbc-des",
>   .cra_priority = 300,
> - .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
> CRYPTO_ALG_ASYNC |
> + .cra_flags = CRYPTO_ALG_ASYNC |
>CRYPTO_ALG_KERN_DRIVER_ONLY,
>   .cra_blocksize = DES_BLOCK_SIZE,
>   .cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1010,7 +1010,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des = {
>   .cra_name = "ecb(des)",
>   .cra_driver_name = "safexcel-ecb-des",
>   .cra_priority = 300,
> - .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
> CRYPTO_ALG_ASYNC |
> + .cra_flags = CRYPTO_ALG_ASYNC |
>CRYPTO_ALG_KERN_DRIVER_ONLY,
>   .cra_blocksize = DES_BLOCK_SIZE,
>   .cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1074,7 +1074,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des3_ede 
> = {
>   .cra_name = "cbc(des3_ede)",
>   .cra_driver_name = "safexcel-cbc-des3_ede",
>   .cra_priority = 300,
> - .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
> CRYPTO_ALG_ASYNC |
> + .cra_flags = CRYPTO_ALG_ASYNC |
>CRYPTO_ALG_KERN_DRIVER_ONLY,
>   .cra_blocksize = DES3_EDE_BLOCK_SIZE,
>   .cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1114,7 +1114,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des3_ede 
> = {
>   .cra_name = "ecb(des3_ede)",
>   .cra_driver_name = "safexcel-ecb-des3_ede",
>   .cra_priority = 300,
> - .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
> CRYPTO_ALG_ASYNC |
> + .cra_flags = CRYPTO_ALG_ASYNC |
>CRYPTO_ALG_KERN_DRIVER_ONLY,
>   .cra_blocksize = DES3_EDE_BLOCK_SIZE,
>   .cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> -- 
> 2.19.1.930.g4563a0d9d0-goog
> 

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


[PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations

2018-11-14 Thread Eric Biggers
From: Eric Biggers 

'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC
in the mask to crypto_alloc_shash() has no effect.  Many users therefore
already don't pass it, but some still do.  This inconsistency can cause
confusion, especially since the way the 'mask' argument works is
somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers 
---
 drivers/block/drbd/drbd_receiver.c  | 2 +-
 drivers/md/dm-integrity.c   | 2 +-
 drivers/net/wireless/intersil/orinoco/mic.c | 6 ++
 fs/ubifs/auth.c | 5 ++---
 net/bluetooth/smp.c | 2 +-
 security/apparmor/crypto.c  | 2 +-
 security/integrity/evm/evm_crypto.c | 3 +--
 security/keys/encrypted-keys/encrypted.c| 4 ++--
 security/keys/trusted.c | 4 ++--
 9 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index 61c392752fe4b..ccfcf00f2798d 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -3623,7 +3623,7 @@ static int receive_protocol(struct drbd_connection 
*connection, struct packet_in
 * change.
 */
 
-   peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, 
CRYPTO_ALG_ASYNC);
+   peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, 0);
if (IS_ERR(peer_integrity_tfm)) {
peer_integrity_tfm = NULL;
drbd_err(connection, "peer data-integrity-alg %s not 
supported\n",
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index bb3096bf2cc6b..d4ad0bfee2519 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -2804,7 +2804,7 @@ static int get_mac(struct crypto_shash **hash, struct 
alg_spec *a, char **error,
int r;
 
if (a->alg_string) {
-   *hash = crypto_alloc_shash(a->alg_string, 0, CRYPTO_ALG_ASYNC);
+   *hash = crypto_alloc_shash(a->alg_string, 0, 0);
if (IS_ERR(*hash)) {
*error = error_alg;
r = PTR_ERR(*hash);
diff --git a/drivers/net/wireless/intersil/orinoco/mic.c 
b/drivers/net/wireless/intersil/orinoco/mic.c
index 08bc7822f8209..709d9ab3e7bcb 100644
--- a/drivers/net/wireless/intersil/orinoco/mic.c
+++ b/drivers/net/wireless/intersil/orinoco/mic.c
@@ -16,8 +16,7 @@
 //
 int orinoco_mic_init(struct orinoco_private *priv)
 {
-   priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
- CRYPTO_ALG_ASYNC);
+   priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0, 0);
if (IS_ERR(priv->tx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
   "crypto API michael_mic\n");
@@ -25,8 +24,7 @@ int orinoco_mic_init(struct orinoco_private *priv)
return -ENOMEM;
}
 
-   priv->rx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
- CRYPTO_ALG_ASYNC);
+   priv->rx_tfm_mic = crypto_alloc_shash("michael_mic", 0, 0);
if (IS_ERR(priv->rx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
   "crypto API michael_mic\n");
diff --git a/fs/ubifs/auth.c b/fs/ubifs/auth.c
index 124e965a28b30..5bf5fd08879e6 100644
--- a/fs/ubifs/auth.c
+++ b/fs/ubifs/auth.c
@@ -269,8 +269,7 @@ int ubifs_init_authentication(struct ubifs_info *c)
goto out;
}
 
-   c->hash_tfm = crypto_alloc_shash(c->auth_hash_name, 0,
-CRYPTO_ALG_ASYNC);
+   c->hash_tfm = crypto_alloc_shash(c->auth_hash_name, 0, 0);
if (IS_ERR(c->hash_tfm)) {
err = PTR_ERR(c->hash_tfm);
ubifs_err(c, "Can not allocate %s: %d",
@@ -286,7 +285,7 @@ int ubifs_init_authentication(struct ubifs_info *c)
goto out_free_hash;
}
 
-   c->hmac_tfm = crypto_alloc_shash(hmac_name, 0, CRYPTO_ALG_ASYNC);
+   c->hmac_tfm = crypto_alloc_shash(hmac_name, 0, 0);
if (IS_ERR(c->hmac_tfm)) {
err = PTR_ERR(c->hmac_tfm);
ubifs_err(c, "Can not allocate %s: %d", hmac_name, err);
diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
index 1f94a25beef69..621146d04c038 100644
--- a/net/bluetooth/smp.c
+++ b/net/bluetooth/smp.c
@@ -3912,7 +3912,7 @@ int __init bt_selftest_smp(void)
return PTR_ERR(tfm_aes);
}
 
-   tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, CRYPTO_ALG_ASYNC);
+   tfm_cmac = crypto_alloc_shash("cmac(aes)", 0, 0);
if (IS_ERR(tfm_cmac)) {
BT_ERR("Unable to create CMAC crypto 

[PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations

2018-11-14 Thread Eric Biggers
From: Eric Biggers 

'cipher' algorithms (single block ciphers) are always synchronous, so
passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
effect.  Many users therefore already don't pass it, but some still do.
This inconsistency can cause confusion, especially since the way the
'mask' argument works is somewhat counterintuitive.
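
For instance, the two allocations below request the same thing, since a
single block cipher never has CRYPTO_ALG_ASYNC set (illustrative sketch,
not taken from any of the touched drivers):

	struct crypto_cipher *a, *b;

	/* mask = CRYPTO_ALG_ASYNC: match only synchronous ciphers ... */
	a = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);

	/* ... which is all of them, so this is equivalent: */
	b = crypto_alloc_cipher("aes", 0, 0);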

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers 
---
 arch/s390/crypto/aes_s390.c   | 2 +-
 drivers/crypto/amcc/crypto4xx_alg.c   | 3 +--
 drivers/crypto/ccp/ccp-crypto-aes-cmac.c  | 4 +---
 drivers/crypto/geode-aes.c| 2 +-
 drivers/md/dm-crypt.c | 2 +-
 drivers/net/wireless/cisco/airo.c | 2 +-
 drivers/staging/rtl8192e/rtllib_crypt_ccmp.c  | 2 +-
 drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_ccmp.c | 2 +-
 drivers/usb/wusbcore/crypto.c | 2 +-
 net/bluetooth/smp.c   | 6 +++---
 net/mac80211/wep.c| 4 ++--
 net/wireless/lib80211_crypt_ccmp.c| 2 +-
 net/wireless/lib80211_crypt_tkip.c| 4 ++--
 net/wireless/lib80211_crypt_wep.c | 4 ++--
 14 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/s390/crypto/aes_s390.c b/arch/s390/crypto/aes_s390.c
index 812d9498d97be..dd456725189f2 100644
--- a/arch/s390/crypto/aes_s390.c
+++ b/arch/s390/crypto/aes_s390.c
@@ -137,7 +137,7 @@ static int fallback_init_cip(struct crypto_tfm *tfm)
struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);
 
sctx->fallback.cip = crypto_alloc_cipher(name, 0,
-   CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
+CRYPTO_ALG_NEED_FALLBACK);
 
if (IS_ERR(sctx->fallback.cip)) {
pr_err("Allocating AES fallback algorithm %s failed\n",
diff --git a/drivers/crypto/amcc/crypto4xx_alg.c 
b/drivers/crypto/amcc/crypto4xx_alg.c
index f5c07498ea4f0..4092c2aad8e21 100644
--- a/drivers/crypto/amcc/crypto4xx_alg.c
+++ b/drivers/crypto/amcc/crypto4xx_alg.c
@@ -520,8 +520,7 @@ static int crypto4xx_compute_gcm_hash_key_sw(__le32 
*hash_start, const u8 *key,
uint8_t src[16] = { 0 };
int rc = 0;
 
-   aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC |
- CRYPTO_ALG_NEED_FALLBACK);
+   aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(aes_tfm)) {
rc = PTR_ERR(aes_tfm);
pr_warn("could not load aes cipher driver: %d\n", rc);
diff --git a/drivers/crypto/ccp/ccp-crypto-aes-cmac.c 
b/drivers/crypto/ccp/ccp-crypto-aes-cmac.c
index 3c6fe57f91f8c..9108015e56cc5 100644
--- a/drivers/crypto/ccp/ccp-crypto-aes-cmac.c
+++ b/drivers/crypto/ccp/ccp-crypto-aes-cmac.c
@@ -346,9 +346,7 @@ static int ccp_aes_cmac_cra_init(struct crypto_tfm *tfm)
 
crypto_ahash_set_reqsize(ahash, sizeof(struct ccp_aes_cmac_req_ctx));
 
-   cipher_tfm = crypto_alloc_cipher("aes", 0,
-CRYPTO_ALG_ASYNC |
-CRYPTO_ALG_NEED_FALLBACK);
+   cipher_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(cipher_tfm)) {
pr_warn("could not load aes cipher driver\n");
return PTR_ERR(cipher_tfm);
diff --git a/drivers/crypto/geode-aes.c b/drivers/crypto/geode-aes.c
index eb2a0a73cbed1..b4c24a35b3d08 100644
--- a/drivers/crypto/geode-aes.c
+++ b/drivers/crypto/geode-aes.c
@@ -261,7 +261,7 @@ static int fallback_init_cip(struct crypto_tfm *tfm)
struct geode_aes_op *op = crypto_tfm_ctx(tfm);
 
op->fallback.cip = crypto_alloc_cipher(name, 0,
-   CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
+  CRYPTO_ALG_NEED_FALLBACK);
 
if (IS_ERR(op->fallback.cip)) {
printk(KERN_ERR "Error allocating fallback algo %s\n", name);
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b8eec515a003c..a7195eb5b8d89 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -377,7 +377,7 @@ static struct crypto_cipher *alloc_essiv_cipher(struct 
crypt_config *cc,
int err;
 
/* Setup the essiv_tfm with the given salt */
-   essiv_tfm = crypto_alloc_cipher(cc->cipher, 0, CRYPTO_ALG_ASYNC);
+   essiv_tfm = crypto_alloc_cipher(cc->cipher, 0, 0);
if (IS_ERR(essiv_tfm)) {
ti->error = "Error allocating crypto tfm for ESSIV";
return essiv_tfm;
diff --git a/drivers/net/wireless/cisco/airo.c 
b/drivers/net/wireless/cisco/airo.c
index 04dd7a9365938..6fab69fe6c92c 100644
--- 

[PATCH] crypto: remove useless initializations of cra_list

2018-11-14 Thread Eric Biggers
From: Eric Biggers 

Some algorithms initialize their .cra_list prior to registration.
But this is unnecessary since crypto_register_alg() will overwrite
.cra_list when adding the algorithm to the 'crypto_alg_list'.
Apparently the useless assignment has just been copy+pasted around.
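
Roughly, registration does something like the following (a sketch of the
core behaviour described above, not code from this patch), so any earlier
INIT_LIST_HEAD() on .cra_list is simply overwritten:

	down_write(&crypto_alg_sem);
	/* re-links alg->cra_list into the global list unconditionally */
	list_add(&alg->cra_list, &crypto_alg_list);
	up_write(&crypto_alg_sem);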

So, remove the useless assignments.

Exception: paes_s390.c uses cra_list to check whether the algorithm is
registered or not, so I left that as-is for now.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers 
---
 arch/sparc/crypto/aes_glue.c  | 5 -
 arch/sparc/crypto/camellia_glue.c | 5 -
 arch/sparc/crypto/des_glue.c  | 5 -
 crypto/lz4.c  | 1 -
 crypto/lz4hc.c| 1 -
 drivers/crypto/bcm/cipher.c   | 2 --
 drivers/crypto/omap-aes.c | 2 --
 drivers/crypto/omap-des.c | 1 -
 drivers/crypto/qce/ablkcipher.c   | 1 -
 drivers/crypto/qce/sha.c  | 1 -
 drivers/crypto/sahara.c   | 1 -
 11 files changed, 25 deletions(-)

diff --git a/arch/sparc/crypto/aes_glue.c b/arch/sparc/crypto/aes_glue.c
index 3cd4f6b198b65..a9b8b0b94a8d4 100644
--- a/arch/sparc/crypto/aes_glue.c
+++ b/arch/sparc/crypto/aes_glue.c
@@ -476,11 +476,6 @@ static bool __init sparc64_has_aes_opcode(void)
 
 static int __init aes_sparc64_mod_init(void)
 {
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(algs); i++)
-   INIT_LIST_HEAD(&algs[i].cra_list);
-
if (sparc64_has_aes_opcode()) {
pr_info("Using sparc64 aes opcodes optimized AES 
implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));
diff --git a/arch/sparc/crypto/camellia_glue.c 
b/arch/sparc/crypto/camellia_glue.c
index 561a84d93cf68..900d5c617e83b 100644
--- a/arch/sparc/crypto/camellia_glue.c
+++ b/arch/sparc/crypto/camellia_glue.c
@@ -299,11 +299,6 @@ static bool __init sparc64_has_camellia_opcode(void)
 
 static int __init camellia_sparc64_mod_init(void)
 {
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(algs); i++)
-   INIT_LIST_HEAD(&algs[i].cra_list);
-
if (sparc64_has_camellia_opcode()) {
pr_info("Using sparc64 camellia opcodes optimized CAMELLIA 
implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));
diff --git a/arch/sparc/crypto/des_glue.c b/arch/sparc/crypto/des_glue.c
index 61af794aa2d31..56499ea39fd36 100644
--- a/arch/sparc/crypto/des_glue.c
+++ b/arch/sparc/crypto/des_glue.c
@@ -510,11 +510,6 @@ static bool __init sparc64_has_des_opcode(void)
 
 static int __init des_sparc64_mod_init(void)
 {
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(algs); i++)
-   INIT_LIST_HEAD(&algs[i].cra_list);
-
if (sparc64_has_des_opcode()) {
pr_info("Using sparc64 des opcodes optimized DES 
implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));
diff --git a/crypto/lz4.c b/crypto/lz4.c
index 2ce2660d3519e..c160dfdbf2e07 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -122,7 +122,6 @@ static struct crypto_alg alg_lz4 = {
.cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
.cra_ctxsize= sizeof(struct lz4_ctx),
.cra_module = THIS_MODULE,
-   .cra_list   = LIST_HEAD_INIT(alg_lz4.cra_list),
.cra_init   = lz4_init,
.cra_exit   = lz4_exit,
.cra_u  = { .compress = {
diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index 2be14f054dafd..583b5e013d7a5 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -123,7 +123,6 @@ static struct crypto_alg alg_lz4hc = {
.cra_flags  = CRYPTO_ALG_TYPE_COMPRESS,
.cra_ctxsize= sizeof(struct lz4hc_ctx),
.cra_module = THIS_MODULE,
-   .cra_list   = LIST_HEAD_INIT(alg_lz4hc.cra_list),
.cra_init   = lz4hc_init,
.cra_exit   = lz4hc_exit,
.cra_u  = { .compress = {
diff --git a/drivers/crypto/bcm/cipher.c b/drivers/crypto/bcm/cipher.c
index 2d1f1db9f8074..8808eacc65801 100644
--- a/drivers/crypto/bcm/cipher.c
+++ b/drivers/crypto/bcm/cipher.c
@@ -4605,7 +4605,6 @@ static int spu_register_ablkcipher(struct iproc_alg_s 
*driver_alg)
crypto->cra_priority = cipher_pri;
crypto->cra_alignmask = 0;
crypto->cra_ctxsize = sizeof(struct iproc_ctx_s);
-   INIT_LIST_HEAD(&crypto->cra_list);
 
crypto->cra_init = ablkcipher_cra_init;
crypto->cra_exit = generic_cra_exit;
@@ -4687,7 +4686,6 @@ static int spu_register_aead(struct iproc_alg_s 
*driver_alg)
aead->base.cra_priority = aead_pri;
aead->base.cra_alignmask = 0;
aead->base.cra_ctxsize = sizeof(struct iproc_ctx_s);
-   INIT_LIST_HEAD(&aead->base.cra_list);
 
aead->base.cra_flags |= CRYPTO_ALG_ASYNC;
/* setkey set in alg initialization */
diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c

[PATCH] crypto: inside-secure - remove useless setting of type flags

2018-11-14 Thread Eric Biggers
From: Eric Biggers 

Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
flags") took care of this everywhere else, but a few more instances made
it into the tree at about the same time.  Squash them before they get
copy+pasted around again.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers 
---
 drivers/crypto/inside-secure/safexcel_cipher.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c 
b/drivers/crypto/inside-secure/safexcel_cipher.c
index 3aef1d43e4351..d531c14020dcb 100644
--- a/drivers/crypto/inside-secure/safexcel_cipher.c
+++ b/drivers/crypto/inside-secure/safexcel_cipher.c
@@ -970,7 +970,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des = {
.cra_name = "cbc(des)",
.cra_driver_name = "safexcel-cbc-des",
.cra_priority = 300,
-   .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
CRYPTO_ALG_ASYNC |
+   .cra_flags = CRYPTO_ALG_ASYNC |
 CRYPTO_ALG_KERN_DRIVER_ONLY,
.cra_blocksize = DES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
@@ -1010,7 +1010,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des = {
.cra_name = "ecb(des)",
.cra_driver_name = "safexcel-ecb-des",
.cra_priority = 300,
-   .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
CRYPTO_ALG_ASYNC |
+   .cra_flags = CRYPTO_ALG_ASYNC |
 CRYPTO_ALG_KERN_DRIVER_ONLY,
.cra_blocksize = DES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
@@ -1074,7 +1074,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des3_ede = {
.cra_name = "cbc(des3_ede)",
.cra_driver_name = "safexcel-cbc-des3_ede",
.cra_priority = 300,
-   .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
CRYPTO_ALG_ASYNC |
+   .cra_flags = CRYPTO_ALG_ASYNC |
 CRYPTO_ALG_KERN_DRIVER_ONLY,
.cra_blocksize = DES3_EDE_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
@@ -1114,7 +1114,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des3_ede = {
.cra_name = "ecb(des3_ede)",
.cra_driver_name = "safexcel-ecb-des3_ede",
.cra_priority = 300,
-   .cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | 
CRYPTO_ALG_ASYNC |
+   .cra_flags = CRYPTO_ALG_ASYNC |
 CRYPTO_ALG_KERN_DRIVER_ONLY,
.cra_blocksize = DES3_EDE_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
-- 
2.19.1.930.g4563a0d9d0-goog



our urgent respond immediately

2018-11-14 Thread Samira mohamed
Hi Friend I am a bank director of the International Finance Bank Plc
bf .I want to transfer an abandoned sum of 10.5 millions USD  to your
account.50% will be for you. No risk involved. Contact me for more
details. Kindly reply me back to my alternative email address
(samiramohamed5...@gmail.com) mrs samira mohamed


Price Inquiry

2018-11-12 Thread Daniel Murray
Hi,friend,
 
This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in 
your products.
Could you kindly send us your Latest catalog and price list for our trial order.
 
Best Regards,
 
Daniel Murray
Purchasing Manager





Re: Something wrong with cryptodev-2.6 tree?

2018-11-12 Thread Herbert Xu
On Mon, Nov 12, 2018 at 09:44:41AM +0200, Gilad Ben-Yossef wrote:
> Hi,
> 
> It seems that the cryptodev-2.6 tree at
> https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> has somehow rolled back 3 months ago.
> 
> Not sure if it's a git.kernel.org issue or something else but probably
> worth taking a look?

Thanks Gilad.  It should be fixed now.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Something wrong with cryptodev-2.6 tree?

2018-11-11 Thread Gilad Ben-Yossef
Hi,

It seems that the cryptodev-2.6 tree at
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
has somehow rolled back 3 months ago.

Not sure if it's a git.kernel.org issue or something else but probably
worth taking a look?

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

values of β will give rise to dom!


Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier

2018-11-11 Thread Lubomir Rintel
On Sat, 2018-11-10 at 15:51 +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this fix the comment style, too.
> 
> Cc: Lubomir Rintel 
> Signed-off-by: Stefan Wahren 
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/char/hw_random/bcm2835-rng.c
> b/drivers/char/hw_random/bcm2835-rng.c
> index 6767d96..256b0b1 100644
> --- a/drivers/char/hw_random/bcm2835-rng.c
> +++ b/drivers/char/hw_random/bcm2835-rng.c
> @@ -1,10 +1,7 @@
> -/**
> +// SPDX-License-Identifier: GPL-2.0
> +/*
>   * Copyright (c) 2010-2012 Broadcom. All rights reserved.
>   * Copyright (c) 2013 Lubomir Rintel
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License
> ("GPL")
> - * version 2, as published by the Free Software Foundation.
>   */
>  
>  #include 

Acked-by: Lubomir Rintel 



[PATCH 6/6] crypto: x86/chacha20 - Add a 4-block AVX2 variant

2018-11-11 Thread Martin Willi
This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.

Given that we have plenty of registers in AVX, this version parallelizes
the 2-block variant to do four blocks. While the first two blocks are
shuffling, the CPU can do the XORing on the second two blocks and
vice-versa, which makes this version much faster than the SSSE3 variant
for four blocks. The latter is now mostly for systems that do not have
AVX2, but there it is the work-horse, so we keep it in place.
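
As a rough scalar sketch of the interleaving idea (illustrative only, not
the actual vector code; rol32() is from <linux/bitops.h>), the operations
of two independent quarter-round chains are alternated so that the
long-latency rotate of one chain overlaps the additions and XORs of the
other:

	static void chacha20_qr_x2(u32 a[4], u32 b[4])
	{
		a[0] += a[1];                   b[0] += b[1];
		a[3] = rol32(a[3] ^ a[0], 16);  b[3] = rol32(b[3] ^ b[0], 16);
		a[2] += a[3];                   b[2] += b[3];
		a[1] = rol32(a[1] ^ a[2], 12);  b[1] = rol32(b[1] ^ b[2], 12);
		a[0] += a[1];                   b[0] += b[1];
		a[3] = rol32(a[3] ^ a[0], 8);   b[3] = rol32(b[3] ^ b[0], 8);
		a[2] += a[3];                   b[2] += b[3];
		a[1] = rol32(a[1] ^ a[2], 7);   b[1] = rol32(b[1] ^ b[2], 7);
	}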

The partial XORing function trailer is very similar to the AVX2 2-block
variant. While it could be shared, that code segment is rather short;
profiling is also easier with the trailer integrated, so we keep it per
function.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx2-x86_64.S | 310 +
 arch/x86/crypto/chacha20_glue.c|   7 +
 2 files changed, 317 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S 
b/arch/x86/crypto/chacha20-avx2-x86_64.S
index 8247076b0ba7..b6ab082be657 100644
--- a/arch/x86/crypto/chacha20-avx2-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx2-x86_64.S
@@ -31,6 +31,11 @@ CTRINC:  .octa 0x000300020001
 CTR2BL:.octa 0x00000000000000000000000000000000
.octa 0x00000000000000000000000000000001
 
+.section   .rodata.cst32.CTR4BL, "aM", @progbits, 32
+.align 32
+CTR4BL:.octa 0x00000000000000000000000000000002
+   .octa 0x00000000000000000000000000000003
+
 .text
 
 ENTRY(chacha20_2block_xor_avx2)
@@ -225,6 +230,311 @@ ENTRY(chacha20_2block_xor_avx2)
 
 ENDPROC(chacha20_2block_xor_avx2)
 
+ENTRY(chacha20_4block_xor_avx2)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 4 data blocks output, o
+   # %rdx: up to 4 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts four ChaCha20 blocks by loading the state
+   # matrix four times across eight AVX registers. It performs matrix
+   # operations on four words in two matrices in parallel, sequentially
+   # to the operations on the four words of the other two matrices. The
+   # required word shuffling has a rather high latency, so we can do the
+   # arithmetic on two matrix-pairs without much slowdown.
+
+   vzeroupper
+
+   # x0..3[0-4] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vmovdqa %ymm0,%ymm4
+   vmovdqa %ymm1,%ymm5
+   vmovdqa %ymm2,%ymm6
+   vmovdqa %ymm3,%ymm7
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+   vpaddd  CTR4BL(%rip),%ymm7,%ymm7
+
+   vmovdqa %ymm0,%ymm11
+   vmovdqa %ymm1,%ymm12
+   vmovdqa %ymm2,%ymm13
+   vmovdqa %ymm3,%ymm14
+   vmovdqa %ymm7,%ymm15
+
+   vmovdqa ROT8(%rip),%ymm8
+   vmovdqa ROT16(%rip),%ymm9
+
+   mov %rcx,%rax
+   mov $10,%ecx
+
+.Ldoubleround4:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm9,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxor   %ymm4,%ymm7,%ymm7
+   vpshufb %ymm9,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm10
+   vpslld  $12,%ymm10,%ymm10
+   vpsrld  $20,%ymm1,%ymm1
+   vpor%ymm10,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxor   %ymm6,%ymm5,%ymm5
+   vmovdqa %ymm5,%ymm10
+   vpslld  $12,%ymm10,%ymm10
+   vpsrld  $20,%ymm5,%ymm5
+   vpor%ymm10,%ymm5,%ymm5
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm8,%ymm3,%ymm3
+
+   vpaddd  %ymm5,%ymm4,%ymm4
+   vpxor   %ymm4,%ymm7,%ymm7
+   vpshufb %ymm8,%ymm7,%ymm7
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm10
+   vpslld  $7,%ymm10,%ymm10
+   vpsrld  $25,%ymm1,%ymm1
+   vpor%ymm10,%ymm1,%ymm1
+
+   vpaddd  %ymm7,%ymm6,%ymm6
+   vpxor   %ymm6,%ymm5,%ymm5
+   vmovdqa %ymm5,%ymm10
+   vpslld  $7,%ymm10,%ymm10
+   vpsrld  $25,%ymm5,%ymm5
+   vpor%ymm10,%ymm5,%ymm5
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   vpshufd   

[PATCH 3/6] crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant

2018-11-11 Thread Martin Willi
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.

To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx2-x86_64.S | 189 +
 arch/x86/crypto/chacha20_glue.c|   5 +-
 2 files changed, 133 insertions(+), 61 deletions(-)

diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S 
b/arch/x86/crypto/chacha20-avx2-x86_64.S
index f3cd26f48332..7b62d55bee3d 100644
--- a/arch/x86/crypto/chacha20-avx2-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx2-x86_64.S
@@ -30,8 +30,9 @@ CTRINC:   .octa 0x000300020001
 
 ENTRY(chacha20_8block_xor_avx2)
# %rdi: Input state matrix, s
-   # %rsi: 8 data blocks output, o
-   # %rdx: 8 data blocks input, i
+   # %rsi: up to 8 data blocks output, o
+   # %rdx: up to 8 data blocks input, i
+   # %rcx: input/output length in bytes
 
# This function encrypts eight consecutive ChaCha20 blocks by loading
# the state matrix in AVX registers eight times. As we need some
@@ -48,6 +49,7 @@ ENTRY(chacha20_8block_xor_avx2)
lea 8(%rsp),%r10
and $~31, %rsp
sub $0x80, %rsp
+   mov %rcx,%rax
 
# x0..15[0-7] = s[0..15]
vpbroadcastd0x00(%rdi),%ymm0
@@ -375,74 +377,143 @@ ENTRY(chacha20_8block_xor_avx2)
vpunpckhqdq %ymm15,%ymm0,%ymm15
 
# interleave 128-bit words in state n, n+4
-   vmovdqa 0x00(%rsp),%ymm0
-   vperm2i128  $0x20,%ymm4,%ymm0,%ymm1
-   vperm2i128  $0x31,%ymm4,%ymm0,%ymm4
-   vmovdqa %ymm1,0x00(%rsp)
-   vmovdqa 0x20(%rsp),%ymm0
-   vperm2i128  $0x20,%ymm5,%ymm0,%ymm1
-   vperm2i128  $0x31,%ymm5,%ymm0,%ymm5
-   vmovdqa %ymm1,0x20(%rsp)
-   vmovdqa 0x40(%rsp),%ymm0
-   vperm2i128  $0x20,%ymm6,%ymm0,%ymm1
-   vperm2i128  $0x31,%ymm6,%ymm0,%ymm6
-   vmovdqa %ymm1,0x40(%rsp)
-   vmovdqa 0x60(%rsp),%ymm0
-   vperm2i128  $0x20,%ymm7,%ymm0,%ymm1
-   vperm2i128  $0x31,%ymm7,%ymm0,%ymm7
-   vmovdqa %ymm1,0x60(%rsp)
+   # xor/write first four blocks
+   vmovdqa 0x00(%rsp),%ymm1
+   vperm2i128  $0x20,%ymm4,%ymm1,%ymm0
+   cmp $0x0020,%rax
+   jl  .Lxorpart8
+   vpxor   0x0000(%rdx),%ymm0,%ymm0
+   vmovdqu %ymm0,0x0000(%rsi)
+   vperm2i128  $0x31,%ymm4,%ymm1,%ymm4
+
vperm2i128  $0x20,%ymm12,%ymm8,%ymm0
+   cmp $0x0040,%rax
+   jl  .Lxorpart8
+   vpxor   0x0020(%rdx),%ymm0,%ymm0
+   vmovdqu %ymm0,0x0020(%rsi)
vperm2i128  $0x31,%ymm12,%ymm8,%ymm12
-   vmovdqa %ymm0,%ymm8
-   vperm2i128  $0x20,%ymm13,%ymm9,%ymm0
-   vperm2i128  $0x31,%ymm13,%ymm9,%ymm13
-   vmovdqa %ymm0,%ymm9
+
+   vmovdqa 0x40(%rsp),%ymm1
+   vperm2i128  $0x20,%ymm6,%ymm1,%ymm0
+   cmp $0x0060,%rax
+   jl  .Lxorpart8
+   vpxor   0x0040(%rdx),%ymm0,%ymm0
+   vmovdqu %ymm0,0x0040(%rsi)
+   vperm2i128  $0x31,%ymm6,%ymm1,%ymm6
+
vperm2i128  $0x20,%ymm14,%ymm10,%ymm0
+   cmp $0x0080,%rax
+   jl  .Lxorpart8
+   vpxor   0x0060(%rdx),%ymm0,%ymm0
+   vmovdqu %ymm0,0x0060(%rsi)
vperm2i128  $0x31,%ymm14,%ymm10,%ymm14
-   vmovdqa %ymm0,%ymm10
-   vperm2i128  $0x20,%ymm15,%ymm11,%ymm0
-   vperm2i128  $0x31,%ymm15,%ymm11,%ymm15
-   vmovdqa %ymm0,%ymm11
 
-   # xor with corresponding input, write to output
-   vmovdqa 0x00(%rsp),%ymm0
-   vpxor   0x0000(%rdx),%ymm0,%ymm0
-   vmovdqu %ymm0,0x0000(%rsi)
-   vmovdqa 0x20(%rsp),%ymm0
+   vmovdqa 0x20(%rsp),%ymm1
+   vperm2i128  $0x20,%ymm5,%ymm1,%ymm0
+   cmp $0x00a0,%rax
+   jl  .Lxorpart8
vpxor   0x0080(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0080(%rsi)
-   vmovdqa 0x40(%rsp),%ymm0
-   vpxor   0x0040(%rdx),%ymm0,%ymm0
-   vmovdqu %ymm0,0x0040(%rsi)
-   vmovdqa 0x60(%rsp),%ymm0
+   vperm2i128  $0x31,%ymm5,%ymm1,%ymm5
+
+   vperm2i128  $0x20,%ymm13,%ymm9,%ymm0
+   cmp $0x00c0,%rax
+   jl  .Lxorpart8
+   vpxor   0x00a0(%rdx),%ymm0,%ymm0
+   vmovdqu %ymm0,0x00a0(%rsi)
+   vperm2i128  $0x31,%ymm13,%ymm9,%ymm13
+
+   vmovdqa 0x60(%rsp),%ymm1
+   vperm2i128  $0x20,%ymm7,%ymm1,%ymm0
+   cmp $0x00e0,%rax
+   

[PATCH 4/6] crypto: x86/chacha20 - Use larger block functions more aggressively

2018-11-11 Thread Martin Willi
Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20_glue.c | 39 -
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 882e8bf5965a..b541da71f11e 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -29,6 +29,12 @@ asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 
*dst, const u8 *src,
 static bool chacha20_use_avx2;
 #endif
 
+static unsigned int chacha20_advance(unsigned int len, unsigned int maxblocks)
+{
+   len = min(len, maxblocks * CHACHA20_BLOCK_SIZE);
+   return round_up(len, CHACHA20_BLOCK_SIZE) / CHACHA20_BLOCK_SIZE;
+}
+
 static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
 {
@@ -41,6 +47,11 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 
*src,
dst += CHACHA20_BLOCK_SIZE * 8;
state[12] += 8;
}
+   if (bytes > CHACHA20_BLOCK_SIZE * 4) {
+   chacha20_8block_xor_avx2(state, dst, src, bytes);
+   state[12] += chacha20_advance(bytes, 8);
+   return;
+   }
}
 #endif
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
@@ -50,15 +61,14 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 
*src,
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
-   while (bytes >= CHACHA20_BLOCK_SIZE) {
-   chacha20_block_xor_ssse3(state, dst, src, bytes);
-   bytes -= CHACHA20_BLOCK_SIZE;
-   src += CHACHA20_BLOCK_SIZE;
-   dst += CHACHA20_BLOCK_SIZE;
-   state[12]++;
+   if (bytes > CHACHA20_BLOCK_SIZE) {
+   chacha20_4block_xor_ssse3(state, dst, src, bytes);
+   state[12] += chacha20_advance(bytes, 4);
+   return;
}
if (bytes) {
chacha20_block_xor_ssse3(state, dst, src, bytes);
+   state[12]++;
}
 }
 
@@ -82,17 +92,16 @@ static int chacha20_simd(struct skcipher_request *req)
 
kernel_fpu_begin();
 
-   while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
-   chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
-   rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
-   err = skcipher_walk_done(&walk,
-walk.nbytes % CHACHA20_BLOCK_SIZE);
-   }
+   while (walk.nbytes > 0) {
+   unsigned int nbytes = walk.nbytes;
+
+   if (nbytes < walk.total)
+   nbytes = round_down(nbytes, walk.stride);
 
-   if (walk.nbytes) {
chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
-   walk.nbytes);
-   err = skcipher_walk_done(&walk, 0);
+   nbytes);
+
+   err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
 
kernel_fpu_end();
-- 
2.17.1



[PATCH 1/6] crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant

2018-11-11 Thread Martin Willi
Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.

The required branching does not negatively affect performance for full
block sizes. The partial XORing uses simple "rep movsb" to copy the
data before and after doing XOR in SSE. This is rather efficient on
modern processors; movsw can be slightly faster, but the additional
complexity is probably not worth it.
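
In C terms, the tail handling is roughly equivalent to the sketch below
(illustrative only; the actual code keeps the keystream in an SSE
register and XORs all 16 bytes there):

	static void xor_tail(u8 *dst, const u8 *src, const u8 *keystream,
			     unsigned int len)	/* len < 16 */
	{
		u8 buf[16];
		unsigned int i;

		memcpy(buf, src, len);		/* "rep movsb" into a buffer */
		for (i = 0; i < len; i++)	/* XOR the keystream in */
			buf[i] ^= keystream[i];
		memcpy(dst, buf, len);		/* "rep movsb" back out */
	}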

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-ssse3-x86_64.S | 74 -
 arch/x86/crypto/chacha20_glue.c | 11 ++--
 2 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S 
b/arch/x86/crypto/chacha20-ssse3-x86_64.S
index 512a2b500fd1..98d130b5e4ab 100644
--- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
+++ b/arch/x86/crypto/chacha20-ssse3-x86_64.S
@@ -25,12 +25,13 @@ CTRINC: .octa 0x000300020001
 
 ENTRY(chacha20_block_xor_ssse3)
# %rdi: Input state matrix, s
-   # %rsi: 1 data block output, o
-   # %rdx: 1 data block input, i
+   # %rsi: up to 1 data block output, o
+   # %rdx: up to 1 data block input, i
+   # %rcx: input/output length in bytes
 
# This function encrypts one ChaCha20 block by loading the state matrix
# in four SSE registers. It performs matrix operation on four words in
-   # parallel, but requireds shuffling to rearrange the words after each
+   # parallel, but requires shuffling to rearrange the words after each
# round. 8/16-bit word rotation is done with the slightly better
# performing SSSE3 byte shuffling, 7/12-bit word rotation uses
# traditional shift+OR.
@@ -48,7 +49,8 @@ ENTRY(chacha20_block_xor_ssse3)
movdqa  ROT8(%rip),%xmm4
movdqa  ROT16(%rip),%xmm5
 
-   mov $10,%ecx
+   mov %rcx,%rax
+   mov $10,%ecx
 
 .Ldoubleround:
 
@@ -122,27 +124,69 @@ ENTRY(chacha20_block_xor_ssse3)
jnz .Ldoubleround
 
# o0 = i0 ^ (x0 + s0)
-   movdqu  0x00(%rdx),%xmm4
paddd   %xmm8,%xmm0
+   cmp $0x10,%rax
+   jl  .Lxorpart
+   movdqu  0x00(%rdx),%xmm4
pxor%xmm4,%xmm0
movdqu  %xmm0,0x00(%rsi)
# o1 = i1 ^ (x1 + s1)
-   movdqu  0x10(%rdx),%xmm5
paddd   %xmm9,%xmm1
-   pxor%xmm5,%xmm1
-   movdqu  %xmm1,0x10(%rsi)
+   movdqa  %xmm1,%xmm0
+   cmp $0x20,%rax
+   jl  .Lxorpart
+   movdqu  0x10(%rdx),%xmm0
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x10(%rsi)
# o2 = i2 ^ (x2 + s2)
-   movdqu  0x20(%rdx),%xmm6
paddd   %xmm10,%xmm2
-   pxor%xmm6,%xmm2
-   movdqu  %xmm2,0x20(%rsi)
+   movdqa  %xmm2,%xmm0
+   cmp $0x30,%rax
+   jl  .Lxorpart
+   movdqu  0x20(%rdx),%xmm0
+   pxor%xmm2,%xmm0
+   movdqu  %xmm0,0x20(%rsi)
# o3 = i3 ^ (x3 + s3)
-   movdqu  0x30(%rdx),%xmm7
paddd   %xmm11,%xmm3
-   pxor%xmm7,%xmm3
-   movdqu  %xmm3,0x30(%rsi)
-
+   movdqa  %xmm3,%xmm0
+   cmp $0x40,%rax
+   jl  .Lxorpart
+   movdqu  0x30(%rdx),%xmm0
+   pxor%xmm3,%xmm0
+   movdqu  %xmm0,0x30(%rsi)
+
+.Ldone:
ret
+
+.Lxorpart:
+   # xor remaining bytes from partial register into output
+   mov %rax,%r9
+   and $0x0f,%r9
+   jz  .Ldone
+   and $~0x0f,%rax
+
+   mov %rsi,%r11
+
+   lea 8(%rsp),%r10
+   sub $0x10,%rsp
+   and $~31,%rsp
+
+   lea (%rdx,%rax),%rsi
+   mov %rsp,%rdi
+   mov %r9,%rcx
+   rep movsb
+
+   pxor0x00(%rsp),%xmm0
+   movdqa  %xmm0,0x00(%rsp)
+
+   mov %rsp,%rsi
+   lea (%r11,%rax),%rdi
+   mov %r9,%rcx
+   rep movsb
+
+   lea -8(%r10),%rsp
+   jmp .Ldone
+
 ENDPROC(chacha20_block_xor_ssse3)
 
 ENTRY(chacha20_4block_xor_ssse3)
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index dce7c5d39c2f..cc4571736ce8 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -19,7 +19,8 @@
 
 #define CHACHA20_STATE_ALIGN 16
 
-asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_block_xor_ssse3(u32 

[PATCH 2/6] crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant

2018-11-11 Thread Martin Willi
Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.

As we already have the stack set up, the partial XORing does not need to
set one up again. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-ssse3-x86_64.S | 163 ++--
 arch/x86/crypto/chacha20_glue.c |   5 +-
 2 files changed, 128 insertions(+), 40 deletions(-)

diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S 
b/arch/x86/crypto/chacha20-ssse3-x86_64.S
index 98d130b5e4ab..d8ac75bb448f 100644
--- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
+++ b/arch/x86/crypto/chacha20-ssse3-x86_64.S
@@ -191,8 +191,9 @@ ENDPROC(chacha20_block_xor_ssse3)
 
 ENTRY(chacha20_4block_xor_ssse3)
# %rdi: Input state matrix, s
-   # %rsi: 4 data blocks output, o
-   # %rdx: 4 data blocks input, i
+   # %rsi: up to 4 data blocks output, o
+   # %rdx: up to 4 data blocks input, i
+   # %rcx: input/output length in bytes
 
# This function encrypts four consecutive ChaCha20 blocks by loading the
# the state matrix in SSE registers four times. As we need some scratch
@@ -207,6 +208,7 @@ ENTRY(chacha20_4block_xor_ssse3)
lea 8(%rsp),%r10
sub $0x80,%rsp
and $~63,%rsp
+   mov %rcx,%rax
 
# x0..15[0-3] = s0..3[0..3]
movq0x00(%rdi),%xmm1
@@ -617,58 +619,143 @@ ENTRY(chacha20_4block_xor_ssse3)
 
# xor with corresponding input, write to output
movdqa  0x00(%rsp),%xmm0
+   cmp $0x10,%rax
+   jl  .Lxorpart4
movdqu  0x00(%rdx),%xmm1
pxor%xmm1,%xmm0
movdqu  %xmm0,0x00(%rsi)
-   movdqa  0x10(%rsp),%xmm0
-   movdqu  0x80(%rdx),%xmm1
+
+   movdqu  %xmm4,%xmm0
+   cmp $0x20,%rax
+   jl  .Lxorpart4
+   movdqu  0x10(%rdx),%xmm1
pxor%xmm1,%xmm0
-   movdqu  %xmm0,0x80(%rsi)
+   movdqu  %xmm0,0x10(%rsi)
+
+   movdqu  %xmm8,%xmm0
+   cmp $0x30,%rax
+   jl  .Lxorpart4
+   movdqu  0x20(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x20(%rsi)
+
+   movdqu  %xmm12,%xmm0
+   cmp $0x40,%rax
+   jl  .Lxorpart4
+   movdqu  0x30(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x30(%rsi)
+
movdqa  0x20(%rsp),%xmm0
+   cmp $0x50,%rax
+   jl  .Lxorpart4
movdqu  0x40(%rdx),%xmm1
pxor%xmm1,%xmm0
movdqu  %xmm0,0x40(%rsi)
+
+   movdqu  %xmm6,%xmm0
+   cmp $0x60,%rax
+   jl  .Lxorpart4
+   movdqu  0x50(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x50(%rsi)
+
+   movdqu  %xmm10,%xmm0
+   cmp $0x70,%rax
+   jl  .Lxorpart4
+   movdqu  0x60(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x60(%rsi)
+
+   movdqu  %xmm14,%xmm0
+   cmp $0x80,%rax
+   jl  .Lxorpart4
+   movdqu  0x70(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x70(%rsi)
+
+   movdqa  0x10(%rsp),%xmm0
+   cmp $0x90,%rax
+   jl  .Lxorpart4
+   movdqu  0x80(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x80(%rsi)
+
+   movdqu  %xmm5,%xmm0
+   cmp $0xa0,%rax
+   jl  .Lxorpart4
+   movdqu  0x90(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0x90(%rsi)
+
+   movdqu  %xmm9,%xmm0
+   cmp $0xb0,%rax
+   jl  .Lxorpart4
+   movdqu  0xa0(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0xa0(%rsi)
+
+   movdqu  %xmm13,%xmm0
+   cmp $0xc0,%rax
+   jl  .Lxorpart4
+   movdqu  0xb0(%rdx),%xmm1
+   pxor%xmm1,%xmm0
+   movdqu  %xmm0,0xb0(%rsi)
+
movdqa  0x30(%rsp),%xmm0
+   cmp $0xd0,%rax
+   jl  .Lxorpart4
movdqu  0xc0(%rdx),%xmm1
pxor%xmm1,%xmm0
movdqu  %xmm0,0xc0(%rsi)
-   movdqu  0x10(%rdx),%xmm1
-   pxor%xmm1,%xmm4
-   movdqu  %xmm4,0x10(%rsi)
-   movdqu  0x90(%rdx),%xmm1
-   pxor%xmm1,%xmm5
-   movdqu  %xmm5,0x90(%rsi)
-   movdqu  0x50(%rdx),%xmm1
-   

[PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-11 Thread Martin Willi
This patchset improves performance of the ChaCha20 SIMD implementations
for x86_64. For some specific encryption lengths, performance is more
than doubled. Two mechanisms are used to achieve this:

* Instead of calculating the minimal number of required blocks for a
  given encryption length, functions producing more blocks are used
  more aggressively. Calculating a 4-block function can be faster than
  calculating a 2-block and a 1-block function, even if only three
  blocks are actually required.

* In addition to the 8-block AVX2 function, a 4-block and a 2-block
  function are introduced.

Patches 1-3 add support for partial lengths to the existing 1-, 4- and
8-block functions. Patch 4 makes use of that by engaging the next higher
level block functions more aggressively. Patch 5 and 6 add the new AVX2
functions for 2 and 4 blocks. Patches are based on cryptodev and would
need adjustments to apply on top of the Adiantum patchset.

Note that the more aggressive use of larger block functions calculate
blocks that may get discarded. This may have a negative impact on energy
usage or the processor's thermal budget. However, with the new block
functions we can avoid this over-calculation for many lengths, so the
performance win can be considered more important.

Below are performance numbers measured with tcrypt using additional
encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
existing, new the implementation with this patchset. As comparison
the numbers for zinc in v6:

 len  old  new zinc
   8 5908 5818 5818
  16 5917 5828 5726
  24 5916 5869 5757
  32 5920 5789 5813
  40 5868 5799 5710
  48 5877 5761 5761
  56 5869 5797 5742
  64 5897 5862 5685
  72 3381 4979 3520
  80 3364 5541 3475
  88 3350 4977 3424
  96 3342 5530 3371
 104 3328 4923 3313
 112 3317 5528 3207
 120 3313 4970 3150
 128 3492 5535 3568
 136 2487 4570 3690
 144 2481 5047 3599
 152 2473 4565 3566
 160 2459 5022 3515
 168 2461 4550 3437
 176 2454 5020 3325
 184 2449 4535 3279
 192 2538 5011 3762
 200 1962 4537 3702
 208 1962 4971 3622
 216 1954 4487 3518
 224 1949 4936 3445
 232 1948 4497 3422
 240 1941 4947 3317
 248 1940 4481 3279
 256 3798 4964 3723
 264 2638 3577 3639
 272 2637 3567 3597
 280 2628 3563 3565
 288 2630 3795 3484
 296 2621 3580 3422
 304 2612 3569 3352
 312 2602 3599 3308
 320 2694 3821 3694
 328 2060 3538 3681
 336 2054 3565 3599
 344 2054 3553 3523
 352 2049 3809 3419
 360 2045 3575 3403
 368 2035 3560 3334
 376 2036 3555 3257
 384 2092 3785 3715
 392 1691 3505 3612
 400 1684 3527 3553
 408 1686 3527 3496
 416 1684 3804 3430
 424 1681 3555 3402
 432 1675 3559 3311
 440 1672 3558 3275
 448 1710 3780 3689
 456 1431 3541 3618
 464 1428 3538 3576
 472 1430 3527 3509
 480 1426 3788 3405
 488 1423 3502 3397
 496 1423 3519 3298
 504 1418 3519 3277
 512 3694 3736 3735
 520 2601 2571 2209
 528 2601 2677 2148
 536 2587 2534 2164
 544 2578 2659 2138
 552 2570 2552 2126
 560 2566 2661 2035
 568 2567 2542 2041
 576 2639 2674 2199
 584 2031 2531 2183
 592 2027 2660 2145
 600 2016 2513 2155
 608 2009 2638 2133
 616 2006 2522 2115
 624 2000 2649 2064
 632 1996 2518 2045
 640 2053 2651 2188
 648 1666 2402 2182
 656 1663 2517 2158
 664 1659 2397 2147
 672 1657 2510 2139
 680 1656 2394 2114
 688 1653 2497 2077
 696 1646 2393 2043
 704 1678 2510 2208
 712 1414 2391 2189
 720 1412 2506 2169
 728 1411 2384 2145
 736 1408 2494 2142
 744 1408 2379 2081
 752 1405 2485 2064
 760 1403 2376 2043
 768 2189 2498 2211
 776 1756 2137 2192
 784 1746 2145 2146
 792 1744 2141 2141
 800 1743  2094
 808 1742 2140 2100
 816 1735 2134 2061
 824 1731 2135 2045
 832 1778  2223
 840 1480 2132 2184
 848 1480 2134 2173
 856 1476 2124 2145
 864 1474 2210 2126
 872 1472 2127 2105
 880 1463 2123 2056
 888 1468 2123 2043
 896 1494 2208 2219
 904 1278 2120 2192
 912 1277 2121 2170
 920 1273 2118 2149
 928 1272 2207 2125
 936 1267 2125 2098
 944 1265 2127 2060
 952 1267 2126 2049
 960 1289 2213 2204
 968 1125 2123 2187
 976 1122 2127 2166
 984 1120 2123 2136
 992 1118 2207 2119
1000 1118 2120 2101
1008 1117 2122 2042
1016 1115 2121 2048
1024 2174 2191 2195
1032 1748 1724 1565
1040 1745 1782 1544
1048 1736 1737 1554
1056 1738 1802 1541
1064 1735 1728 1523
1072 1730 1780 1507
1080 1729 1724 1497
1088 1757 1783 1592
1096 1475 1723 1575
1104 1474 1778 1563
1112 1472 1708 1544
1120 1468 1774 1521
1128 1466 1718 1521
1136 1462 1780 1501
1144 1460 1719 1491
1152 1481 1782 1575
1160 1271 1647 1558
1168 1271 1706 1554
1176 1268 1645 1545
1184 1265 1711 1538
1192 1265 1648 1530
1200 1264 1705 1493
1208 1262 1647 1498
1216 1277 1695 1581
1224 1120 1642 1563
1232 1115 1702 1549
1240 1121 1646 1538
1248 1119 1703 1527
1256 1115 1640 1520
1264 1114 1693 1505
1272 1112 1642 1492
1280 1552 1699 1574
1288 1314 1525 1573
1296 1315 1522 1551
1304 1312 1521 1548
1312 1311 1564 1535
1320 1309 1518 1524
1328 1302 1527 1508
1336 1303 1521 1500
1344 1333 1561 1579
1352 1157 1524 1573
1360 1152 1520 1546
1368 1154 1522 1545
1376 1153 1562 1536
1384 1151 1525 1526
1392 

[PATCH 5/6] crypto: x86/chacha20 - Add a 2-block AVX2 variant

2018-11-11 Thread Martin Willi
This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.

This function can increase performance and efficiency significantly for
lengths that would otherwise require a 4-block function.

Signed-off-by: Martin Willi 
---
 arch/x86/crypto/chacha20-avx2-x86_64.S | 197 +
 arch/x86/crypto/chacha20_glue.c|   7 +
 2 files changed, 204 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S 
b/arch/x86/crypto/chacha20-avx2-x86_64.S
index 7b62d55bee3d..8247076b0ba7 100644
--- a/arch/x86/crypto/chacha20-avx2-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx2-x86_64.S
@@ -26,8 +26,205 @@ ROT16:  .octa 0x0d0c0f0e09080b0a0504070601000302
 CTRINC:.octa 0x00000003000000020000000100000000
.octa 0x00000007000000060000000500000004
 
+.section   .rodata.cst32.CTR2BL, "aM", @progbits, 32
+.align 32
+CTR2BL:.octa 0x00000000000000000000000000000000
+   .octa 0x00000000000000000000000000000001
+
 .text
 
+ENTRY(chacha20_2block_xor_avx2)
+   # %rdi: Input state matrix, s
+   # %rsi: up to 2 data blocks output, o
+   # %rdx: up to 2 data blocks input, i
+   # %rcx: input/output length in bytes
+
+   # This function encrypts two ChaCha20 blocks by loading the state
+   # matrix twice across four AVX registers. It performs matrix operations
+   # on four words in each matrix in parallel, but requires shuffling to
+   # rearrange the words after each round.
+
+   vzeroupper
+
+   # x0..3[0-2] = s0..3
+   vbroadcasti128  0x00(%rdi),%ymm0
+   vbroadcasti128  0x10(%rdi),%ymm1
+   vbroadcasti128  0x20(%rdi),%ymm2
+   vbroadcasti128  0x30(%rdi),%ymm3
+
+   vpaddd  CTR2BL(%rip),%ymm3,%ymm3
+
+   vmovdqa %ymm0,%ymm8
+   vmovdqa %ymm1,%ymm9
+   vmovdqa %ymm2,%ymm10
+   vmovdqa %ymm3,%ymm11
+
+   vmovdqa ROT8(%rip),%ymm4
+   vmovdqa ROT16(%rip),%ymm5
+
+   mov %rcx,%rax
+   mov $10,%ecx
+
+.Ldoubleround:
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm5,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm6
+   vpslld  $12,%ymm6,%ymm6
+   vpsrld  $20,%ymm1,%ymm1
+   vpor%ymm6,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm4,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm7
+   vpslld  $7,%ymm7,%ymm7
+   vpsrld  $25,%ymm1,%ymm1
+   vpor%ymm7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm3,%ymm3
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm5,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm6
+   vpslld  $12,%ymm6,%ymm6
+   vpsrld  $20,%ymm1,%ymm1
+   vpor%ymm6,%ymm1,%ymm1
+
+   # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vpaddd  %ymm1,%ymm0,%ymm0
+   vpxor   %ymm0,%ymm3,%ymm3
+   vpshufb %ymm4,%ymm3,%ymm3
+
+   # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vpaddd  %ymm3,%ymm2,%ymm2
+   vpxor   %ymm2,%ymm1,%ymm1
+   vmovdqa %ymm1,%ymm7
+   vpslld  $7,%ymm7,%ymm7
+   vpsrld  $25,%ymm1,%ymm1
+   vpor%ymm7,%ymm1,%ymm1
+
+   # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+   vpshufd $0x93,%ymm1,%ymm1
+   # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vpshufd $0x4e,%ymm2,%ymm2
+   # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+   vpshufd $0x39,%ymm3,%ymm3
+
+   dec %ecx
+   jnz .Ldoubleround
+
+   # o0 = i0 ^ (x0 + s0)
+   vpaddd  %ymm8,%ymm0,%ymm7
+   cmp $0x10,%rax
+   jl  .Lxorpart2
+   vpxor   0x00(%rdx),%xmm7,%xmm6
+   vmovdqu %xmm6,0x00(%rsi)
+   vextracti128$1,%ymm7,%xmm0
+   # o1 = i1 ^ (x1 + s1)
+   vpaddd  %ymm9,%ymm1,%ymm7
+   cmp 

Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier

2018-11-10 Thread Eric Anholt
Stefan Wahren  writes:

> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this fix the comment style, too.

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature


Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier

2018-11-10 Thread Greg Kroah-Hartman
On Sat, Nov 10, 2018 at 03:51:16PM +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this fix the comment style, too.
> 
> Cc: Lubomir Rintel 
> Signed-off-by: Stefan Wahren 
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

Acked-by: Greg Kroah-Hartman 


[PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier

2018-11-10 Thread Stefan Wahren
Adopt the SPDX license identifier headers to ease license compliance
management. While we are at this fix the comment style, too.

Cc: Lubomir Rintel 
Signed-off-by: Stefan Wahren 
---
 drivers/char/hw_random/bcm2835-rng.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/char/hw_random/bcm2835-rng.c 
b/drivers/char/hw_random/bcm2835-rng.c
index 6767d96..256b0b1 100644
--- a/drivers/char/hw_random/bcm2835-rng.c
+++ b/drivers/char/hw_random/bcm2835-rng.c
@@ -1,10 +1,7 @@
-/**
+// SPDX-License-Identifier: GPL-2.0
+/*
  * Copyright (c) 2010-2012 Broadcom. All rights reserved.
  * Copyright (c) 2013 Lubomir Rintel
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License ("GPL")
- * version 2, as published by the Free Software Foundation.
  */
 
 #include 
-- 
2.7.4



How driver can mark the algo implementation Unavailable

2018-11-09 Thread Harsh Jain
Hi All,

PCI-based devices can be shut down from the sysfs interface:

echo "unbind" > /sys/bus/pci/drivers/cxgb4/unbind

If the device has an active transformation (tfm), the driver cannot unregister
its algorithms because alg->cra_refcnt will be non-zero.

Can a driver use the "CRYPTO_ALG_DEAD" flag to mark an algorithm unavailable, so
that crypto_alg_lookup() does not allocate a new tfm using the dead algorithm?

Regards

Harsh Jain



Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-11-09 Thread Herbert Xu
On Sat, Oct 20, 2018 at 02:01:52AM +0300, Dmitry Eremin-Solenikov wrote:
> crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with
> IV, rather than with data stream, resulting in incorrect decryption.
> Test vectors will be added in the next patch.
> 
> Signed-off-by: Dmitry Eremin-Solenikov 
> Cc: sta...@vger.kernel.org
> ---
>  crypto/cfb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 0/2] crypto: some hardening against AES cache-timing attacks

2018-11-09 Thread Herbert Xu
On Wed, Oct 17, 2018 at 09:37:57PM -0700, Eric Biggers wrote:
> This series makes the "aes-fixed-time" and "aes-arm" implementations of
> AES more resistant to cache-timing attacks.
> 
> Note that even after these changes, the implementations still aren't
> necessarily guaranteed to be constant-time; see
> https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion
> of the many difficulties involved in writing truly constant-time AES
> software.  But it's valuable to make such attacks more difficult.
> 
> Changed since v2:
> - In aes-arm, move the IRQ disable/enable into the assembly file.
> - Other aes-arm tweaks.
> - Add Kconfig help text.
> 
> Thanks to Ard Biesheuvel for the suggestions.
> 
> Eric Biggers (2):
>   crypto: aes_ti - disable interrupts while accessing S-box
>   crypto: arm/aes - add some hardening against cache-timing attacks
> 
>  arch/arm/crypto/Kconfig   |  9 +
>  arch/arm/crypto/aes-cipher-core.S | 62 ++-
>  crypto/Kconfig|  3 +-
>  crypto/aes_generic.c  |  9 +++--
>  crypto/aes_ti.c   | 18 +
>  5 files changed, 86 insertions(+), 15 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-09 Thread Ard Biesheuvel
On 9 November 2018 at 10:45, Herbert Xu  wrote:
> On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
>> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>> >
>> > This should be
>> >
>> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>> >crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>> >
>> > since the cryptd path in simd still needs some space in the subreq for
>> > the completion.
>>
>> OK this is what I applied to the cryptodev tree, please double-check
>> to see if I did anything silly:
>
> I meant the crypto tree rather than cryptodev.
>

That looks fine. Thanks Herbert.


Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-09 Thread Herbert Xu
On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
> >
> > This should be
> > 
> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
> >crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
> > 
> > since the cryptd path in simd still needs some space in the subreq for
> > the completion.
> 
> OK this is what I applied to the cryptodev tree, please double-check
> to see if I did anything silly:

I meant the crypto tree rather than cryptodev.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-09 Thread Herbert Xu
On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>
> This should be
> 
> reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
> 
> since the cryptd path in simd still needs some space in the subreq for
> the completion.

OK this is what I applied to the cryptodev tree, please double-check
to see if I did anything silly:

diff --git a/crypto/simd.c b/crypto/simd.c
index ea7240be3001..78e8d037ae2b 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -124,8 +124,9 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
 
ctx->cryptd_tfm = cryptd_tfm;
 
-   reqsize = sizeof(struct skcipher_request);
-   reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
+   reqsize = crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
+   reqsize = max(reqsize, crypto_skcipher_reqsize(&cryptd_tfm->base));
+   reqsize += sizeof(struct skcipher_request);
 
crypto_skcipher_set_reqsize(tfm, reqsize);
 
Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: .S_shipped unnecessary?

2018-11-08 Thread Masahiro Yamada
On Fri, Nov 9, 2018 at 8:42 AM Ard Biesheuvel  wrote:
>
> (+ Masahiro, kbuild ml)
>
> On 8 November 2018 at 21:37, Jason A. Donenfeld  wrote:
> > Hi Ard, Eric, and others,
> >
> > As promised, the next Zinc patchset will have less generated code! After a
> > bit of work with Andy and Samuel, I'll be bundling the perlasm.
> >
>
> Wonderful! Any problems doing that for x86_64 ?
>
> > One thing I'm wondering about, though, is the wisdom behind the current
> > .S_shipped pattern. Usually the _shipped is for big firmware blobs that are
> > hard (or impossible) to build independently. But in this case, the .S is
> > generated from the .pl significantly faster than gcc even compiles a basic
> > C file. And, since perl is needed to build the kernel anyway, it's not like
> > it will be impossible to find the right tools. Rather than clutter up 
> > commits
> > with the .pl _and_ the .S_shipped, what would you think if I just generated
> > the .S each time as an ordinary build artifact. AFAICT, this is fairly 
> > usual,
> > and it's hard to see downsides. Hence, why I'm writing this email: are there
> > any downsides to that?
> >
>
> I agree 100%. When I added this the first time, it was at the request
> of the ARM maintainer, who was reluctant to rely on Perl for some
> reason.
>
> Recently, we have had to add a kludge to prevent spurious rebuilds of
> the .S_shipped files as well.
>
> I'd be perfectly happy to get rid of this entirely, and always
> generate the .S from the .pl, which to me is kind of the point of
> carrying these files in the first place.
>
> Masahiro: do you see any problems with this?


No problem.


Documentation/process/changes.rst says the following:

You will need perl 5 and the following modules: ``Getopt::Long``,
``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel.



We can assume perl is installed on the user's build machine.



--
Best Regards
Masahiro Yamada


Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-08 Thread Qian Cai



> On Nov 8, 2018, at 6:33 PM, Ard Biesheuvel  wrote:
> 
> On 8 November 2018 at 23:55, Ard Biesheuvel  wrote:
>> The simd wrapper's skcipher request context structure consists
>> of a single subrequest whose size is taken from the subordinate
>> skcipher. However, in simd_skcipher_init(), the reqsize that is
>> retrieved is not from the subordinate skcipher but from the
>> cryptd request structure, whose size is completely unrelated to
>> the actual wrapped skcipher.
>> 
>> Reported-by: Qian Cai 
>> Signed-off-by: Ard Biesheuvel 
>> ---
>> crypto/simd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/crypto/simd.c b/crypto/simd.c
>> index ea7240be3001..2f3d6e897afc 100644
>> --- a/crypto/simd.c
>> +++ b/crypto/simd.c
>> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher 
>> *tfm)
>>ctx->cryptd_tfm = cryptd_tfm;
>> 
>>reqsize = sizeof(struct skcipher_request);
>> -   reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
>> +   reqsize += 
>> crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>> 
> 
> This should be
> 
> reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>   crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
> 
> since the cryptd path in simd still needs some space in the subreq for
> the completion.
Tested-by: Qian Cai 


Re: .S_shipped unnecessary?

2018-11-08 Thread Jason A. Donenfeld
Hey Ard,

On Fri, Nov 9, 2018 at 12:42 AM Ard Biesheuvel
 wrote:
> Wonderful! Any problems doing that for x86_64 ?

The x86_64 is still a WIP, but hopefully we'll succeed.

> I agree 100%. When I added this the first time, it was at the request
> of the ARM maintainer, who was reluctant to rely on Perl for some
> reason.
>
> Recently, we have had to add a kludge to prevent spurious rebuilds of
> the .S_shipped files as well.
>
> I'd be perfectly happy to get rid of this entirely, and always
> generate the .S from the .pl, which to me is kind of the point of
> carrying these files in the first place.

Terrific. I'll move ahead in that direction then. It makes things _so_
much cleaner, and doesn't introduce new build modes ("should the
generated _shipped go into the build directory or the source directory?
what kind of artifact is it? how to address $(srcdir) vs $(src) in
that context? bla bla") that really overcomplicate things.

Jason


Re: .S_shipped unnecessary?

2018-11-08 Thread Ard Biesheuvel
(+ Masahiro, kbuild ml)

On 8 November 2018 at 21:37, Jason A. Donenfeld  wrote:
> Hi Ard, Eric, and others,
>
> As promised, the next Zinc patchset will have less generated code! After a
> bit of work with Andy and Samuel, I'll be bundling the perlasm.
>

Wonderful! Any problems doing that for x86_64 ?

> One thing I'm wondering about, though, is the wisdom behind the current
> .S_shipped pattern. Usually the _shipped is for big firmware blobs that are
> hard (or impossible) to build independently. But in this case, the .S is
> generated from the .pl significantly faster than gcc even compiles a basic
> C file. And, since perl is needed to build the kernel anyway, it's not like
> it will be impossible to find the right tools. Rather than clutter up commits
> with the .pl _and_ the .S_shipped, what would you think if I just generated
> the .S each time as an ordinary build artifact. AFAICT, this is fairly usual,
> and it's hard to see downsides. Hence, why I'm writing this email: are there
> any downsides to that?
>

I agree 100%. When I added this the first time, it was at the request
of the ARM maintainer, who was reluctant to rely on Perl for some
reason.

Recently, we have had to add a kludge to prevent spurious rebuilds of
the .S_shipped files as well.

I'd be perfectly happy to get rid of this entirely, and always
generate the .S from the .pl, which to me is kind of the point of
carrying these files in the first place.

Masahiro: do you see any problems with this?


Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-08 Thread Ard Biesheuvel
On 8 November 2018 at 23:55, Ard Biesheuvel  wrote:
> The simd wrapper's skcipher request context structure consists
> of a single subrequest whose size is taken from the subordinate
> skcipher. However, in simd_skcipher_init(), the reqsize that is
> retrieved is not from the subordinate skcipher but from the
> cryptd request structure, whose size is completely unrelated to
> the actual wrapped skcipher.
>
> Reported-by: Qian Cai 
> Signed-off-by: Ard Biesheuvel 
> ---
>  crypto/simd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/simd.c b/crypto/simd.c
> index ea7240be3001..2f3d6e897afc 100644
> --- a/crypto/simd.c
> +++ b/crypto/simd.c
> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
> ctx->cryptd_tfm = cryptd_tfm;
>
> reqsize = sizeof(struct skcipher_request);
> -   reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
> +   reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>

This should be

reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
   crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));

since the cryptd path in simd still needs some space in the subreq for
the completion.
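
As an aside, the arithmetic this ends up with can be sketched in plain
userspace C (the sizes below are made up for illustration; the real ones
come from crypto_skcipher_reqsize() and sizeof(struct skcipher_request)):

#include <stdio.h>

#define REQ_HDR_SIZE    72   /* stand-in for sizeof(struct skcipher_request) */
#define CHILD_REQSIZE  160   /* hypothetical reqsize of the wrapped skcipher */
#define CRYPTD_REQSIZE 208   /* hypothetical reqsize of the cryptd skcipher  */

int main(void)
{
	/* the single subrequest must fit whichever backend ends up using it */
	unsigned int sub = CHILD_REQSIZE > CRYPTD_REQSIZE ?
			   CHILD_REQSIZE : CRYPTD_REQSIZE;

	printf("simd wrapper reqsize = %u\n", sub + REQ_HDR_SIZE);
	return 0;
}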


[PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account

2018-11-08 Thread Ard Biesheuvel
The simd wrapper's skcipher request context structure consists
of a single subrequest whose size is taken from the subordinate
skcipher. However, in simd_skcipher_init(), the reqsize that is
retrieved is not from the subordinate skcipher but from the
cryptd request structure, whose size is completely unrelated to
the actual wrapped skcipher.

Reported-by: Qian Cai 
Signed-off-by: Ard Biesheuvel 
---
 crypto/simd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/simd.c b/crypto/simd.c
index ea7240be3001..2f3d6e897afc 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
ctx->cryptd_tfm = cryptd_tfm;
 
reqsize = sizeof(struct skcipher_request);
-   reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
+   reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
 
crypto_skcipher_set_reqsize(tfm, reqsize);
 
-- 
2.19.1



.S_shipped unnecessary?

2018-11-08 Thread Jason A. Donenfeld
Hi Ard, Eric, and others,

As promised, the next Zinc patchset will have less generated code! After a
bit of work with Andy and Samuel, I'll be bundling the perlasm.

One thing I'm wondering about, though, is the wisdom behind the current
.S_shipped pattern. Usually the _shipped is for big firmware blobs that are
hard (or impossible) to build independently. But in this case, the .S is
generated from the .pl significantly faster than gcc even compiles a basic
C file. And, since perl is needed to build the kernel anyway, it's not like
it will be impossible to find the right tools. Rather than clutter up commits
with the .pl _and_ the .S_shipped, what would you think if I just generated
the .S each time as an ordinary build artifact. AFAICT, this is fairly usual,
and it's hard to see downsides. Hence, why I'm writing this email: are there
any downsides to that?

Thanks,
Jason


[PATCH 5/5] crypto: caam/qi2 - add support for Chacha20 + Poly1305

2018-11-08 Thread Horia Geantă
Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c  |   4 +-
 drivers/crypto/caam/caamalg_desc.c |  24 ++-
 drivers/crypto/caam/caamalg_desc.h |   3 +-
 drivers/crypto/caam/caamalg_qi2.c  | 129 -
 4 files changed, 154 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index cbaeb264a261..523565ce0060 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -527,13 +527,13 @@ static int chachapoly_set_sh_desc(struct crypto_aead 
*aead)
 
desc = ctx->sh_desc_enc;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
-  ctx->authsize, true);
+  ctx->authsize, true, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
   desc_bytes(desc), ctx->dir);
 
desc = ctx->sh_desc_dec;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
-  ctx->authsize, false);
+  ctx->authsize, false, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
   desc_bytes(desc), ctx->dir);
 
diff --git a/drivers/crypto/caam/caamalg_desc.c 
b/drivers/crypto/caam/caamalg_desc.c
index 0eb2add7e4e2..7db1640d3577 100644
--- a/drivers/crypto/caam/caamalg_desc.c
+++ b/drivers/crypto/caam/caamalg_desc.c
@@ -1227,10 +1227,12 @@ EXPORT_SYMBOL(cnstr_shdsc_rfc4543_decap);
  * @ivsize: initialization vector size
  * @icvsize: integrity check value (ICV) size (truncated or full)
  * @encap: true if encapsulation, false if decapsulation
+ * @is_qi: true when called from caam/qi
  */
 void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
struct alginfo *adata, unsigned int ivsize,
-   unsigned int icvsize, const bool encap)
+   unsigned int icvsize, const bool encap,
+   const bool is_qi)
 {
u32 *key_jump_cmd, *wait_cmd;
u32 nfifo;
@@ -1267,6 +1269,26 @@ void cnstr_shdsc_chachapoly(u32 * const desc, struct 
alginfo *cdata,
 OP_ALG_DECRYPT);
}
 
+   if (is_qi) {
+   u32 *wait_load_cmd;
+   u32 ctx1_iv_off = is_ipsec ? 8 : 4;
+
+   /* REG3 = assoclen */
+   append_seq_load(desc, 4, LDST_CLASS_DECO |
+   LDST_SRCDST_WORD_DECO_MATH3 |
+   4 << LDST_OFFSET_SHIFT);
+
+   wait_load_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL |
+   JUMP_COND_CALM | JUMP_COND_NCP |
+   JUMP_COND_NOP | JUMP_COND_NIP |
+   JUMP_COND_NIFP);
+   set_jump_tgt_here(desc, wait_load_cmd);
+
+   append_seq_load(desc, ivsize, LDST_CLASS_1_CCB |
+   LDST_SRCDST_BYTE_CONTEXT |
+   ctx1_iv_off << LDST_OFFSET_SHIFT);
+   }
+
/*
 * MAGIC with NFIFO
 * Read associated data from the input and send them to class1 and
diff --git a/drivers/crypto/caam/caamalg_desc.h 
b/drivers/crypto/caam/caamalg_desc.h
index a1a7b0e6889d..d5ca42ff961a 100644
--- a/drivers/crypto/caam/caamalg_desc.h
+++ b/drivers/crypto/caam/caamalg_desc.h
@@ -98,7 +98,8 @@ void cnstr_shdsc_rfc4543_decap(u32 * const desc, struct 
alginfo *cdata,
 
 void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
struct alginfo *adata, unsigned int ivsize,
-   unsigned int icvsize, const bool encap);
+   unsigned int icvsize, const bool encap,
+   const bool is_qi);
 
 void cnstr_shdsc_skcipher_encap(u32 * const desc, struct alginfo *cdata,
unsigned int ivsize, const bool is_rfc3686,
diff --git a/drivers/crypto/caam/caamalg_qi2.c 
b/drivers/crypto/caam/caamalg_qi2.c
index a9e264bb9629..2598640aa98b 100644
--- a/drivers/crypto/caam/caamalg_qi2.c
+++ b/drivers/crypto/caam/caamalg_qi2.c
@@ -462,7 +462,15 @@ static struct aead_edesc *aead_edesc_alloc(struct 
aead_request *req,
edesc->dst_nents = dst_nents;
edesc->iv_dma = iv_dma;
 
-   edesc->assoclen = cpu_to_caam32(req->assoclen);
+   if ((alg->caam.class1_alg_type & OP_ALG_ALGSEL_MASK) ==
+   OP_ALG_ALGSEL_CHACHA20 && ivsize != CHACHAPOLY_IV_SIZE)
+   /*
+* The associated data comes already with the IV but we need
+* to skip it when we authenticate or encrypt...
+*/
+   edesc->assoclen = cpu_to_caam32(req->assoclen - ivsize);
+   else

[PATCH 4/5] crypto: caam/jr - add support for Chacha20 + Poly1305

2018-11-08 Thread Horia Geantă
Add support for Chacha20 + Poly1305 combined AEAD:
-generic (rfc7539)
-IPsec (rfc7634 - known as rfc7539esp in the kernel)

Signed-off-by: Cristian Stoica 
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c  | 221 -
 drivers/crypto/caam/caamalg_desc.c | 111 +++
 drivers/crypto/caam/caamalg_desc.h |   4 +
 drivers/crypto/caam/compat.h   |   1 +
 drivers/crypto/caam/desc.h |  15 +++
 drivers/crypto/caam/desc_constr.h  |   7 +-
 6 files changed, 354 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 9f1414030bc2..cbaeb264a261 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -72,6 +72,8 @@
 #define AUTHENC_DESC_JOB_IO_LEN(AEAD_DESC_JOB_IO_LEN + \
 CAAM_CMD_SZ * 5)
 
+#define CHACHAPOLY_DESC_JOB_IO_LEN (AEAD_DESC_JOB_IO_LEN + CAAM_CMD_SZ * 6)
+
 #define DESC_MAX_USED_BYTES(CAAM_DESC_BYTES_MAX - DESC_JOB_IO_LEN)
 #define DESC_MAX_USED_LEN  (DESC_MAX_USED_BYTES / CAAM_CMD_SZ)
 
@@ -513,6 +515,61 @@ static int rfc4543_setauthsize(struct crypto_aead *authenc,
return 0;
 }
 
+static int chachapoly_set_sh_desc(struct crypto_aead *aead)
+{
+   struct caam_ctx *ctx = crypto_aead_ctx(aead);
+   struct device *jrdev = ctx->jrdev;
+   unsigned int ivsize = crypto_aead_ivsize(aead);
+   u32 *desc;
+
+   if (!ctx->cdata.keylen || !ctx->authsize)
+   return 0;
+
+   desc = ctx->sh_desc_enc;
+   cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
+  ctx->authsize, true);
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
+  desc_bytes(desc), ctx->dir);
+
+   desc = ctx->sh_desc_dec;
+   cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
+  ctx->authsize, false);
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
+  desc_bytes(desc), ctx->dir);
+
+   return 0;
+}
+
+static int chachapoly_setauthsize(struct crypto_aead *aead,
+ unsigned int authsize)
+{
+   struct caam_ctx *ctx = crypto_aead_ctx(aead);
+
+   if (authsize != POLY1305_DIGEST_SIZE)
+   return -EINVAL;
+
+   ctx->authsize = authsize;
+   return chachapoly_set_sh_desc(aead);
+}
+
+static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
+unsigned int keylen)
+{
+   struct caam_ctx *ctx = crypto_aead_ctx(aead);
+   unsigned int ivsize = crypto_aead_ivsize(aead);
+   unsigned int saltlen = CHACHAPOLY_IV_SIZE - ivsize;
+
+   if (keylen != CHACHA20_KEY_SIZE + saltlen) {
+   crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
+   return -EINVAL;
+   }
+
+   ctx->cdata.key_virt = key;
+   ctx->cdata.keylen = keylen - saltlen;
+
+   return chachapoly_set_sh_desc(aead);
+}
+
 static int aead_setkey(struct crypto_aead *aead,
   const u8 *key, unsigned int keylen)
 {
@@ -1031,6 +1088,40 @@ static void init_gcm_job(struct aead_request *req,
/* End of blank commands */
 }
 
+static void init_chachapoly_job(struct aead_request *req,
+   struct aead_edesc *edesc, bool all_contig,
+   bool encrypt)
+{
+   struct crypto_aead *aead = crypto_aead_reqtfm(req);
+   unsigned int ivsize = crypto_aead_ivsize(aead);
+   unsigned int assoclen = req->assoclen;
+   u32 *desc = edesc->hw_desc;
+   u32 ctx_iv_off = 4;
+
+   init_aead_job(req, edesc, all_contig, encrypt);
+
+   if (ivsize != CHACHAPOLY_IV_SIZE) {
+   /* IPsec specific: CONTEXT1[223:128] = {NONCE, IV} */
+   ctx_iv_off += 4;
+
+   /*
+* The associated data comes already with the IV but we need
+* to skip it when we authenticate or encrypt...
+*/
+   assoclen -= ivsize;
+   }
+
+   append_math_add_imm_u32(desc, REG3, ZERO, IMM, assoclen);
+
+   /*
+* For IPsec load the IV further in the same register.
+* For RFC7539 simply load the 12 bytes nonce in a single operation
+*/
+   append_load_as_imm(desc, req->iv, ivsize, LDST_CLASS_1_CCB |
+  LDST_SRCDST_BYTE_CONTEXT |
+  ctx_iv_off << LDST_OFFSET_SHIFT);
+}
+
 static void init_authenc_job(struct aead_request *req,
 struct aead_edesc *edesc,
 bool all_contig, bool encrypt)
@@ -1289,6 +1380,72 @@ static int gcm_encrypt(struct aead_request *req)
return ret;
 }
 
+static int chachapoly_encrypt(struct aead_request *req)
+{
+   struct aead_edesc *edesc;
+   struct crypto_aead *aead = 

[PATCH 0/5] crypto: caam - add support for Era 10

2018-11-08 Thread Horia Geantă
This patch set adds support for CAAM Era 10, currently used in LX2160A SoC:
-new register mapping: some registers/fields are deprecated and moved
to different locations, mainly version registers
-algorithms
chacha20 (over DPSECI - Data Path SEC Interface on fsl-mc bus)
rfc7539(chacha20,poly1305) (over both DPSECI and Job Ring Interface)
rfc7539esp(chacha20,poly1305) (over both DPSECI and Job Ring Interface)

Note: the patch set is generated on top of cryptodev-2.6, however testing
was performed based on linux-next (tag: next-20181108) - which includes
LX2160A platform support + manually updating LX2160A dts with:
-fsl-mc bus DT node
-missing dma-ranges property in soc DT node

Cristian Stoica (1):
  crypto: export CHACHAPOLY_IV_SIZE

Horia Geantă (4):
  crypto: caam - add register map changes cf. Era 10
  crypto: caam/qi2 - add support for ChaCha20
  crypto: caam/jr - add support for Chacha20 + Poly1305
  crypto: caam/qi2 - add support for Chacha20 + Poly1305

 crypto/chacha20poly1305.c  |   2 -
 drivers/crypto/caam/caamalg.c  | 266 ++---
 drivers/crypto/caam/caamalg_desc.c | 139 ++-
 drivers/crypto/caam/caamalg_desc.h |   5 +
 drivers/crypto/caam/caamalg_qi.c   |  37 --
 drivers/crypto/caam/caamalg_qi2.c  | 156 +-
 drivers/crypto/caam/caamhash.c |  20 ++-
 drivers/crypto/caam/caampkc.c  |  10 +-
 drivers/crypto/caam/caamrng.c  |  10 +-
 drivers/crypto/caam/compat.h   |   2 +
 drivers/crypto/caam/ctrl.c |  28 +++-
 drivers/crypto/caam/desc.h |  28 
 drivers/crypto/caam/desc_constr.h  |   7 +-
 drivers/crypto/caam/regs.h |  74 +--
 include/crypto/chacha20.h  |   1 +
 15 files changed, 724 insertions(+), 61 deletions(-)

-- 
2.16.2



[PATCH 1/5] crypto: caam - add register map changes cf. Era 10

2018-11-08 Thread Horia Geantă
Era 10 changes the register map.

The updates that affect the drivers:
-new version registers are added
-DBG_DBG[deco_state] field is moved to a new register -
DBG_EXEC[19:16] @ 8_0E3Ch.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c| 47 +
 drivers/crypto/caam/caamalg_qi.c | 37 +++-
 drivers/crypto/caam/caamhash.c   | 20 ---
 drivers/crypto/caam/caampkc.c| 10 --
 drivers/crypto/caam/caamrng.c| 10 +-
 drivers/crypto/caam/ctrl.c   | 28 +++
 drivers/crypto/caam/desc.h   |  7 
 drivers/crypto/caam/regs.h   | 74 ++--
 8 files changed, 184 insertions(+), 49 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 869f092432de..9f1414030bc2 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -3135,7 +3135,7 @@ static int __init caam_algapi_init(void)
struct device *ctrldev;
struct caam_drv_private *priv;
int i = 0, err = 0;
-   u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
+   u32 aes_vid, aes_inst, des_inst, md_vid, md_inst;
unsigned int md_limit = SHA512_DIGEST_SIZE;
bool registered = false;
 
@@ -3168,14 +3168,34 @@ static int __init caam_algapi_init(void)
 * Register crypto algorithms the device supports.
 * First, detect presence and attributes of DES, AES, and MD blocks.
 */
-   cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
-   cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
-   des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >> CHA_ID_LS_DES_SHIFT;
-   aes_inst = (cha_inst & CHA_ID_LS_AES_MASK) >> CHA_ID_LS_AES_SHIFT;
-   md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
+   if (priv->era < 10) {
+   u32 cha_vid, cha_inst;
+
+   cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
+   aes_vid = cha_vid & CHA_ID_LS_AES_MASK;
+   md_vid = (cha_vid & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
+
+   cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
+   des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >>
+  CHA_ID_LS_DES_SHIFT;
+   aes_inst = cha_inst & CHA_ID_LS_AES_MASK;
+   md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
+   } else {
+   u32 aesa, mdha;
+
+   aesa = rd_reg32(&priv->ctrl->vreg.aesa);
+   mdha = rd_reg32(&priv->ctrl->vreg.mdha);
+
+   aes_vid = (aesa & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
+   md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
+
+   des_inst = rd_reg32(&priv->ctrl->vreg.desa) & CHA_VER_NUM_MASK;
+   aes_inst = aesa & CHA_VER_NUM_MASK;
+   md_inst = mdha & CHA_VER_NUM_MASK;
+   }
 
/* If MD is present, limit digest size based on LP256 */
-   if (md_inst && ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256))
+   if (md_inst && md_vid  == CHA_VER_VID_MD_LP256)
md_limit = SHA256_DIGEST_SIZE;
 
for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
@@ -3196,10 +3216,10 @@ static int __init caam_algapi_init(void)
 * Check support for AES modes not available
 * on LP devices.
 */
-   if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
-   if ((t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
-OP_ALG_AAI_XTS)
-   continue;
+   if (aes_vid == CHA_VER_VID_AES_LP &&
+   (t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
+   OP_ALG_AAI_XTS)
+   continue;
 
caam_skcipher_alg_init(t_alg);
 
@@ -3236,9 +3256,8 @@ static int __init caam_algapi_init(void)
 * Check support for AES algorithms not available
 * on LP devices.
 */
-   if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
-   if (alg_aai == OP_ALG_AAI_GCM)
-   continue;
+   if (aes_vid  == CHA_VER_VID_AES_LP && alg_aai == OP_ALG_AAI_GCM)
+   continue;
 
/*
 * Skip algorithms requiring message digests
diff --git a/drivers/crypto/caam/caamalg_qi.c b/drivers/crypto/caam/caamalg_qi.c
index 23c9fc4975f8..c0d55310aade 100644
--- a/drivers/crypto/caam/caamalg_qi.c
+++ b/drivers/crypto/caam/caamalg_qi.c
@@ -2462,7 +2462,7 @@ static int __init caam_qi_algapi_init(void)
struct device *ctrldev;
struct caam_drv_private *priv;
int i = 0, err = 0;
-   u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
+   u32 aes_vid, aes_inst, des_inst, md_vid, md_inst;
unsigned int md_limit = SHA512_DIGEST_SIZE;
bool registered = false;
 
@@ -2497,14 

[PATCH 3/5] crypto: export CHACHAPOLY_IV_SIZE

2018-11-08 Thread Horia Geantă
From: Cristian Stoica 

Move CHACHAPOLY_IV_SIZE to header file, so it can be reused.

Signed-off-by: Cristian Stoica 
Signed-off-by: Horia Geantă 
---
 crypto/chacha20poly1305.c | 2 --
 include/crypto/chacha20.h | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
index 600afa99941f..f9dd5453046a 100644
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -22,8 +22,6 @@
 
 #include "internal.h"
 
-#define CHACHAPOLY_IV_SIZE 12
-
 struct chachapoly_instance_ctx {
struct crypto_skcipher_spawn chacha;
struct crypto_ahash_spawn poly;
diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
index f76302d99e2b..2d3129442a52 100644
--- a/include/crypto/chacha20.h
+++ b/include/crypto/chacha20.h
@@ -13,6 +13,7 @@
 #define CHACHA20_IV_SIZE   16
 #define CHACHA20_KEY_SIZE  32
 #define CHACHA20_BLOCK_SIZE64
+#define CHACHAPOLY_IV_SIZE 12
 
 struct chacha20_ctx {
u32 key[8];
-- 
2.16.2



[PATCH 2/5] crypto: caam/qi2 - add support for ChaCha20

2018-11-08 Thread Horia Geantă
Add support for ChaCha20 skcipher algorithm.

Signed-off-by: Carmen Iorga 
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg_desc.c |  6 --
 drivers/crypto/caam/caamalg_qi2.c  | 27 +--
 drivers/crypto/caam/compat.h   |  1 +
 drivers/crypto/caam/desc.h |  6 ++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamalg_desc.c 
b/drivers/crypto/caam/caamalg_desc.c
index 1a6f0da14106..d850590079a2 100644
--- a/drivers/crypto/caam/caamalg_desc.c
+++ b/drivers/crypto/caam/caamalg_desc.c
@@ -1228,7 +1228,8 @@ static inline void skcipher_append_src_dst(u32 *desc)
  * @desc: pointer to buffer used for descriptor construction
  * @cdata: pointer to block cipher transform definitions
  * Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
- * with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
+ * with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
+ *- OP_ALG_ALGSEL_CHACHA20
  * @ivsize: initialization vector size
  * @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
  * @ctx1_iv_off: IV offset in CONTEXT1 register
@@ -1293,7 +1294,8 @@ EXPORT_SYMBOL(cnstr_shdsc_skcipher_encap);
  * @desc: pointer to buffer used for descriptor construction
  * @cdata: pointer to block cipher transform definitions
  * Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
- * with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
+ * with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
+ *- OP_ALG_ALGSEL_CHACHA20
  * @ivsize: initialization vector size
  * @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
  * @ctx1_iv_off: IV offset in CONTEXT1 register
diff --git a/drivers/crypto/caam/caamalg_qi2.c 
b/drivers/crypto/caam/caamalg_qi2.c
index 7d8ac0222fa3..a9e264bb9629 100644
--- a/drivers/crypto/caam/caamalg_qi2.c
+++ b/drivers/crypto/caam/caamalg_qi2.c
@@ -816,7 +816,9 @@ static int skcipher_setkey(struct crypto_skcipher 
*skcipher, const u8 *key,
u32 *desc;
u32 ctx1_iv_off = 0;
const bool ctr_mode = ((ctx->cdata.algtype & OP_ALG_AAI_MASK) ==
-  OP_ALG_AAI_CTR_MOD128);
+  OP_ALG_AAI_CTR_MOD128) &&
+  ((ctx->cdata.algtype & OP_ALG_ALGSEL_MASK) !=
+  OP_ALG_ALGSEL_CHACHA20);
const bool is_rfc3686 = alg->caam.rfc3686;
 
print_hex_dump_debug("key in @" __stringify(__LINE__)": ",
@@ -1494,7 +1496,23 @@ static struct caam_skcipher_alg driver_algs[] = {
.ivsize = AES_BLOCK_SIZE,
},
.caam.class1_alg_type = OP_ALG_ALGSEL_AES | OP_ALG_AAI_XTS,
-   }
+   },
+   {
+   .skcipher = {
+   .base = {
+   .cra_name = "chacha20",
+   .cra_driver_name = "chacha20-caam-qi2",
+   .cra_blocksize = 1,
+   },
+   .setkey = skcipher_setkey,
+   .encrypt = skcipher_encrypt,
+   .decrypt = skcipher_decrypt,
+   .min_keysize = CHACHA20_KEY_SIZE,
+   .max_keysize = CHACHA20_KEY_SIZE,
+   .ivsize = CHACHA20_IV_SIZE,
+   },
+   .caam.class1_alg_type = OP_ALG_ALGSEL_CHACHA20,
+   },
 };
 
 static struct caam_aead_alg driver_aeads[] = {
@@ -4908,6 +4926,11 @@ static int dpaa2_caam_probe(struct fsl_mc_device 
*dpseci_dev)
alg_sel == OP_ALG_ALGSEL_AES)
continue;
 
+   /* Skip CHACHA20 algorithms if not supported by device */
+   if (alg_sel == OP_ALG_ALGSEL_CHACHA20 &&
+   !priv->sec_attr.ccha_acc_num)
+   continue;
+
t_alg->caam.dev = dev;
caam_skcipher_alg_init(t_alg);
 
diff --git a/drivers/crypto/caam/compat.h b/drivers/crypto/caam/compat.h
index 9604ff7a335e..a5081b4050b6 100644
--- a/drivers/crypto/caam/compat.h
+++ b/drivers/crypto/caam/compat.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/crypto/caam/desc.h b/drivers/crypto/caam/desc.h
index ec1ef06049b4..9d117e51629f 100644
--- a/drivers/crypto/caam/desc.h
+++ b/drivers/crypto/caam/desc.h
@@ -1159,6 +1159,7 @@
 #define OP_ALG_ALGSEL_KASUMI   (0x70 << OP_ALG_ALGSEL_SHIFT)
 #define OP_ALG_ALGSEL_CRC  (0x90 << OP_ALG_ALGSEL_SHIFT)
 #define OP_ALG_ALGSEL_SNOW_F9  (0xA0 << OP_ALG_ALGSEL_SHIFT)
+#define OP_ALG_ALGSEL_CHACHA20 (0xD0 << OP_ALG_ALGSEL_SHIFT)
 
 #define OP_ALG_AAI_SHIFT   4
 #define OP_ALG_AAI_MASK(0x1ff << OP_ALG_AAI_SHIFT)
@@ -1206,6 +1207,11 @@
 #define OP_ALG_AAI_RNG4_AI (0x80 << OP_ALG_AAI_SHIFT)
 #define OP_ALG_AAI_RNG4_SK (0x100 << 

Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include

2018-11-07 Thread Vladimir Murzin
On 07/11/18 14:55, Will Deacon wrote:
> On Wed, Nov 07, 2018 at 09:40:05AM +, Vladimir Murzin wrote:
>> There are cases where the whole feature, for instance arm64/lse or
>> arm/crypto, can depend on the assembler. Current practice is to report at
>> build time that the selected feature is not supported, which can be quite
>> annoying...
> 
> Why is it annoying? You still end up with a working kernel.

.config doesn't really show whether the option was actually built or not;
the annoying part is digging through build logs (if anyone saved them at
all!) or the relevant parts of dmesg (if the option prints anything there,
which is not always included in reports).

> 
>> It'd be nicer if we could check the assembler first and opt in to feature
>> visibility in Kconfig.
>>
>> Cc: Masahiro Yamada 
>> Cc: Will Deacon 
>> Cc: Marc Zyngier 
>> Cc: Ard Biesheuvel 
>> Signed-off-by: Vladimir Murzin 
>> ---
>>  scripts/Kconfig.include | 4 
>>  1 file changed, 4 insertions(+)
> 
> One issue I have with doing the check like this is that if somebody sends
> you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
> using that .config and an old toolchain, the option is silently dropped.

I see... at least we have some tools like ./scripts/diffconfig

> 
> I think the diagnostic is actually useful in this case.

Fully agree on the diagnostic side; any suggestions for how it can be improved?

Cheers
Vladimir

> 
> Will
> 



Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include

2018-11-07 Thread Will Deacon
On Wed, Nov 07, 2018 at 09:40:05AM +, Vladimir Murzin wrote:
> There are cases where the whole feature, for instance arm64/lse or
> arm/crypto, can depend on the assembler. Current practice is to report at
> build time that the selected feature is not supported, which can be quite
> annoying...

Why is it annoying? You still end up with a working kernel.

> It'd be nicer if we could check the assembler first and opt in to feature
> visibility in Kconfig.
> 
> Cc: Masahiro Yamada 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: Ard Biesheuvel 
> Signed-off-by: Vladimir Murzin 
> ---
>  scripts/Kconfig.include | 4 
>  1 file changed, 4 insertions(+)

One issue I have with doing the check like this is that if somebody sends
you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
using that .config and an old toolchain, the option is silently dropped.

I think the diagnostic is actually useful in this case.

Will


[RFC PATCH 2/4] arm64: lse: expose dependency on gas via Kconfig

2018-11-07 Thread Vladimir Murzin
So we can simply hide LSE support if dependency is not satisfied.

Cc: Will Deacon 
Signed-off-by: Vladimir Murzin 
---
 arch/arm64/Kconfig  |  1 +
 arch/arm64/Makefile | 13 ++---
 arch/arm64/include/asm/atomic.h |  2 +-
 arch/arm64/include/asm/lse.h|  6 +++---
 arch/arm64/kernel/cpufeature.c  |  4 ++--
 5 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 964f682..7978aee 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1072,6 +1072,7 @@ config ARM64_PAN
 config ARM64_LSE_ATOMICS
bool "Atomic instructions"
default y
+   depends on $(as-instr,.arch_extension lse)
help
  As part of the Large System Extensions, ARMv8.1 introduces new
  atomic instructions that are designed specifically to scale in
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index b4e994c..3054757 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -32,15 +32,6 @@ endif
 
 KBUILD_DEFCONFIG := defconfig
 
-# Check for binutils support for specific extensions
-lseinstr := $(call as-instr,.arch_extension lse,-DCONFIG_AS_LSE=1)
-
-ifeq ($(CONFIG_ARM64_LSE_ATOMICS), y)
-  ifeq ($(lseinstr),)
-$(warning LSE atomics not supported by binutils)
-  endif
-endif
-
 ifeq ($(CONFIG_ARM64), y)
 brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 
1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
 
@@ -49,9 +40,9 @@ $(warning Detected assembler with broken .inst; disassembly 
will be unreliable)
   endif
 endif
 
-KBUILD_CFLAGS  += -mgeneral-regs-only $(lseinstr) $(brokengasinst)
+KBUILD_CFLAGS  += -mgeneral-regs-only $(brokengasinst)
 KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
-KBUILD_AFLAGS  += $(lseinstr) $(brokengasinst)
+KBUILD_AFLAGS  += $(brokengasinst)
 
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 9bca54d..9d8d029 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -30,7 +30,7 @@
 
 #define __ARM64_IN_ATOMIC_IMPL
 
-#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
+#ifdef CONFIG_ARM64_LSE_ATOMICS
 #include 
 #else
 #include 
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index 8262325..1fd31c7 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -2,7 +2,7 @@
 #ifndef __ASM_LSE_H
 #define __ASM_LSE_H
 
-#if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
+#ifdef CONFIG_ARM64_LSE_ATOMICS
 
 #include 
 #include 
@@ -36,7 +36,7 @@
ALTERNATIVE(llsc, lse, ARM64_HAS_LSE_ATOMICS)
 
 #endif /* __ASSEMBLER__ */
-#else  /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+#else  /* CONFIG_ARM64_LSE_ATOMICS */
 
 #ifdef __ASSEMBLER__
 
@@ -53,5 +53,5 @@
 #define ARM64_LSE_ATOMIC_INSN(llsc, lse)   llsc
 
 #endif /* __ASSEMBLER__ */
-#endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+#endif /* CONFIG_ARM64_LSE_ATOMICS */
 #endif /* __ASM_LSE_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 74e9dcb..46f1bac 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1170,7 +1170,7 @@ static void cpu_clear_disr(const struct 
arm64_cpu_capabilities *__unused)
.cpu_enable = cpu_enable_pan,
},
 #endif /* CONFIG_ARM64_PAN */
-#if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
+#ifdef CONFIG_ARM64_LSE_ATOMICS
{
.desc = "LSE atomic instructions",
.capability = ARM64_HAS_LSE_ATOMICS,
@@ -1181,7 +1181,7 @@ static void cpu_clear_disr(const struct 
arm64_cpu_capabilities *__unused)
.sign = FTR_UNSIGNED,
.min_field_value = 2,
},
-#endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+#endif /* CONFIG_ARM64_LSE_ATOMICS */
{
.desc = "Software prefetching using PRFM",
.capability = ARM64_HAS_NO_HW_PREFETCH,
-- 
1.9.1



[RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include

2018-11-07 Thread Vladimir Murzin
There are cases where the whole feature, for instance arm64/lse or
arm/crypto, can depend on the assembler. Current practice is to report at
build time that the selected feature is not supported, which can be quite
annoying...

It'd be nicer if we could check the assembler first and opt in to feature
visibility in Kconfig.

Cc: Masahiro Yamada 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Ard Biesheuvel 
Signed-off-by: Vladimir Murzin 
---
 scripts/Kconfig.include | 4 
 1 file changed, 4 insertions(+)

diff --git a/scripts/Kconfig.include b/scripts/Kconfig.include
index dad5583..07c145c 100644
--- a/scripts/Kconfig.include
+++ b/scripts/Kconfig.include
@@ -22,6 +22,10 @@ success = $(if-success,$(1),y,n)
 # Return y if the compiler supports , n otherwise
 cc-option = $(success,$(CC) -Werror $(1) -E -x c /dev/null -o /dev/null)
 
+# $(as-instr,)
+# Return y if the assembler supports , n otherwise
+as-instr = $(success,printf "%b\n" "$(1)" | $(CC) -Werror -c -x assembler -o 
/dev/null -)
+
 # $(ld-option,)
 # Return y if the linker supports , n otherwise
 ld-option = $(success,$(LD) -v $(1))
-- 
1.9.1



[RFC PATCH 3/4] arm64: turn "broken gas inst" into real config option

2018-11-07 Thread Vladimir Murzin
So it is available everywhere and there is no need to keep
CONFIG_ARM64 workaround ;)

Cc: Marc Zyngier 
Signed-off-by: Vladimir Murzin 
---
 arch/arm64/Kconfig  | 3 +++
 arch/arm64/Makefile | 9 ++---
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7978aee..86fc357 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -287,6 +287,9 @@ config ARCH_SUPPORTS_UPROBES
 config ARCH_PROC_KCORE_TEXT
def_bool y
 
+config BROKEN_GAS_INST
+   def_bool y if !$(as-instr,1:\n.inst 0\n.rept . - 1b\n\nnop\n.endr\n)
+
 source "arch/arm64/Kconfig.platforms"
 
 menu "Bus support"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 3054757..9860d3a 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -32,17 +32,12 @@ endif
 
 KBUILD_DEFCONFIG := defconfig
 
-ifeq ($(CONFIG_ARM64), y)
-brokengasinst := $(call as-instr,1:\n.inst 0\n.rept . - 
1b\n\nnop\n.endr\n,,-DCONFIG_BROKEN_GAS_INST=1)
-
-  ifneq ($(brokengasinst),)
+ifeq ($(CONFIG_BROKEN_GAS_INST),y)
 $(warning Detected assembler with broken .inst; disassembly will be unreliable)
-  endif
 endif
 
-KBUILD_CFLAGS  += -mgeneral-regs-only $(brokengasinst)
+KBUILD_CFLAGS  += -mgeneral-regs-only
 KBUILD_CFLAGS  += -fno-asynchronous-unwind-tables
-KBUILD_AFLAGS  += $(brokengasinst)
 
 KBUILD_CFLAGS  += $(call cc-option,-mabi=lp64)
 KBUILD_AFLAGS  += $(call cc-option,-mabi=lp64)
-- 
1.9.1



[RFC PATCH 0/4] Minor improvements over handling dependency on GAS

2018-11-07 Thread Vladimir Murzin
With recent changes in Kconfig processing it is now possible to expose
dependency on specific tools and supported options via Kconfig rather
than bury it deep in Makefile.

This small series try to address the case where the whole feature, for
instance arm64/lse or arm/crypto, depends on GAS.

Vladimir Murzin (4):
  kconfig: add as-instr macro to scripts/Kconfig.include
  arm64: lse: expose dependency on gas via Kconfig
  arm64: turn "broken gas inst" into real config option
  ARM: crypto: expose dependency on gas via Kconfig

 arch/arm/crypto/Kconfig | 31 +--
 arch/arm/crypto/Makefile| 31 ++-
 arch/arm64/Kconfig  |  4 
 arch/arm64/Makefile | 18 ++
 arch/arm64/include/asm/atomic.h |  2 +-
 arch/arm64/include/asm/lse.h|  6 +++---
 arch/arm64/kernel/cpufeature.c  |  4 ++--
 scripts/Kconfig.include |  4 
 8 files changed, 43 insertions(+), 57 deletions(-)

-- 
1.9.1



[RFC PATCH 4/4] ARM: crypto: expose dependency on gas via Kconfig

2018-11-07 Thread Vladimir Murzin
So we can advertise only those entries which dependency is satisfied.

Cc: Ard Biesheuvel 
Signed-off-by: Vladimir Murzin 
---
 arch/arm/crypto/Kconfig  | 31 +--
 arch/arm/crypto/Makefile | 31 ++-
 2 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index ef0c7fe..f437a91f 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -9,6 +9,12 @@ menuconfig ARM_CRYPTO
 
 if ARM_CRYPTO
 
+config ARM_AS_HAS_CE
+   def_bool $(as-instr,.fpu crypto-neon-fp-armv8)
+
+config ARM_AS_HAS_CRC
+   def_bool $(as-instr,.arch armv8-a\n.arch_extension crc)
+
 config CRYPTO_SHA1_ARM
tristate "SHA1 digest algorithm (ARM-asm)"
select CRYPTO_SHA1
@@ -30,21 +36,21 @@ config CRYPTO_SHA1_ARM_NEON
 
 config CRYPTO_SHA1_ARM_CE
tristate "SHA1 digest algorithm (ARM v8 Crypto Extensions)"
-   depends on KERNEL_MODE_NEON
+   depends on KERNEL_MODE_NEON && ARM_AS_HAS_CE
select CRYPTO_SHA1_ARM
select CRYPTO_HASH
help
  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2) implemented
- using special ARMv8 Crypto Extensions.
+ using special ARMv8 Crypto Extensions (need binutils 2.23 or higher).
 
 config CRYPTO_SHA2_ARM_CE
tristate "SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)"
-   depends on KERNEL_MODE_NEON
+   depends on KERNEL_MODE_NEON && ARM_AS_HAS_CE
select CRYPTO_SHA256_ARM
select CRYPTO_HASH
help
  SHA-256 secure hash standard (DFIPS 180-2) implemented
- using special ARMv8 Crypto Extensions.
+ using special ARMv8 Crypto Extensions (need binutils 2.23 or higher).
 
 config CRYPTO_SHA256_ARM
tristate "SHA-224/256 digest algorithm (ARM-asm and NEON)"
@@ -87,16 +93,16 @@ config CRYPTO_AES_ARM_BS
 
 config CRYPTO_AES_ARM_CE
tristate "Accelerated AES using ARMv8 Crypto Extensions"
-   depends on KERNEL_MODE_NEON
+   depends on KERNEL_MODE_NEON && ARM_AS_HAS_CE
select CRYPTO_BLKCIPHER
select CRYPTO_SIMD
help
  Use an implementation of AES in CBC, CTR and XTS modes that uses
- ARMv8 Crypto Extensions
+ ARMv8 Crypto Extensions (need binutils 2.23 or higher)
 
 config CRYPTO_GHASH_ARM_CE
tristate "PMULL-accelerated GHASH using NEON/ARMv8 Crypto Extensions"
-   depends on KERNEL_MODE_NEON
+   depends on KERNEL_MODE_NEON && ARM_AS_HAS_CE
select CRYPTO_HASH
select CRYPTO_CRYPTD
select CRYPTO_GF128MUL
@@ -104,17 +110,22 @@ config CRYPTO_GHASH_ARM_CE
  Use an implementation of GHASH (used by the GCM AEAD chaining mode)
  that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
  that is part of the ARMv8 Crypto Extensions, or a slower variant that
- uses the vmull.p8 instruction that is part of the basic NEON ISA.
+ uses the vmull.p8 instruction that is part of the basic NEON ISA (need
+ binutils 2.23 or higher).
 
 config CRYPTO_CRCT10DIF_ARM_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
-   depends on KERNEL_MODE_NEON && CRC_T10DIF
+   depends on KERNEL_MODE_NEON && CRC_T10DIF  && ARM_AS_HAS_CE
select CRYPTO_HASH
+   help
+ Need binutils 2.23 or higher
 
 config CRYPTO_CRC32_ARM_CE
tristate "CRC32(C) digest algorithm using CRC and/or PMULL instructions"
-   depends on KERNEL_MODE_NEON && CRC32
+   depends on KERNEL_MODE_NEON && CRC32 && ARM_AS_HAS_CRC
select CRYPTO_HASH
+   help
+ Need binutils 2.23 or higher
 
 config CRYPTO_CHACHA20_NEON
tristate "NEON accelerated ChaCha20 symmetric cipher"
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index bd5bcee..e897327 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -11,32 +11,13 @@ obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 
-ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
-ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
-ce-obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
-ce-obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
-ce-obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
-crc-obj-$(CONFIG_CRYPTO_CRC32_ARM_CE) += crc32-arm-ce.o
+obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
+obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
+obj-$(CONFIG_CRYPTO_SHA2_ARM_CE) += sha2-arm-ce.o
+obj-$(CONFIG_CRYPTO_GHASH_ARM_CE) += ghash-arm-ce.o
+obj-$(CONFIG_CRYPTO_CRCT10DIF_ARM_CE) += crct10dif-arm-ce.o
 
-ifneq ($(crc-obj-y)$(crc-obj-m),)
-ifeq ($(call as-instr,.arch armv8-a\n.arch_extension crc,y,n),y)
-ce-obj-y += $(crc-obj-y)
-ce-obj-m += $(crc-obj-m)
-else
-$(warning These CRC Extensions modules need binutils 2.23 or higher)
-$(warning $(crc-obj-y) $(crc-obj-m))
-endif
-endif
-

Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-11-01 Thread Dmitry Eremin-Solenikov
Thu, 1 Nov 2018 at 11:41, Herbert Xu :
>
> On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
> >
> > Since 4.20 pull went into Linus'es tree, any chance of getting these two 
> > patches
> > in crypto tree?
>
> These aren't critical enough for the current mainline so they will
> go in at the next merge window.

Thank you.


-- 
With best wishes
Dmitry


Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-11-01 Thread Herbert Xu
On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
>
> Since 4.20 pull went into Linus'es tree, any chance of getting these two 
> patches
> in crypto tree?

These aren't critical enough for the current mainline so they will
go in at the next merge window.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/2] crypto: fix cfb mode decryption

2018-11-01 Thread Dmitry Eremin-Solenikov
Hello,

Sun, 21 Oct 2018 at 11:07, James Bottomley
:
>
> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote:
> > (+ James)
>
> Thanks!
>
> > On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov
> >  wrote:
> > > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream
> > > with
> > > IV, rather than with data stream, resulting in incorrect
> > > decryption.
> > > Test vectors will be added in the next patch.
> > >
> > > Signed-off-by: Dmitry Eremin-Solenikov 
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  crypto/cfb.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/crypto/cfb.c b/crypto/cfb.c
> > > index a0d68c09e1b9..fd4e8500e121 100644
> > > --- a/crypto/cfb.c
> > > +++ b/crypto/cfb.c
> > > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct
> > > skcipher_walk *walk,
> > >
> > > do {
> > > crypto_cfb_encrypt_one(tfm, iv, dst);
> > > -   crypto_xor(dst, iv, bsize);
> > > +   crypto_xor(dst, src, bsize);
>
> This does look right.  I think the reason the TPM code works is that it
> always does encrypt/decrypt in-place, which is a separate piece of the
> code which appears to be correct.
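
As an aside, a toy sketch of a single CFB decryption step (not the kernel
code; block_encrypt() is a stand-in for the real block cipher) shows why
the keystream must be XORed with the ciphertext (src) rather than with the
IV:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BSIZE 16

static void block_encrypt(uint8_t out[BSIZE], const uint8_t in[BSIZE])
{
	/* placeholder "cipher" so the sketch stays self-contained */
	for (int i = 0; i < BSIZE; i++)
		out[i] = in[i] ^ 0xA5;
}

static void cfb_decrypt_block(uint8_t *iv, const uint8_t *src, uint8_t *dst)
{
	uint8_t ks[BSIZE];

	block_encrypt(ks, iv);              /* keystream = E(IV)              */
	for (int i = 0; i < BSIZE; i++)
		dst[i] = src[i] ^ ks[i];    /* plaintext = ciphertext ^ ks    */
	memcpy(iv, src, BSIZE);             /* this ciphertext is the next IV */
}

int main(void)
{
	uint8_t iv[BSIZE] = { 0 }, iv2[BSIZE] = { 0 };
	uint8_t pt[BSIZE] = "attack at dawn!", ks[BSIZE], ct[BSIZE], out[BSIZE];

	block_encrypt(ks, iv);              /* CFB encryption of one block */
	for (int i = 0; i < BSIZE; i++)
		ct[i] = pt[i] ^ ks[i];

	cfb_decrypt_block(iv2, ct, out);
	printf("%s\n", memcmp(pt, out, BSIZE) ? "mismatch" : "round trip ok");
	return 0;
}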

Since 4.20 pull went into Linus'es tree, any chance of getting these two patches
in crypto tree?

-- 
With best wishes
Dmitry


Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling

2018-10-25 Thread Jarkko Sakkinen

On Wed, 24 Oct 2018, James Bottomley wrote:

+static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
+u8 *v, int bytes, u8 *out)


Should this be in lower case? I would rename it as tpm_kdfa().


This one is defined as KDFa() in the standards and it's not TPM
specific (although some standards refer to it as KDFA).  I'm not averse
to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
the crypto subsystem would need them and we could move them in there
because KDFs are the new shiny in crypto primitives (TLS 1.2 started
using them, for instance).


I care more about tracing and debugging than naming and having 'tpm_' in
front of every TPM function makes tracing a lean process. AFAIK using
upper case letters is against kernel coding conventions. I'm not sure
why this would make an exception on that.


Why doesn't it matter here?


Because, as the comment says, it eventually gets overwritten by running
ecdh to derive the two co-ordinates.  (pointers to these two
uninitialized areas are passed into the ecdh destination sg list).


Oh, I just misunderstood the comment. Wouldn't it be easier to say that
the data is initialized later?


+   buf_len = crypto_ecdh_key_len(&p);
+   if (sizeof(encoded_key) < buf_len) {
+   dev_err(&chip->dev, "salt buffer too small needs %d\n",
+   buf_len);
+   goto out;
+   }


In what situation this can happen? Can sizeof(encoded_key) >=
buf_len?


Yes, but only if someone is trying to crack your ecdh.  One of the
security issues in ecdh is if someone makes a very specific point
choice (usually in the cofactor space) that has a very short period,
the attacker can guess the input to KDFe.  In this case if TPM genie
provided a specially crafted attack EC point, we'd detect it here
because the resulting buffer would be too short.


Right. Thank you for the explanation. Here some kind of comment might
not be a bad idea...


In general this function should have a clear explanation of what it does,
and maybe fewer of these one-character variables, using more descriptive
names instead. Explain at a high level what algorithms are used and how
the salt is calculated.


I'll try, but this is a rather complex function.


Understood. I do not expect perfection here and we can improve
documentation later on.

For anyone wanting to review James' patches and w/o much experience on
EC, I recommend reading this article:

https://arstechnica.com/information-technology/2013/10/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/

I read it a few years ago and refreshed my memory a few days ago by
re-reading it.




+
+/**
+ * tpm_buf_append_hmac_session() append a TPM session element
+ * @buf: The buffer to be appended
+ * @auth: the auth structure allocated by tpm2_start_auth_session()
+ * @attributes: The session attributes
+ * @passphrase: The session authority (NULL if none)
+ * @passphraselen: The length of the session authority (0 if none)


The alignment.


the alignment of what?


We generally have parameter descriptions tab-aligned.


Why would there be trailing zeros?


Because TPM 1.2 mandated zero padded fixed size passphrases so the TPM
2.0 standard specifies a way of converting these to variable size
strings by eliminating the zero padding.
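
For illustration only (not the patch code), the conversion amounts to
trimming the trailing zero bytes off the fixed-size TPM 1.2 style buffer:

#include <stdio.h>

static int effective_len(const unsigned char *pass, int padded_len)
{
	/* drop the zero padding to get the variable-size TPM 2.0 form */
	while (padded_len > 0 && pass[padded_len - 1] == 0)
		padded_len--;
	return padded_len;
}

int main(void)
{
	unsigned char pass[20] = { 's', 'e', 'c', 'r', 'e', 't' }; /* rest is zero */

	printf("padded=%d effective=%d\n",
	       (int)sizeof(pass), effective_len(pass, (int)sizeof(pass)));
	return 0;
}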


Ok.


James


I'm also looking forward to the CONTEXT_GAP patch based on
yesterday's discussion. We do want it and I was stupid not to take it a
couple of years ago :-) Thanks.

/Jarkko


Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions

2018-10-25 Thread Jarkko Sakkinen

On Wed, 24 Oct 2018, James Bottomley wrote:

On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote:

I would consider sending first a patch set that would iterate the
existing session stuff to be ready for this i.e. merge in two
iterations (emphasis on the word "consider"). We can probably merge
the groundwork quite fast.


I realise we're going to have merge conflicts on the later ones, so why
don't we do this: I'll still send as one series, but you apply the ones
you think are precursors and I'll rebase and resend the rest?

James


Works for me, and after yesterday's discussions etc. I now think that this
should be merged as one series.

/Jarkko


Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling

2018-10-24 Thread James Bottomley
On Wed, 2018-10-24 at 02:48 +0300, Jarkko Sakkinen wrote:
> On Mon, 22 Oct 2018, James Bottomley wrote:
> > [...]

I'll tidy up the descriptions.

> These should all be combined with the existing session stuff inside
> tpm2-cmd.c rather than having duplicate infrastructures. The file name
> should be tpm2-session.c (we don't have tpm2-cmds.c either).

You mean move tpm2_buf_append_auth() into the new sessions file as well
... sure, I can do that.

[...]
> > +
> > +/*
> > + * assume hash sha256 and nonces u, v of size SHA256_DIGEST_SIZE
> > but
> > + * otherwise standard KDFa.  Note output is in bytes not bits.
> > + */
> > +static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
> > +u8 *v, int bytes, u8 *out)
> 
> Should this be in lower case? I would rename it as tpm_kdfa().

This one is defined as KDFa() in the standards and it's not TPM
specific (although some standards refer to it as KDFA).  I'm not averse
to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
the crypto subsystem would need them and we could move them in there
because KDFs are the new shiny in crypto primitives (TLS 1.2 started
using them, for instance).

> > +{
> > +   u32 counter;
> > +   const __be32 bits = cpu_to_be32(bytes * 8);
> > +
> > +   for (counter = 1; bytes > 0; bytes -= SHA256_DIGEST_SIZE,
> > counter++,
> > +out += SHA256_DIGEST_SIZE) {
> 
> Only one counter is actually used for anything so this is overly
> complicated and IMHO it is ok to call the counter just 'i'. Maybe
> just:
> 
> for (i = 1; (bytes - (i - 1) * SHA256_DIGEST_SIZE) > 0; i++) {
> 
> > +   SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +   __be32 c = cpu_to_be32(counter);
> > +
> > +   hmac_init(desc, key, keylen);
> > +   crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +   crypto_shash_update(desc, label, strlen(label)+1);
> > +   crypto_shash_update(desc, u, SHA256_DIGEST_SIZE);
> > +   crypto_shash_update(desc, v, SHA256_DIGEST_SIZE);
> > +   crypto_shash_update(desc, (u8 *)&bits, sizeof(bits));
> > +   hmac_final(desc, key, keylen, out);
> > +   }
> > +}
> > +
> > +/*
> > + * Somewhat of a bastardization of the real KDFe.  We're assuming
> > + * we're working with known point sizes for the input parameters
> > and
> > + * the hash algorithm is fixed at sha256.  Because we know that
> > the
> > + * point size is 32 bytes like the hash size, there's no need to
> > loop
> > + * in this KDF.
> > + */
> > +static void KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8
> > *pt_v,
> > +u8 *keyout)
> > +{
> > +   SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +   /*
> > +* this should be an iterative counter, but because we
> > know
> > +*  we're only taking 32 bytes for the point using a
> > sha256
> > +*  hash which is also 32 bytes, there's only one loop
> > +*/
> > +   __be32 c = cpu_to_be32(1);
> > +
> > +   desc->tfm = sha256_hash;
> > +   desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> > +
> > +   crypto_shash_init(desc);
> > +   /* counter (BE) */
> > +   crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +   /* secret value */
> > +   crypto_shash_update(desc, z, EC_PT_SZ);
> > +   /* string including trailing zero */
> > +   crypto_shash_update(desc, str, strlen(str)+1);
> > +   crypto_shash_update(desc, pt_u, EC_PT_SZ);
> > +   crypto_shash_update(desc, pt_v, EC_PT_SZ);
> > +   crypto_shash_final(desc, keyout);
> > +}
> > +
> > +static void tpm_buf_append_salt(struct tpm_buf *buf, struct
> > tpm_chip *chip,
> > +   struct tpm2_auth *auth)
> 
> Given the complexity of this function and some not that obvious
> choices in the implementation (coordinates), it would make sense to
> document this function.

I'll try to beef up the salting description

> > +{
> > +   struct crypto_kpp *kpp;
> > +   struct kpp_request *req;
> > +   struct scatterlist s[2], d[1];
> > +   struct ecdh p = {0};
> > +   u8 encoded_key[EC_PT_SZ], *x, *y;
> 
> Why do you use the one-character variable name 'p' and the longer name
> 'encoded_key'?
> 
> > +   unsigned int buf_len;
> > +   u8 *secret;
> > +
> > +   secret = kmalloc(EC_PT_SZ, GFP_KERNEL);
> > +   if (!secret)
> > +   return;
> > +
> > +   p.curve_id = ECC_CURVE_NIST_P256;
> 
> Could this be set already in the initialization?

I'm never sure about designated initializers, but I think, after
looking them up again, it will zero fill unmentioned elements.
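
That is indeed how C designated initializers behave; a standalone
illustration (the struct below is a made-up stand-in, not the kernel's
struct ecdh):

#include <stdio.h>

struct ecdh_example {        /* made-up stand-in, not the kernel's struct ecdh */
	int curve_id;
	int key_size;
	const char *key;
};

int main(void)
{
	/* only curve_id is named; key_size and key are implicitly zeroed */
	struct ecdh_example p = { .curve_id = 1 };

	printf("curve_id=%d key_size=%d key=%p\n",
	       p.curve_id, p.key_size, (void *)p.key);
	return 0;
}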

> > +
> > +   /* secret is two sized points */
> > +   tpm_buf_append_u16(buf, (EC_PT_SZ + 2)*2);
> 
> White space missing. Should be "(EC_PT_SZ + 2) * 2". The comment is a
> bit obscure (though I do not have any specific suggestion for how to make
> it less obscure).
> 
> > +   /*
> > +* we cheat here and append uninitialized data to form
> > +* the points.  All we care about is getting the two
> > +* co-ordinate pointers, which will be used to overwrite
> > +* the uninitialized data
> > +*/
> 
> 

Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling

2018-10-24 Thread Jarkko Sakkinen

On Tue, 23 Oct 2018, Ard Biesheuvel wrote:

On 23 October 2018 at 04:01, James Bottomley
 wrote:

On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote:
[...]

+static void hmac_init(struct shash_desc *desc, u8 *key, int
keylen)
+{
+   u8 pad[SHA256_BLOCK_SIZE];
+   int i;
+
+   desc->tfm = sha256_hash;
+   desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;


I don't think this actually does anything in the shash API
implementation, so you can drop this.


OK, I find crypto somewhat hard to follow.  There were bits I had to
understand, like when I wrote the CFB implementation or when I fixed
the ECDH scatterlist handling, but I've got to confess, in time
honoured tradition I simply copied this from EVM crypto without
actually digging into the code to understand why.



Yeah, it is notoriously hard to use, and we should try to improve that.


James,

I would hope (as I already said in my review) for longer-than-one-character
variable names for most of this stuff. I did not quite understand why you
decided to use 'counter' for the obvious counter variable and one-character
names for the non-obvious stuff :-)

I'm not sure where the 'encoded' in the variable name 'encoded_key'
comes from, especially in the context of these cryptic names.

/Jarkko


Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions

2018-10-24 Thread James Bottomley
On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote:
> I would consider sending first a patch set that would iterate the
> existing session stuff to be ready for this i.e. merge in two
> iterations (emphasis on the word "consider"). We can probably merge
> the groundwork quite fast.

I realise we're going to have merge conflicts on the later ones, so why
don't we do this: I'll still send as one series, but you apply the ones
you think are precursors and I'll rebase and resend the rest?

James



Re: [PATCH v4 5/7] trusted keys: Add session encryption protection to the seal/unseal path

2018-10-23 Thread Jarkko Sakkinen

The tag in the short description does not look at all. Should be either
"tpm:" or "keys, trusted:".

On Mon, 22 Oct 2018, James Bottomley wrote:

If some entity is snooping the TPM bus, they can see the data going in
to be sealed and the data coming out as it is unsealed.  Add parameter
and response encryption to these cases to ensure that no secrets are
leaked even if the bus is snooped.

As part of doing this conversion it was discovered that policy
sessions can't work with HMAC protected authority because of missing
pieces (the TPM nonce).  I've added code to work the same way as
before, which will result in potential authority exposure (while still
adding security for the command and the returned blob), and a fixme to
redo the API to get rid of this security hole.

Signed-off-by: James Bottomley 
---
drivers/char/tpm/tpm2-cmd.c | 155 
1 file changed, 98 insertions(+), 57 deletions(-)

diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 22f1c7bee173..a8655cd535d1 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -425,7 +425,9 @@ int tpm2_seal_trusted(struct tpm_chip *chip,
{
unsigned int blob_len;
struct tpm_buf buf;
+   struct tpm_buf t2b;
u32 hash;
+   struct tpm2_auth *auth;
int i;
int rc;

@@ -439,45 +441,56 @@ int tpm2_seal_trusted(struct tpm_chip *chip,
if (i == ARRAY_SIZE(tpm2_hash_map))
return -EINVAL;

-   rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_CREATE);
+   rc = tpm2_start_auth_session(chip, &auth);
if (rc)
return rc;

-   tpm_buf_append_u32(&buf, options->keyhandle);
-   tpm2_buf_append_auth(&buf, TPM2_RS_PW,
-NULL /* nonce */, 0,
-0 /* session_attributes */,
-options->keyauth /* hmac */,
-TPM_DIGEST_SIZE);
+   rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_CREATE);
+   if (rc) {
+   tpm2_end_auth_session(auth);
+   return rc;
+   }
+
+   rc = tpm_buf_init_2b(&t2b);
+   if (rc) {
+   tpm_buf_destroy(&buf);
+   tpm2_end_auth_session(auth);
+   return rc;
+   }

+   tpm_buf_append_name(&buf, auth, options->keyhandle, NULL);
+   tpm_buf_append_hmac_session(&buf, auth, TPM2_SA_DECRYPT,
+   options->keyauth, TPM_DIGEST_SIZE);
/* sensitive */
-   tpm_buf_append_u16(&buf, 4 + TPM_DIGEST_SIZE + payload->key_len + 1);
+   tpm_buf_append_u16(&t2b, TPM_DIGEST_SIZE);
+   tpm_buf_append(&t2b, options->blobauth, TPM_DIGEST_SIZE);
+   tpm_buf_append_u16(&t2b, payload->key_len + 1);
+   tpm_buf_append(&t2b, payload->key, payload->key_len);
+   tpm_buf_append_u8(&t2b, payload->migratable);

-   tpm_buf_append_u16(&buf, TPM_DIGEST_SIZE);
-   tpm_buf_append(&buf, options->blobauth, TPM_DIGEST_SIZE);
-   tpm_buf_append_u16(&buf, payload->key_len + 1);
-   tpm_buf_append(&buf, payload->key, payload->key_len);
-   tpm_buf_append_u8(&buf, payload->migratable);
+   tpm_buf_append_2b(&buf, &t2b);

/* public */
-   tpm_buf_append_u16(&buf, 14 + options->policydigest_len);
-   tpm_buf_append_u16(&buf, TPM2_ALG_KEYEDHASH);
-   tpm_buf_append_u16(&buf, hash);
+   tpm_buf_append_u16(&t2b, TPM2_ALG_KEYEDHASH);
+   tpm_buf_append_u16(&t2b, hash);

/* policy */
if (options->policydigest_len) {
-   tpm_buf_append_u32(&buf, 0);
-   tpm_buf_append_u16(&buf, options->policydigest_len);
-   tpm_buf_append(&buf, options->policydigest,
+   tpm_buf_append_u32(&t2b, 0);
+   tpm_buf_append_u16(&t2b, options->policydigest_len);
+   tpm_buf_append(&t2b, options->policydigest,
   options->policydigest_len);
} else {
-   tpm_buf_append_u32(&buf, TPM2_OA_USER_WITH_AUTH);
-   tpm_buf_append_u16(&buf, 0);
+   tpm_buf_append_u32(&t2b, TPM2_OA_USER_WITH_AUTH);
+   tpm_buf_append_u16(&t2b, 0);
}

/* public parameters */
-   tpm_buf_append_u16(&buf, TPM2_ALG_NULL);
-   tpm_buf_append_u16(&buf, 0);
+   tpm_buf_append_u16(&t2b, TPM2_ALG_NULL);
+   /* unique (zero) */
+   tpm_buf_append_u16(&t2b, 0);
+
+   tpm_buf_append_2b(&buf, &t2b);

/* outside info */
tpm_buf_append_u16(&buf, 0);
@@ -490,8 +503,11 @@ int tpm2_seal_trusted(struct tpm_chip *chip,
goto out;
}

-   rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 4, 0,
- "sealing data");
+   tpm_buf_fill_hmac_session(&buf, auth);
+
+   rc = tpm_transmit_cmd(chip, &chip->kernel_space, buf.data,
+ PAGE_SIZE, 4, 0, "sealing data");
+   rc = tpm_buf_check_hmac_response(&buf, auth, rc);
if (rc)
goto out;

@@ -509,6 +525,7 @@ int tpm2_seal_trusted(struct tpm_chip *chip,
payload->blob_len = blob_len;
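
Stripped of the trusted-key specifics, the diff above follows the session
call pattern introduced by this series. A hedged sketch of that pattern
(function names come from the patch series, not mainline; the command body
is elided down to a single illustrative parameter, and the example function
is hypothetical):

static int example_sealed_command(struct tpm_chip *chip, u32 keyhandle,
				  const u8 *authvalue)
{
	struct tpm2_auth *auth;
	struct tpm_buf buf;
	int rc;

	rc = tpm2_start_auth_session(chip, &auth);	/* set up the HMAC session */
	if (rc)
		return rc;

	rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_CREATE);
	if (rc) {
		tpm2_end_auth_session(auth);
		return rc;
	}

	/* handles go in by name so the session HMAC covers them */
	tpm_buf_append_name(&buf, auth, keyhandle, NULL);
	/* TPM2_SA_DECRYPT requests encryption of the first command parameter */
	tpm_buf_append_hmac_session(&buf, auth, TPM2_SA_DECRYPT,
				    authvalue, TPM_DIGEST_SIZE);

	/* ... command parameters would be marshalled here ... */

	/* compute the command HMAC immediately before transmission */
	tpm_buf_fill_hmac_session(&buf, auth);
	rc = tpm_transmit_cmd(chip, &chip->kernel_space, buf.data,
			      PAGE_SIZE, 4, 0, "example command");
	/* verify the response HMAC and decrypt any encrypted parameter */
	rc = tpm_buf_check_hmac_response(&buf, auth, rc);

	tpm_buf_destroy(&buf);
	return rc;
}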


Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions

2018-10-23 Thread Jarkko Sakkinen

I would consider sending first a patch set that would iterate the existing
session stuff to be ready for this i.e. merge in two iterations
(emphasis on the word "consider"). We can probably merge the groundwork
quite fast.

/Jarkko

On Mon, 22 Oct 2018, James Bottomley wrote:

By now, everybody knows we have a problem with the TPM2_RS_PW easy
button on TPM2 in that transactions on the TPM bus can be intercepted
and altered.  The way to fix this is to use real sessions for HMAC
capabilities to ensure integrity and to use parameter and response
encryption to ensure confidentiality of the data flowing over the TPM
bus.

This patch series is about adding a simple API which can ensure the
above properties as a layered addition to the existing TPM handling
code.  This series now includes protections for PCR extend, getting
random numbers from the TPM and data sealing and unsealing.  It
therefore eliminates all uses of TPM2_RS_PW in the kernel and adds
encryption protection to sensitive data flowing into and out of the
TPM.

In the third version I added data sealing and unsealing protection,
apart from one API-based problem: the way trusted keys were protected
means it's not currently possible to HMAC protect an authority that
comes with a policy, so the API will have to be extended to fix that
case.

In this fourth version, I tidy up some of the code and add more
security features, the most notable is that we now calculate the NULL
seed name and compare our calculation to the value returned in
TPM2_ReadPublic, which means we now can't be spoofed.  This version
also gives a sysfs variable for the null seed which userspace can use
to run a key certification operation to prove that the TPM was always
secure when communicating with the kernel.

I've verified this using the test suite in the last patch on a VM
connected to a tpm2 emulator.  I also instrumented the emulator to make
sure the sensitive data was properly encrypted.

James

---


James Bottomley (7):
 tpm-buf: create new functions for handling TPM buffers
 tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
 tpm2: add hmac checks to tpm2_pcr_extend()
 tpm2: add session encryption protection to tpm2_get_random()
 trusted keys: Add session encryption protection to the seal/unseal
   path
 tpm: add the null key name as a tpm2 sysfs variable
 tpm2-sessions: NOT FOR COMMITTING add sessions testing

drivers/char/tpm/Kconfig  |3 +
drivers/char/tpm/Makefile |3 +-
drivers/char/tpm/tpm-buf.c|  191 ++
drivers/char/tpm/tpm-chip.c   |1 +
drivers/char/tpm/tpm-sysfs.c  |   27 +-
drivers/char/tpm/tpm.h|  129 ++--
drivers/char/tpm/tpm2-cmd.c   |  248 ---
drivers/char/tpm/tpm2-sessions-test.c |  360 ++
drivers/char/tpm/tpm2-sessions.c  | 1188 +
drivers/char/tpm/tpm2-sessions.h  |   57 ++
10 files changed, 2027 insertions(+), 180 deletions(-)
create mode 100644 drivers/char/tpm/tpm-buf.c
create mode 100644 drivers/char/tpm/tpm2-sessions-test.c
create mode 100644 drivers/char/tpm/tpm2-sessions.c
create mode 100644 drivers/char/tpm/tpm2-sessions.h

--
2.16.4



