[PATCH 1/4] crypto: sha1_generic - add cra_priority

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

sha1-generic had a cra_priority of 0, so it wasn't possible to have a
lower-priority SHA-1 implementation, as is desired for sha1_mb, which
is only useful under certain workloads and is otherwise extremely slow.
Change it to priority 100, the priority used for many of the other
generic algorithms.

Signed-off-by: Eric Biggers 
---
 crypto/sha1_generic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 6877cbb9105f..a3d701632ca2 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -76,6 +76,7 @@ static struct shash_alg alg = {
.base   =   {
.cra_name   =   "sha1",
.cra_driver_name=   "sha1-generic",
+   .cra_priority   =   100,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA1_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
-- 
2.18.0.399.gad0ab374a1-goog
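
(A note on how selection works: cra_priority only matters when an
algorithm is requested by its cra_name, e.g. "sha1"; requesting it by
cra_driver_name always pins one specific implementation.  Below is a
minimal kernel-side sketch of both options, assuming the 4.18-era shash
API, with error handling trimmed:

    #include <crypto/hash.h>
    #include <crypto/sha.h>

    static int sha1_digest_example(const u8 *data, unsigned int len,
                                   u8 out[SHA1_DIGEST_SIZE])
    {
        struct crypto_shash *tfm;
        int err;

        /* "sha1" resolves to the highest-cra_priority registered
         * implementation; "sha1-generic" would pin the generic one
         * regardless of priority. */
        tfm = crypto_alloc_shash("sha1", 0, 0);
        if (IS_ERR(tfm))
            return PTR_ERR(tfm);

        {
            SHASH_DESC_ON_STACK(desc, tfm);

            desc->tfm = tfm;
            desc->flags = 0;    /* shash_desc still had flags then */
            err = crypto_shash_digest(desc, data, len, out);
        }
        crypto_free_shash(tfm);
        return err;
    }

With this series applied, "sha1" typically resolves to sha1-avx2 or
another optimized driver on x86, and to sha1_mb only if its priority is
explicitly raised.)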



[PATCH 2/4] crypto: sha256_generic - add cra_priority

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

sha256-generic and sha224-generic had a cra_priority of 0, so it wasn't
possible to have a lower-priority SHA-256 or SHA-224 implementation, as
is desired for sha256_mb, which is only useful under certain workloads
and is otherwise extremely slow.  Change them to priority 100, the
priority used for many of the other generic algorithms.

Signed-off-by: Eric Biggers 
---
 crypto/sha256_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index 8f9c47e1a96e..dfcb7beb73a7 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -271,6 +271,7 @@ static struct shash_alg sha256_algs[2] = { {
.base   =   {
.cra_name   =   "sha256",
.cra_driver_name=   "sha256-generic",
+   .cra_priority   =   100,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA256_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
@@ -285,6 +286,7 @@ static struct shash_alg sha256_algs[2] = { {
.base   =   {
.cra_name   =   "sha224",
.cra_driver_name=   "sha224-generic",
+   .cra_priority   =   100,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA224_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
-- 
2.18.0.399.gad0ab374a1-goog



[PATCH 3/4] crypto: sha512_generic - add cra_priority

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

sha512-generic and sha384-generic had a cra_priority of 0, so it wasn't
possible to have a lower-priority SHA-512 or SHA-384 implementation, as
is desired for sha512_mb, which is only useful under certain workloads
and is otherwise extremely slow.  Change them to priority 100, the
priority used for many of the other generic algorithms.

Signed-off-by: Eric Biggers 
---
 crypto/sha512_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index eba965d18bfc..c92efac0f060 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -171,6 +171,7 @@ static struct shash_alg sha512_algs[2] = { {
.base   =   {
.cra_name   =   "sha512",
.cra_driver_name =  "sha512-generic",
+   .cra_priority   =   100,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA512_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
@@ -185,6 +186,7 @@ static struct shash_alg sha512_algs[2] = { {
.base   =   {
.cra_name   =   "sha384",
.cra_driver_name =  "sha384-generic",
+   .cra_priority   =   100,
.cra_flags  =   CRYPTO_ALG_TYPE_SHASH,
.cra_blocksize  =   SHA384_BLOCK_SIZE,
.cra_module =   THIS_MODULE,
-- 
2.18.0.399.gad0ab374a1-goog



[PATCH 4/4] crypto: x86/sha-mb - decrease priority of multibuffer algorithms

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

With all the crypto modules enabled on x86, and with a CPU that supports
AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
highest priority.  However, these implementations only perform well when
many hash requests are being submitted concurrently, filling all 8 AVX-2
lanes.  Otherwise, they are incredibly slow, as they waste time waiting
for more requests to arrive before proceeding to execute each request.

For example, here are the speeds I see hashing 4096-byte buffers with a
single thread on a Haswell-based processor:

            generic    avx2       mb (multibuffer)
            ---------  ---------  ----------------
    sha1    602 MB/s   997 MB/s   0.61 MB/s
    sha256  228 MB/s   412 MB/s   0.61 MB/s
    sha512  312 MB/s   559 MB/s   0.61 MB/s

So, the multibuffer implementation is 500 to 1000 times slower than the
other implementations.  Note that with smaller buffers or more update()s
per digest, the difference would be even greater.

I believe the vast majority of people are in the boat where the
multibuffer code is much slower, and only a small minority are doing the
highly parallel, hashing-intensive, latency-flexible workloads (maybe
IPsec on servers?) where the multibuffer code may be beneficial.  Yet,
people often aren't familiar with all the crypto config options and so
the multibuffer code may inadvertently be built into the kernel.

Also, the multibuffer code apparently hasn't been very well tested,
seeing as it was sometimes computing the wrong SHA-256 digest.

So, let's make the multibuffer algorithms low priority.  Users who want
to use them can either request them explicitly by driver name, or use
NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.

Signed-off-by: Eric Biggers 
---
 arch/x86/crypto/sha1-mb/sha1_mb.c | 9 -
 arch/x86/crypto/sha256-mb/sha256_mb.c | 9 -
 arch/x86/crypto/sha512-mb/sha512_mb.c | 9 -
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/sha1-mb/sha1_mb.c b/arch/x86/crypto/sha1-mb/sha1_mb.c
index e17655ffde79..4b2430274935 100644
--- a/arch/x86/crypto/sha1-mb/sha1_mb.c
+++ b/arch/x86/crypto/sha1-mb/sha1_mb.c
@@ -871,7 +871,14 @@ static struct ahash_alg sha1_mb_async_alg = {
.base = {
.cra_name   = "sha1",
.cra_driver_name= "sha1_mb",
-   .cra_priority   = 200,
+   /*
+* Low priority, since with few concurrent hash requests
+* this is extremely slow due to the flush delay.  Users
+* whose workloads would benefit from this can request
+* it explicitly by driver name, or can increase its
+* priority at runtime using NETLINK_CRYPTO.
+*/
+   .cra_priority   = 50,
.cra_flags  = CRYPTO_ALG_TYPE_AHASH | 
CRYPTO_ALG_ASYNC,
.cra_blocksize  = SHA1_BLOCK_SIZE,
.cra_type   = _ahash_type,
diff --git a/arch/x86/crypto/sha256-mb/sha256_mb.c b/arch/x86/crypto/sha256-mb/sha256_mb.c
index 4c46ac1b6653..4c07f6c12c37 100644
--- a/arch/x86/crypto/sha256-mb/sha256_mb.c
+++ b/arch/x86/crypto/sha256-mb/sha256_mb.c
@@ -870,7 +870,14 @@ static struct ahash_alg sha256_mb_async_alg = {
.base = {
.cra_name   = "sha256",
.cra_driver_name= "sha256_mb",
-   .cra_priority   = 200,
+   /*
+* Low priority, since with few concurrent hash requests
+* this is extremely slow due to the flush delay.  Users
+* whose workloads would benefit from this can request
+* it explicitly by driver name, or can increase its
+* priority at runtime using NETLINK_CRYPTO.
+*/
+   .cra_priority   = 50,
.cra_flags  = CRYPTO_ALG_TYPE_AHASH |
CRYPTO_ALG_ASYNC,
.cra_blocksize  = SHA256_BLOCK_SIZE,
diff --git a/arch/x86/crypto/sha512-mb/sha512_mb.c b/arch/x86/crypto/sha512-mb/sha512_mb.c
index 39e2bbdc1836..6a8c31581604 100644
--- a/arch/x86/crypto/sha512-mb/sha512_mb.c
+++ b/arch/x86/crypto/sha512-mb/sha512_mb.c
@@ -904,7 +904,14 @@ static struct ahash_alg sha512_mb_async_alg = {
.base = {
.cra_name   = "sha512",
.cra_driver_name= "sha512_mb",
-   .cra_priority   = 200,
+   /*
+* Low priority, since with few concurrent hash requests
+* this is extremely slow due to the flush delay.  Users
+* whose workloads would benefit from this can request
+* it explicitly by driver name, or can increase its
+* priority at runtime using NETLINK_CRYPTO.
+*/
+   .cra_priority   = 50,
.cra_flags  = CRYPTO_ALG_TYPE_AHASH |
CRYPTO_ALG_ASYNC,
.cra_blocksize  = SHA512_BLOCK_SIZE,
-- 
2.18.0.399.gad0ab374a1-goog

[PATCH 0/4] crypto: decrease priority of multibuffer SHA algorithms

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

I found that not only was sha256_mb sometimes computing the wrong digest
(fixed by a separately sent patch), but under normal workloads it's
hundreds of times slower than sha256-avx2, due to the flush delay.  The
same applies to sha1_mb and sha512_mb.  Yet, currently these can be the
highest priority implementations and therefore used by default.
Therefore, this series decreases their priority so that users have to
more explicitly opt-in to using them.

Note that I don't believe the status quo of just having them behind
kernel config options is sufficient, since people often aren't familiar
with all the crypto options and err on the side of enabling too many.
And it's especially unexpected that enabling an "optimized"
implementation would actually make things 1000 times slower.

Eric Biggers (4):
  crypto: sha1_generic - add cra_priority
  crypto: sha256_generic - add cra_priority
  crypto: sha512_generic - add cra_priority
  crypto: x86/sha-mb - decrease priority of multibuffer algorithms

 arch/x86/crypto/sha1-mb/sha1_mb.c | 9 -
 arch/x86/crypto/sha256-mb/sha256_mb.c | 9 -
 arch/x86/crypto/sha512-mb/sha512_mb.c | 9 -
 crypto/sha1_generic.c | 1 +
 crypto/sha256_generic.c   | 2 ++
 crypto/sha512_generic.c   | 2 ++
 6 files changed, 29 insertions(+), 3 deletions(-)

-- 
2.18.0.399.gad0ab374a1-goog
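
To make the opt-in concrete: the NETLINK_CRYPTO interface mentioned in
patch 4 can raise a multibuffer driver's priority again at runtime.  A
userspace sketch follows; it needs CAP_NET_ADMIN and a kernel with
CONFIG_CRYPTO_USER, the value 300 is arbitrary (just something above
the AVX2 drivers), and error handling is minimal:

    #include <linux/cryptouser.h>
    #include <linux/netlink.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        /* CRYPTO_MSG_UPDATEALG with a CRYPTOCFGA_PRIORITY_VAL
         * attribute updates cra_priority of the algorithm selected
         * by driver name. */
        struct {
            struct nlmsghdr nlh;
            struct crypto_user_alg alg;
            struct nlattr attr;
            __u32 prio;
        } req;
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_CRYPTO);

        if (fd < 0)
            return 1;
        memset(&req, 0, sizeof(req));
        req.nlh.nlmsg_len = sizeof(req);
        req.nlh.nlmsg_type = CRYPTO_MSG_UPDATEALG;
        req.nlh.nlmsg_flags = NLM_F_REQUEST;
        strcpy(req.alg.cru_driver_name, "sha512_mb");
        req.attr.nla_len = NLA_HDRLEN + sizeof(req.prio);
        req.attr.nla_type = CRYPTOCFGA_PRIORITY_VAL;
        req.prio = 300;

        return write(fd, &req, sizeof(req)) == sizeof(req) ? 0 : 1;
    }

Alternatively, such users can simply allocate the algorithm by its
driver name ("sha512_mb") instead of by "sha512".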



[PATCH] crypto: MAINTAINERS - fix file path for SHA multibuffer code

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

"arch/x86/crypto/sha*-mb" needs a trailing slash, since it refers to
directories.  Otherwise get_maintainer.pl doesn't find the entry.

Signed-off-by: Eric Biggers 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6cfd16790add..2a39dcaa79a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7345,7 +7345,7 @@ M:Megha Dey 
 R: Tim Chen 
 L: linux-crypto@vger.kernel.org
 S: Supported
-F: arch/x86/crypto/sha*-mb
+F: arch/x86/crypto/sha*-mb/
 F: crypto/mcryptd.c
 
 INTEL TELEMETRY DRIVER
-- 
2.18.0.399.gad0ab374a1-goog



[PATCH] crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2()

2018-06-29 Thread Eric Biggers
From: Eric Biggers 

There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2()
copies the SHA-256 digest state from sha256_mb_mgr::args::digest to
job_sha256::result_digest: the instruction that should load digest
word 4 into %xmm1 instead reloads word 0 into %xmm0, which (since
vmovd zero-extends its destination) also clobbers words 1-3 and leaves
the low word of %xmm1 uninitialized.  Consequently, the sha256_mb
algorithm sometimes calculates the wrong digest.  Fix it.

Reproducer using AF_ALG:

#include <assert.h>
#include <linux/if_alg.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static const __u8 expected[32] =
"\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02"
"\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7";

int main()
{
int fd;
struct sockaddr_alg addr = {
.salg_type = "hash",
.salg_name = "sha256_mb",
};
__u8 data[4096] = { 0 };
__u8 digest[32];
int ret;
int i;

fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
fork();
fd = accept(fd, 0, 0);
do {
ret = write(fd, data, 4096);
assert(ret == 4096);
ret = read(fd, digest, 32);
assert(ret == 32);
} while (memcmp(digest, expected, 32) == 0);

printf("wrong digest: ");
for (i = 0; i < 32; i++)
printf("%02x", digest[i]);
printf("\n");
}

Output was:

wrong digest: ad7facb2ffef7cb47c7a85dabd8b48892ca7

Fixes: 172b1d6b5a93 ("crypto: sha256-mb - fix ctx pointer and digest copy")
Cc:  # v4.8+
Signed-off-by: Eric Biggers 
---
 arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
index 16c4ccb1f154..d2364c55bbde 100644
--- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
+++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
@@ -265,7 +265,7 @@ ENTRY(sha256_mb_mgr_get_comp_job_avx2)
vpinsrd $1, _args_digest+1*32(state, idx, 4), %xmm0, %xmm0
vpinsrd $2, _args_digest+2*32(state, idx, 4), %xmm0, %xmm0
vpinsrd $3, _args_digest+3*32(state, idx, 4), %xmm0, %xmm0
-   vmovd   _args_digest(state , idx, 4) , %xmm0
+   vmovd   _args_digest+4*32(state, idx, 4), %xmm1
vpinsrd $1, _args_digest+5*32(state, idx, 4), %xmm1, %xmm1
vpinsrd $2, _args_digest+6*32(state, idx, 4), %xmm1, %xmm1
vpinsrd $3, _args_digest+7*32(state, idx, 4), %xmm1, %xmm1
-- 
2.18.0.399.gad0ab374a1-goog
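
For readers not fluent in the assembly: the fixed sequence gathers one
lane's eight digest words out of the transposed args_digest matrix
(one 32-byte row per digest word, one 4-byte column per lane).  A
rough intrinsics rendering, with illustrative names, is:

    #include <immintrin.h>
    #include <stdint.h>

    /* args_digest[word][lane], mirroring _args_digest's layout. */
    static void get_comp_digest(const uint32_t args_digest[8][8],
                                int idx, uint32_t result[8])
    {
        __m128i lo, hi;

        lo = _mm_cvtsi32_si128(args_digest[0][idx]);        /* vmovd */
        lo = _mm_insert_epi32(lo, args_digest[1][idx], 1);  /* vpinsrd */
        lo = _mm_insert_epi32(lo, args_digest[2][idx], 2);
        lo = _mm_insert_epi32(lo, args_digest[3][idx], 3);
        hi = _mm_cvtsi32_si128(args_digest[4][idx]);        /* the fixed line */
        hi = _mm_insert_epi32(hi, args_digest[5][idx], 1);
        hi = _mm_insert_epi32(hi, args_digest[6][idx], 2);
        hi = _mm_insert_epi32(hi, args_digest[7][idx], 3);
        _mm_storeu_si128((__m128i *)&result[0], lo);
        _mm_storeu_si128((__m128i *)&result[4], hi);
    }

Since vmovd (_mm_cvtsi32_si128) zero-extends into its destination, the
buggy version not only left %xmm1's low word stale but also wiped the
words just inserted into %xmm0.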



Re: [PATCH 3/5] crypto: testmgr - Improve compression/decompression test

2018-06-29 Thread Jan Glauber
Hi Eric,

sorry for my late response.

On Fri, Jun 22, 2018 at 08:12:26PM -0700, Eric Biggers wrote:
> Hi Jan,
> 
> On Fri, Jun 22, 2018 at 04:37:20PM +0200, Jan Glauber wrote:
> > While commit 336073840a87 ("crypto: testmgr - Allow different compression results")
> > allowed to test non-generic compression algorithms there are some corner
> > cases that would not be detected in test_comp().
> >
> > For example if input -> compression -> decompression would all yield
> > the same bytes the test would still pass.
> >
> > Improve the compression test by using the generic variant (if available)
> > to decompress the compressed test vector from the non-generic
> > algorithm.
> >
> > Suggested-by: Herbert Xu 
> > Signed-off-by: Jan Glauber 
> > ---
> >  crypto/testmgr.c | 23 ++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/crypto/testmgr.c b/crypto/testmgr.c
> > index d1d99843cce4..cfb5fe4c5ccf 100644
> > --- a/crypto/testmgr.c
> > +++ b/crypto/testmgr.c
> > @@ -1346,6 +1346,7 @@ static int test_comp(struct crypto_comp *tfm,
> >int ctcount, int dtcount)
> >  {
> 
> Any particular reason for not updating test_acomp() too?

No, the same logic could be applied there too, I'll try that.

> >   const char *algo = crypto_tfm_alg_driver_name(crypto_comp_tfm(tfm));
> > + const char *name = crypto_tfm_alg_name(crypto_comp_tfm(tfm));
> >   char *output, *decomp_output;
> >   unsigned int i;
> >   int ret;
> > @@ -1363,6 +1364,8 @@ static int test_comp(struct crypto_comp *tfm,
> >   for (i = 0; i < ctcount; i++) {
> >   int ilen;
> >   unsigned int dlen = COMP_BUF_SIZE;
> > + struct crypto_comp *tfm_decomp = NULL;
> > + char *gname;
> >
> >   memset(output, 0, sizeof(COMP_BUF_SIZE));
> >   memset(decomp_output, 0, sizeof(COMP_BUF_SIZE));
> > @@ -1377,9 +1380,27 @@ static int test_comp(struct crypto_comp *tfm,
> >   goto out;
> >   }
> >
> > + /*
> > +  * If compression of a non-generic algorithm was tested try to
> > +  * decompress using the generic variant.
> > +  */
> > + if (!strstr(algo, "generic")) {
> 
> That's a pretty sloppy string comparison.  It matches "generic" anywhere in 
> the
> string, like "foogenericbar".  It should just be "-generic" at the end, right?

I think the maintainers should scream if someone misuses generic...

> Like:
> 
> static bool is_generic_driver(const char *driver_name)
> {
> size_t len = strlen(driver_name);
> 
> return len >= 8 && !strcmp(&driver_name[len - 8], "-generic");
> }

... but I agree that this is cleaner.

> > + /* Construct name from cra_name + "-generic" */
> > + gname = kmalloc(strlen(name) + 9, GFP_KERNEL);
> > + strncpy(gname, name, strlen(name));
> > + strncpy(gname + strlen(name), "-generic", 9);
> > +
> > + tfm_decomp = crypto_alloc_comp(gname, 0, 0);
> > + kfree(gname);
> 
> If you're going to allocate memory here you need to check for error (note:
> kasprintf() would make building the string a bit cleaner).  But algorithm 
> names
> are limited anyway, so a better way may be:
> 
> char generic_name[CRYPTO_MAX_ALG_NAME];
> 
> if (snprintf(generic_name, sizeof(generic_name), "%s-generic",
>  name) < sizeof(generic_name))
> tfm_decomp = crypto_alloc_comp(generic_name, 0, 0);
> 

OK (for the tests the stack usage can probably be ignored).

> > + }
> > +
> > + /* If there is no generic variant use the same tfm as before. 
> > */
> > + if (!tfm_decomp || IS_ERR(tfm_decomp))
> > + tfm_decomp = tfm;
> > +
> 
> if (!IS_ERR_OR_NULL(tfm_decomp))

OK.

> >   ilen = dlen;
> >   dlen = COMP_BUF_SIZE;
> > - ret = crypto_comp_decompress(tfm, output,
> > + ret = crypto_comp_decompress(tfm_decomp, output,
> >ilen, decomp_output, &dlen);
> 
> Shouldn't you decompress with both tfms, not just the generic one?

Good idea.

> It's also weird that each 'struct comp_testvec' in 'ctemplate[]' has an
> 'output', but it's never used.  The issue seems to be that there are separate
> test vectors for compression and decompression, but you really only need one
> set.  It would have the '.uncompressed' and '.compressed' data.  From that, 
> you
> could compress the '.uncompressed' data with the tfm under test, and 
> decompress
> that result with both the tfm under test and the generic tfm.  Then, you could
> decompress the '.compressed' data with the tfm under test and verify it 
> matches
> the '.uncompressed' data.  (I did something similar for symmetric ciphers in
> commit 92a4c9fef34c.)

I did the patches before your change. I'll see if the same can be done
here.
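
For reference, the combined flow described above might look roughly
like the sketch below (the helper name and buffer handling are
illustrative, not the real testmgr code):

    /* Compress with the tfm under test, then decompress the result
     * with both that tfm and the generic one, checking each output
     * against the original input. */
    static int check_comp_vec(struct crypto_comp *tfm,
                              struct crypto_comp *generic_tfm,
                              const u8 *input, unsigned int inlen)
    {
        u8 work[COMP_BUF_SIZE], out[COMP_BUF_SIZE];
        unsigned int clen = sizeof(work), dlen;
        int err;

        err = crypto_comp_compress(tfm, input, inlen, work, &clen);
        if (err)
            return err;

        /* Round-trip through the implementation under test. */
        dlen = sizeof(out);
        err = crypto_comp_decompress(tfm, work, clen, out, &dlen);
        if (err)
            return err;
        if (dlen != inlen || memcmp(out, input, inlen))
            return -EINVAL;

        /* Cross-check against the generic implementation, if any. */
        if (generic_tfm) {
            dlen = sizeof(out);
            err = crypto_comp_decompress(generic_tfm, work, clen,
                                         out, &dlen);
            if (err)
                return err;
            if (dlen != inlen || memcmp(out, input, inlen))
                return -EINVAL;
        }
        return 0;
    }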