Re: [PATCH V7 5/7] crypto: AES CBC multi-buffer glue code

2018-01-09 Thread Megha Dey
On Thu, 2017-08-03 at 13:27 +0800, Herbert Xu wrote:
> On Tue, Jul 25, 2017 at 07:09:58PM -0700, Megha Dey wrote:
> >
> > +/* notify the caller of progress ; request still stays in queue */
> > +
> > +static void notify_callback(struct mcryptd_skcipher_request_ctx *rctx,
> > +   struct mcryptd_alg_cstate *cstate,
> > +   int err)
> > +{
> > +   struct skcipher_request *req = cast_mcryptd_ctx_to_req(rctx);
> > +
> > +   local_bh_disable();
> > +   rctx->complete(>base, err);
> > +   local_bh_enable();
> > +}
> 
> Please explain why you have this crazy construct that does async
> operations behind the crypto API's back while pretending to be sync
> by always returning zero?
> 
> Why is this even needed now that you have switched the underlying
> implementation to be async?

Hi Herbert,

In the next version, I have removed this construct. After giving it some
thought, I realise this is actually incorrect. Hopefully now, both the
outer and inner algorithm are async. 
> 
> > +   /* from mcryptd, we need to callback */
> > +   if (irqs_disabled())
> > +   rctx->complete(>base, err);
> > +   else {
> > +   local_bh_disable();
> > +   rctx->complete(>base, err);
> > +   local_bh_enable();
> > +   }
> 
> I complained about this the first time around and yet this crap is
> still there.

Sorry about that, will get rid of the context check in the next version.
> 
> Cheers,




Re: [PATCH V7 5/7] crypto: AES CBC multi-buffer glue code

2017-08-02 Thread Herbert Xu
On Tue, Jul 25, 2017 at 07:09:58PM -0700, Megha Dey wrote:
>
> +/* notify the caller of progress ; request still stays in queue */
> +
> +static void notify_callback(struct mcryptd_skcipher_request_ctx *rctx,
> + struct mcryptd_alg_cstate *cstate,
> + int err)
> +{
> + struct skcipher_request *req = cast_mcryptd_ctx_to_req(rctx);
> +
> + local_bh_disable();
> + rctx->complete(>base, err);
> + local_bh_enable();
> +}

Please explain why you have this crazy construct that does async
operations behind the crypto API's back while pretending to be sync
by always returning zero?

Why is this even needed now that you have switched the underlying
implementation to be async?

> + /* from mcryptd, we need to callback */
> + if (irqs_disabled())
> + rctx->complete(>base, err);
> + else {
> + local_bh_disable();
> + rctx->complete(>base, err);
> + local_bh_enable();
> + }

I complained about this the first time around and yet this crap is
still there.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH V7 5/7] crypto: AES CBC multi-buffer glue code

2017-07-25 Thread Megha Dey
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several AES CBC jobs
to the multi-buffer algorithm. The glue code interfaces with the
underlying algorithm that handles 8 data streams of AES CBC encryption
in parallel. AES key expansion and CBC decryption requests are performed
in a manner similar to the existing AESNI Intel glue driver.

The outline of the algorithm for AES CBC encryption requests is
sketched below:

Any driver requesting the crypto service will place an async crypto
request on the workqueue.  The multi-buffer crypto daemon will pull an
AES CBC encryption request from work queue and put each request in an
empty data lane for multi-buffer crypto computation.  When all the empty
lanes are filled, computation will commence on the jobs in parallel and
the job with the shortest remaining buffer will get completed and be
returned. To prevent prolonged stall, when no new jobs arrive, we will
flush workqueue of jobs after a maximum allowable delay has elapsed.

To accommodate the fragmented nature of scatter-gather, we will keep
submitting the next scatter-buffer fragment for a job for multi-buffer
computation until a job is completed and no more buffer fragments
remain. At that time we will pull a new job to fill the now empty data slot.
We check with the multibuffer scheduler to see if there are other
completed jobs to prevent extraneous delay in returning any completed
jobs.

This multi-buffer algorithm should be used for cases where we get at
least 8 streams of crypto jobs submitted at a reasonably high rate.
For low crypto job submission rate and low number of data streams, this
algorithm will not be beneficial. The reason is at low rate, we do not
fill out the data lanes before flushing the jobs instead of processing
them with all the data lanes full. We will miss the benefit of parallel
computation, and adding delay to the processing of the crypto job at the
same time.  Some tuning of the maximum latency parameter may be needed
to get the best performance.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Megha Dey 
Acked-by: Tim Chen 
---
 arch/x86/crypto/Makefile|   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile |  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c | 720 
 3 files changed, 743 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 34b3fa2..cc556a7 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
 obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
 obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
+obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb/
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 
 # These modules require assembler to support AVX.
diff --git a/arch/x86/crypto/aes-cbc-mb/Makefile 
b/arch/x86/crypto/aes-cbc-mb/Makefile
new file mode 100644
index 000..b642bd8
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/Makefile
@@ -0,0 +1,22 @@
+#
+# Arch-specific CryptoAPI modules.
+#
+
+avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+
+# we need decryption and key expansion routine symbols
+# if either AESNI_NI_INTEL or AES_CBC_MB is a module
+
+ifeq ($(CONFIG_CRYPTO_AES_NI_INTEL),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+ifeq ($(CONFIG_CRYPTO_AES_CBC_MB),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+
+ifeq ($(avx_supported),yes)
+   obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb.o
+   aes-cbc-mb-y := $(dec_support) aes_cbc_mb.o aes_mb_mgr_init.o \
+   mb_mgr_inorder_x8_asm.o mb_mgr_ooo_x8_asm.o \
+   aes_cbc_enc_x8.o
+endif
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
new file mode 100644
index 000..341cbcb
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
@@ -0,0 +1,720 @@
+/*
+ * Multi buffer AES CBC algorithm glue code
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+