Re: [PATCH 02/13] dmaengine: Introduce dma_request_slave_channel_compat_reason()

2015-11-19 Thread Peter Ujfalusi
On 11/18/2015 05:46 PM, Andy Shevchenko wrote:
> On Wed, Nov 18, 2015 at 4:21 PM, Peter Ujfalusi  wrote:
>> Hi Vinod,
>>
>> bringing this old thread back to life as I just started to work on this.
> 
> What I remember is that we need to convert drivers to use the new API; meanwhile
> it is good to keep the old one to avoid a patch storm which does nothing useful
> (IIRC Russell's opinion).

I tend to agree. But we need to start converting the users at some point
either way.
Another issue is the fact that the current dmaengine API is using all the good
names I can think of ;)

> On the other hand there are a lot of drivers that are used on the set
> of platforms starting from legacy and abandoned ones (like AVR32) to
> relatively new and newest.
> 
> And I'm not a fan of those thousands of API calls either.
> 

-- 
Péter


Re: [PATCH v2 1/5] crypto: Multi-buffer encryption infrastructure support

2015-11-19 Thread Herbert Xu
On Wed, Nov 18, 2015 at 06:39:30PM -0800, Tim Chen wrote:
>
> The __cbc-aes-aesni-mb algorithm is marked as internal algorithm 
> with flag CRYPTO_ALG_INTERNAL, so it should not be picked up by other
> algorithms and should only be invoked from mcryptd.

OK I guess that's fine then.

> Anyway, I've updated the aes_cbc_mb code with the ablkcipher helper.
> So I will be posting the new series with the ablkcipher walk
> after testing is done.

Yes I think what you have is a very special case.  As you said
the ablkcipher interface should be fairly low in overhead so I
think it makes sense to use that instead of blkcipher.

> The __sha1-mb works in tandem with the outer layer of the mcryptd
> async algorithm. It does the completion for the outer
> async algorithm.  So as far as mcryptd is concerned, the
> inner algorithm is synchronous in the sense that it is done
> once it dispatches the job to __sha1-mb and doesn't have to worry about it.
> I don't think mcryptd checks for the return value from __sha1-mb
> so it should be okay to return 0 instead of -EINPROGRESS.
> I'll double check that.

If it can never return EINPROGRESS then we should probably remove
the code in it that says "return -EINPROGRESS".

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] drivers/crypto/qat/qat_common/Makefile: fix typo in clean-files

2015-11-19 Thread Jim Davis
A typo in the Makefile leaves qat_rsaprivkey-asn1.h hanging around.

Signed-off-by: Jim Davis 
---
 drivers/crypto/qat/qat_common/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/qat/qat_common/Makefile 
b/drivers/crypto/qat/qat_common/Makefile
index 9e9e196c6d51..12f40a38687e 100644
--- a/drivers/crypto/qat/qat_common/Makefile
+++ b/drivers/crypto/qat/qat_common/Makefile
@@ -4,7 +4,7 @@ $(obj)/qat_rsaprivkey-asn1.o: $(obj)/qat_rsaprivkey-asn1.c \
  $(obj)/qat_rsaprivkey-asn1.h
 
 clean-files += qat_rsapubkey-asn1.c qat_rsapubkey-asn1.h
-clean-files += qat_rsaprivkey-asn1.c qat_rsapvivkey-asn1.h
+clean-files += qat_rsaprivkey-asn1.c qat_rsaprivkey-asn1.h
 
 obj-$(CONFIG_CRYPTO_DEV_QAT) += intel_qat.o
 intel_qat-objs := adf_cfg.o \
-- 
2.6.2.195.g0c4dd78



[PATCH v3 4/5] crypto: AES CBC by8 encryption

2015-11-19 Thread Tim Chen

This patch introduces the assembly routine to do a by8 AES CBC encryption
in support of the AES CBC multi-buffer implementation.

Encryption of 8 data streams of a given key size is done simultaneously.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S | 774 
 1 file changed, 774 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
new file mode 100644
index 000..eaffc28
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
@@ -0,0 +1,774 @@
+/*
+ * AES CBC by8 multibuffer optimization (x86_64)
+ * This file implements 128/192/256 bit AES CBC encryption
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include 
+
+/* stack size needs to be an odd multiple of 8 for alignment */
+
+#define AES_KEYSIZE_128 16
+#define AES_KEYSIZE_192 24
+#define AES_KEYSIZE_256 32
+
+#define XMM_SAVE_SIZE  16*10
+#define GPR_SAVE_SIZE  8*9
+#define STACK_SIZE (XMM_SAVE_SIZE + GPR_SAVE_SIZE)
+
+#define GPR_SAVE_REG   %rsp
+#define GPR_SAVE_AREA  %rsp + XMM_SAVE_SIZE
+#define LEN_AREA_OFFSET XMM_SAVE_SIZE + 8*8
+#define LEN_AREA_REG   %rsp
+#define LEN_AREA   %rsp + XMM_SAVE_SIZE + 8*8
+
+#define IN_OFFSET  0
+#define OUT_OFFSET 8*8
+#define KEYS_OFFSET 16*8
+#define IV_OFFSET  24*8
+
+
+#define IDX    %rax
+#define TMP    %rbx
+#define ARG    %rdi
+#define LEN    %rsi
+
+#define KEYS0  %r14
+#define KEYS1  %r15
+#define KEYS2  %rbp
+#define KEYS3  %rdx
+#define KEYS4  %rcx
+#define KEYS5  %r8
+#define KEYS6  %r9
+#define KEYS7  %r10
+
+#define IN0    %r11
+#define IN2    %r12
+#define IN4    %r13
+#define IN6    LEN
+
+#define XDATA0 %xmm0
+#define XDATA1 %xmm1
+#define XDATA2 %xmm2
+#define XDATA3 %xmm3
+#define XDATA4 %xmm4
+#define XDATA5 %xmm5
+#define XDATA6 %xmm6
+#define XDATA7 %xmm7
+
+#define XKEY0_3    %xmm8
+#define XKEY1_4    %xmm9
+#define XKEY2_5    %xmm10
+#define XKEY3_6    %xmm11
+#define XKEY4_7    %xmm12
+#define XKEY5_8    %xmm13
+#define XKEY6_9    %xmm14
+#define XTMP   %xmm15
+
+#define MOVDQ movdqu /* assume buffers not aligned */
+#define CONCAT(a, b)   a##b
+#define INPUT_REG_SUFX 1   /* IN */
+#define XDATA_REG_SUFX 2   /* XDAT */
+#define KEY_REG_SUFX   3   /* KEY */
+#define XMM_REG_SUFX   4   /* XMM */
+
+/*
+ * To avoid positional parameter errors while compiling
+ * three registers 

[PATCH v3 5/5] crypto: AES CBC multi-buffer glue code

2015-11-19 Thread Tim Chen

This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several AES CBC jobs
to the multi-buffer algorithm. The glue code interfaces with the
underlying algorithm that handles 8 data streams of AES CBC encryption
in parallel. AES key expansion and CBC decryption requests are performed
in a manner similar to the existing AESNI Intel glue driver.

The outline of the algorithm for AES CBC encryption requests is
sketched below:

Any driver requesting the crypto service will place an async crypto
request on the workqueue.  The multi-buffer crypto daemon will pull an
AES CBC encryption request from the work queue and put each request in an
empty data lane for multi-buffer crypto computation.  When all the empty
lanes are filled, computation will commence on the jobs in parallel and
the job with the shortest remaining buffer will get completed and be
returned. To prevent a prolonged stall when no new jobs arrive, we will
flush the workqueue of jobs after a maximum allowable delay has elapsed.

To accommodate the fragmented nature of scatter-gather, we will keep
submitting the next scatter-buffer fragment for a job for multi-buffer
computation until a job is completed and no more buffer fragments remain.
At that time we will pull a new job to fill the now empty data slot.
We check with the multibuffer scheduler to see if there are other
completed jobs to prevent extraneous delay in returning any completed
jobs.

This multi-buffer algorithm should be used for cases where we get at
least 8 streams of crypto jobs submitted at a reasonably high rate.
For a low crypto job submission rate and a low number of data streams, this
algorithm will not be beneficial. The reason is that at a low rate we do not
fill the data lanes before flushing the jobs, instead of processing them with
all the data lanes full.  We miss the benefit of parallel computation and add
delay to the processing of the crypto jobs at the same time.  Some tuning of
the maximum latency parameter may be needed to get the best performance.
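
In rough pseudo-code, the per-request flow described above is (an
illustrative sketch only; the helper names below are placeholders, not the
actual symbols in aes_cbc_mb.c):

        /* called from the mcryptd daemon for each queued request */
        lane = get_empty_lane(mgr);             /* one of the 8 data lanes */
        submit_sg_fragment(lane, rctx);         /* feed the next buffer fragment */
        done = mgr_try_run(mgr);                /* computes once all lanes are full */
        if (!done && max_delay_elapsed(mgr))
                done = mgr_flush(mgr);          /* prevent a prolonged stall */
        if (done)                               /* shortest buffer completes first */
                complete_request_or_refill_lane(done);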

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/Makefile|   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile |  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c | 827 
 include/crypto/mcryptd.h|   2 +-
 4 files changed, 851 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b9b912a..000db49 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
 obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
 obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
+obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb/
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 
 # These modules require assembler to support AVX.
diff --git a/arch/x86/crypto/aes-cbc-mb/Makefile 
b/arch/x86/crypto/aes-cbc-mb/Makefile
new file mode 100644
index 000..b642bd8
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/Makefile
@@ -0,0 +1,22 @@
+#
+# Arch-specific CryptoAPI modules.
+#
+
+avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+
+# we need decryption and key expansion routine symbols
+# if either AES_NI_INTEL or AES_CBC_MB is a module
+
+ifeq ($(CONFIG_CRYPTO_AES_NI_INTEL),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+ifeq ($(CONFIG_CRYPTO_AES_CBC_MB),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+
+ifeq ($(avx_supported),yes)
+   obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb.o
+   aes-cbc-mb-y := $(dec_support) aes_cbc_mb.o aes_mb_mgr_init.o \
+   mb_mgr_inorder_x8_asm.o mb_mgr_ooo_x8_asm.o \
+   aes_cbc_enc_x8.o
+endif
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
new file mode 100644
index 000..f824e18
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
@@ -0,0 +1,827 @@
+/*
+ * Multi buffer AES CBC algorithm glue code
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * 

[PATCH v3 0/5] crypto: x86 AES-CBC encryption with multibuffer

2015-11-19 Thread Tim Chen

In this patch series, we introduce AES CBC encryption that is parallelized
on x86_64 cpus with XMM registers. The multi-buffer technique encrypts 8
data streams in parallel with SIMD instructions. Decryption is handled
as in the existing AESNI Intel CBC implementation, which can already
parallelize decryption even for a single data stream.

Please see the multi-buffer whitepaper for details of the technique:
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

It is important that any driver uses this algorithm properly, i.e. for
scenarios where we have many data streams that can fill up the data lanes
most of the time.  It shouldn't be used when mostly a single data stream is
expected. Otherwise we may incur extra delays when we have frequent gaps in
the data lanes, causing us to wait until data comes in to fill the data lanes
before initiating encryption.  We may have to wait for flush operations
to commence when no new data comes in after some wait time. However, we
keep this extra delay to a minimum by opportunistically flushing the
unfinished jobs if the crypto daemon is the only active task running on a cpu.

By using this technique, we saw a throughput increase of up to 5.7x under
optimal conditions when we have fully loaded encryption jobs filling up
all the data lanes.

Change Log:
v3
1. Use ablkcipher_walk helpers to walk the scatter gather list
and eliminated the need to modify blkcipher_walk for the multi-buffer cipher

v2
1. Update cpu feature check to make sure SSE is supported
2. Fix up unloading of aes-cbc-mb module to properly free memory


Tim Chen (5):
  crypto: Multi-buffer encryption infrastructure support
  crypto: AES CBC multi-buffer data structures
  crypto: AES CBC multi-buffer scheduler
  crypto: AES CBC by8 encryption
  crypto: AES CBC multi-buffer glue code

 arch/x86/crypto/Makefile   |   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile|  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S| 774 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c| 827 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h|  96 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h| 131 
 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c   | 145 
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S | 270 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S | 222 ++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S | 416 +++
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S | 125 
 crypto/Kconfig |  16 +
 crypto/mcryptd.c   | 256 ++-
 include/crypto/algapi.h|   1 +
 include/crypto/mcryptd.h   |  36 +
 15 files changed, 3337 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S

-- 
1.7.11.7




[PATCH v3 1/5] crypto: Multi-buffer encryption infrastructure support

2015-11-19 Thread Tim Chen

In this patch, the infrastructure needed to support the multi-buffer
encryption implementation is added:

a) Enhance the mcryptd daemon to support blkcipher requests.

b) Update configuration to include multi-buffer encryption build support.

For an introduction to the multi-buffer implementation, please see
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 crypto/Kconfig   |  16 +++
 crypto/mcryptd.c | 256 ++-
 include/crypto/algapi.h  |   1 +
 include/crypto/mcryptd.h |  36 +++
 4 files changed, 308 insertions(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7240821..6b51084 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -888,6 +888,22 @@ config CRYPTO_AES_NI_INTEL
  ECB, CBC, LRW, PCBC, XTS. The 64 bit version has additional
  acceleration for CTR.
 
+config CRYPTO_AES_CBC_MB
+   tristate "AES CBC algorithm (x86_64 Multi-Buffer, Experimental)"
+   depends on X86 && 64BIT
+   select CRYPTO_ABLK_HELPER
+   select CRYPTO_MCRYPTD
+   help
+ AES CBC encryption implemented using multi-buffer technique.
+ This algorithm computes on multiple data lanes concurrently with
+ SIMD instructions for better throughput.  It should only be
+ used when there is significant work to generate many separate
+ crypto requests that keep all the data lanes filled to get
+ the performance benefit.  If the data lanes are unfilled, a
+ flush operation will be initiated after some delay to process
+ the existing crypto jobs, adding some extra latency at low
+ load case.
+
 config CRYPTO_AES_SPARC64
tristate "AES cipher algorithms (SPARC64)"
depends on SPARC64
diff --git a/crypto/mcryptd.c b/crypto/mcryptd.c
index fe5b495a..01f747c 100644
--- a/crypto/mcryptd.c
+++ b/crypto/mcryptd.c
@@ -116,8 +116,28 @@ static int mcryptd_enqueue_request(struct mcryptd_queue 
*queue,
return err;
 }
 
+static int mcryptd_enqueue_blkcipher_request(struct mcryptd_queue *queue,
+ struct crypto_async_request *request,
+ struct mcryptd_blkcipher_request_ctx *rctx)
+{
+   int cpu, err;
+   struct mcryptd_cpu_queue *cpu_queue;
+
+   cpu = get_cpu();
+   cpu_queue = this_cpu_ptr(queue->cpu_queue);
+   rctx->tag.cpu = cpu;
+
+   err = crypto_enqueue_request(&cpu_queue->queue, request);
+   pr_debug("enqueue request: cpu %d cpu_queue %p request %p\n",
+cpu, cpu_queue, request);
+   queue_work_on(cpu, kcrypto_wq, &cpu_queue->work);
+   put_cpu();
+
+   return err;
+}
+
 /*
- * Try to opportunisticlly flush the partially completed jobs if
+ * Try to opportunistically flush the partially completed jobs if
  * crypto daemon is the only task running.
  */
 static void mcryptd_opportunistic_flush(void)
@@ -225,6 +245,130 @@ static inline struct mcryptd_queue 
*mcryptd_get_queue(struct crypto_tfm *tfm)
return ictx->queue;
 }
 
+static int mcryptd_blkcipher_setkey(struct crypto_ablkcipher *parent,
+  const u8 *key, unsigned int keylen)
+{
+   struct mcryptd_blkcipher_ctx *ctx = crypto_ablkcipher_ctx(parent);
+   struct crypto_blkcipher *child = ctx->child;
+   int err;
+
+   crypto_blkcipher_clear_flags(child, CRYPTO_TFM_REQ_MASK);
+   crypto_blkcipher_set_flags(child, crypto_ablkcipher_get_flags(parent) &
+ CRYPTO_TFM_REQ_MASK);
+   err = crypto_blkcipher_setkey(child, key, keylen);
+   crypto_ablkcipher_set_flags(parent, crypto_blkcipher_get_flags(child) &
+   CRYPTO_TFM_RES_MASK);
+   return err;
+}
+
+static void mcryptd_blkcipher_crypt(struct ablkcipher_request *req,
+  struct crypto_blkcipher *child,
+  int err,
+  int (*crypt)(struct blkcipher_desc *desc,
+   struct scatterlist *dst,
+   struct scatterlist *src,
+   unsigned int len))
+{
+   struct mcryptd_blkcipher_request_ctx *rctx;
+   struct blkcipher_desc desc;
+
+   rctx = ablkcipher_request_ctx(req);
+
+   if (unlikely(err == -EINPROGRESS))
+   goto out;
+
+   /* set up the blkcipher request to work on */
+   desc.tfm = child;
+   desc.info = req->info;
+   desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
+   rctx->desc = desc;
+
+   /*
+* pass addr of descriptor stored in the request context
+* so that the callee can get to the request context
+*/
+   err = crypt(&rctx->desc, 

[PATCH v3 2/5] crypto: AES CBC multi-buffer data structures

2015-11-19 Thread Tim Chen

This patch introduces the data structures and prototypes of functions
needed for doing AES CBC encryption using multi-buffer. Included are
the structures of the multi-buffer AES CBC job, job scheduler in C and
data structure defines in x86 assembly code.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h|  96 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h| 131 
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S | 270 +
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S | 125 
 4 files changed, 622 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
new file mode 100644
index 000..5493f83
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
@@ -0,0 +1,96 @@
+/*
+ * Header file for multi buffer AES CBC algorithm manager
+ * that deals with 8 buffers at a time
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef __AES_CBC_MB_CTX_H
+#define __AES_CBC_MB_CTX_H
+
+
+#include 
+
+#include "aes_cbc_mb_mgr.h"
+
+#define CBC_ENCRYPT 0x01
+#define CBC_DECRYPT 0x02
+#define CBC_START  0x04
+#define CBC_DONE   0x08
+
+#define CBC_CTX_STS_IDLE   0x00
+#define CBC_CTX_STS_PROCESSING 0x01
+#define CBC_CTX_STS_LAST   0x02
+#define CBC_CTX_STS_COMPLETE   0x04
+
+enum cbc_ctx_error {
+   CBC_CTX_ERROR_NONE   =  0,
+   CBC_CTX_ERROR_INVALID_FLAGS  = -1,
+   CBC_CTX_ERROR_ALREADY_PROCESSING = -2,
+   CBC_CTX_ERROR_ALREADY_COMPLETED  = -3,
+};
+
+#define cbc_ctx_init(ctx, nbytes, op) \
+   do { \
+   (ctx)->flag = (op) | CBC_START; \
+   (ctx)->nbytes = nbytes; \
+   } while (0)
+
+/* AESNI routines to perform cbc decrypt and key expansion */
+
+asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);
+asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+unsigned int key_len);
+
+#endif /* __AES_CBC_MB_CTX_H */
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
new file mode 100644
index 000..0def82e
--- /dev/null
+++ 

Re: [PATCH 02/13] dmaengine: Introduce dma_request_slave_channel_compat_reason()

2015-11-19 Thread Arnd Bergmann
On Thursday 19 November 2015 12:34:22 Peter Ujfalusi wrote:
> 
> I think we can go with a single API, but I don't really like that:
> dma_request_channel(dev, name, *mask, fn, fn_param);
> 
> This would cover all current uses being legacy, DT/ACPI, compat, etc:
> dma_request_channel(NULL, NULL, &mask, fn, fn_param); /* Legacy slave */
> dma_request_channel(NULL, NULL, &mask, NULL, NULL); /* memcpy. etc */
> dma_request_channel(dev, name, NULL, NULL, NULL); /* DT/ACPI, current slave */
> dma_request_channel(dev, name, &mask, fn, fn_param); /* current compat */
> 
> Note, that we need "const dma_cap_mask_t *mask" to be able to make the mask
> optional.

Right, that would work, but I also don't really like it.

> If we have two main APIs, one to request slave channels and one to get any
> channel with given capability
> dma_request_slave_channel(NULL, NULL, &mask, fn, fn_param); /* Legacy slave */
> dma_request_slave_channel(dev, name, NULL, NULL, NULL); /* DT/ACPI, current
>slave */
> dma_request_slave_channel(dev, name, &mask, fn, fn_param); /* current compat*/
> 
> This way we can omit the mask also in cases when the client only want to get
> DMA_SLAVE, we can just build up the mask within the function. If the mask is
> provided we would copy the bits from the provided mask, so for example if you
> want to have DMA_SLAVE+DMA_CYCLIC, the driver only needs to pass DMA_CYCLIC,
> the DMA_SLAVE is going to be set anyways.

I think it's more logical here to have mask=NULL mean that we want DMA_SLAVE,
but otherwise pass the full mask as DMA_SLAVE|DMA_CYCLIC etc.
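
i.e. inside the helper, something like (sketch only; "caller_mask" is just a
placeholder name):

        dma_cap_mask_t mask;

        if (caller_mask) {
                mask = *caller_mask;            /* full mask, e.g. DMA_SLAVE|DMA_CYCLIC */
        } else {
                dma_cap_zero(mask);
                dma_cap_set(DMA_SLAVE, mask);   /* NULL means a plain slave channel */
        }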

> dma_request_channel(mask); /* memcpy. etc, non slave mostly */
> 
> Not sure how to name this as reusing existing (good, descriptive) function
> names would mean changes all over the kernel to start off this.
> 
> Not used and
> request_dma_channel(); /* as _irq/_mem_region/_resource, etc */
> request_dma();
> dma_channel_request();

dma_request_slavechan();
dma_request_slave();
dma_request_mask();

> All in all, not sure which way would be better...

I think I would prefer the simplest API to have only the dev+name
arguments, as we tend to move that way for all platforms anyway, and it
seems silly to have all drivers pass three NULL arguments all the time.
At the moment, there are 139 references to dma_request_slave_channel_*
in the kernel, and only 46 of them are dma_request_slave_channel_compat.
Out of those 46, a couple can already be converted back to use
dma_request_slave_channel() because the platform now only supports
devicetree based boots and will not go back to platform data.

How about something like

extern struct dma_chan *
__dma_request_chan(struct device *dev, const char *name,
const dma_cap_mask_t *mask, dma_filter_fn fn, void 
*fn_param);

static inline struct dma_chan *
dma_request_slavechan(struct device *dev, const char *name)
{
return __dma_request_chan(dev, name, NULL, NULL, NULL);
}

static inline struct dma_chan *
dma_request_chan(const dma_cap_mask_t *mask)
{
return __dma_request_chan(NULL, NULL, mask, NULL, NULL);
}

That way the vast majority of drivers can use one of the two nice interfaces
and the rest can be converted to use __dma_request_chan().
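
A typical slave driver would then just do something like (sketch, using the
helpers above; note the ERR_PTR-style return instead of NULL):

        struct dma_chan *chan;

        chan = dma_request_slavechan(&pdev->dev, "tx");
        if (IS_ERR(chan))
                return PTR_ERR(chan);   /* e.g. -EPROBE_DEFER propagates to the caller */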

On a related topic, we had in the past considered providing a way for
platform code to register a lookup table of some sort, to associate
a device/name pair with a configuration. That would let us use the
simplified dma_request_slavechan(dev, name) pair everywhere. We could
use the same method that we have for clk_register_clkdevs() or
pinctrl_register_map().

Something like either

static struct dma_chan_map myplatform_dma_map[] = {
{ .devname = "omap-aes0", .slave = "tx", .filter = omap_dma_filter_fn, 
.arg = (void *)65, },
{ .devname = "omap-aes0", .slave = "rx", .filter = omap_dma_filter_fn, 
.arg = (void *)66, },
};

or

static struct dma_chan_map myplatform_dma_map[] = {
{ .devname = "omap-aes0", .slave = "tx", .master = "omap-dma-engine0", 
.req = 65, },
{ .devname = "omap-aes0", .slave = "rx", .master = "omap-dma-engine0", 
.req = 66, },
};

we could even allow a combination of the two, so the simple case just specifies
master and req number, which requires changes to the dmaengine driver, but we
could also do a mass-conversion to the .filter/.arg variant.
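
The map entry itself could then look something like this (sketch only, with
the fields taken from the two examples above):

        struct dma_chan_map {
                const char      *devname;       /* consumer device name */
                const char      *slave;         /* channel name, e.g. "tx"/"rx" */
                const char      *master;        /* dmaengine device name (simple case) */
                unsigned int    req;            /* request line on that master */
                dma_filter_fn   filter;         /* legacy filter variant */
                void            *arg;           /* filter argument */
        };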

Arnd


[PATCH 2/2] crypto: picoxcell: set [src|dst]_nents and nents as signed int

2015-11-19 Thread LABBE Corentin
The unsigned int variables [src|dst]_nents and nents can be assigned a
signed value (-EINVAL) from sg_nents_for_len().
Furthermore they are used only by dma_map_sg and dma_unmap_sg, which expect
a signed int, so they must be declared as int.
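
For illustration, the intended pattern then becomes (sketch):

        int nents = sg_nents_for_len(sg, nbytes);

        if (nents < 0)
                return nents;   /* -EINVAL no longer wraps to a huge unsigned value */

        int mapped_ents = dma_map_sg(dev, sg, nents, dir);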

Fixes: f051f95eb47b ("crypto: picoxcell - check return value of 
sg_nents_for_len")
Reported-by: Dan Carpenter 
Signed-off-by: LABBE Corentin 
---
 drivers/crypto/picoxcell_crypto.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/picoxcell_crypto.c 
b/drivers/crypto/picoxcell_crypto.c
index a9c6367..15b5e39 100644
--- a/drivers/crypto/picoxcell_crypto.c
+++ b/drivers/crypto/picoxcell_crypto.c
@@ -289,10 +289,11 @@ static struct spacc_ddt *spacc_sg_to_ddt(struct 
spacc_engine *engine,
 enum dma_data_direction dir,
 dma_addr_t *ddt_phys)
 {
-   unsigned nents, mapped_ents;
+   unsigned mapped_ents;
struct scatterlist *cur;
struct spacc_ddt *ddt;
int i;
+   int nents;
 
nents = sg_nents_for_len(payload, nbytes);
if (nents < 0) {
@@ -326,7 +327,7 @@ static int spacc_aead_make_ddts(struct aead_request *areq)
struct spacc_engine *engine = req->engine;
struct spacc_ddt *src_ddt, *dst_ddt;
unsigned total;
-   unsigned int src_nents, dst_nents;
+   int src_nents, dst_nents;
struct scatterlist *cur;
int i, dst_ents, src_ents;
 
-- 
2.4.10



[PATCH 1/2] crypto: sahara: set nb_[in|out]_sg as signed int

2015-11-19 Thread LABBE Corentin
The two unsigned int variables nb_in_sg and nb_out_sg can be assigned a
signed value (-EINVAL) from sg_nents_for_len().
Furthermore they are used only by dma_map_sg and dma_unmap_sg, which expect
a signed int, so they must be declared as int.

Fixes: 6c2b74d4774f ("crypto: sahara - check return value of sg_nents_for_len")
Reported-by: Dan Carpenter 
Signed-off-by: LABBE Corentin 
---
 drivers/crypto/sahara.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/sahara.c b/drivers/crypto/sahara.c
index ea9f56a..cc738f3 100644
--- a/drivers/crypto/sahara.c
+++ b/drivers/crypto/sahara.c
@@ -228,9 +228,9 @@ struct sahara_dev {
 
size_t  total;
struct scatterlist  *in_sg;
-   unsigned intnb_in_sg;
+   int nb_in_sg;
struct scatterlist  *out_sg;
-   unsigned intnb_out_sg;
+   int nb_out_sg;
 
u32 error;
 };
-- 
2.4.10



Re: [PATCH 02/13] dmaengine: Introduce dma_request_slave_channel_compat_reason()

2015-11-19 Thread Peter Ujfalusi
On 11/18/2015 05:07 PM, Arnd Bergmann wrote:
> On Wednesday 18 November 2015 16:41:35 Peter Ujfalusi wrote:
>> On 11/18/2015 04:29 PM, Arnd Bergmann wrote:
>>> On Wednesday 18 November 2015 16:21:26 Peter Ujfalusi wrote:
 2. non slave channel requests, where only the functionality matters, like
 memcpy, interleaved, memset, etc.
 We could have a simple:
 dma_request_channel(mask);

 But looking at the drivers using dmaengine legacy dma_request_channel() 
 API:
 Some sets DMA_INTERRUPT or DMA_PRIVATE or DMA_SG along with DMA_SLAVE:
 drivers/misc/carma/carma-fpga.c 
 DMA_INTERRUPT|DMA_SLAVE|DMA_SG
 drivers/misc/carma/carma-fpga-program.c DMA_MEMCPY|DMA_SLAVE|DMA_SG
 drivers/media/platform/soc_camera/mx3_camera.c  DMA_SLAVE|DMA_PRIVATE
 sound/soc/intel/common/sst-firmware.c   DMA_SLAVE|DMA_MEMCPY

 as examples.
 Not sure how valid are these...
> 
> I just had a look myself. carma has been removed fortunately in linux-next,
> so we don't have to worry about that any more.
> 
> I assume that the sst-firmware.c case is a mistake, it should just use a
> plain DMA_SLAVE and not DMA_MEMCPY.
> 
> Aside from these, everyone else uses either DMA_CYCLIC in addition to
> DMA_SLAVE, which seems valid, or they use DMA_PRIVATE, which I think is
> redundant in slave drivers and can be removed.

Yep, CYCLIC. How could I forget that ;)

>>> It's usually not much harder to separate out the legacy case from
>>> the normal dma_request_slave_channel_reason(), so those drivers don't
>>> really need to use the unified compat API.
>>
>> The current dma_request_slave_channel()/_reason() is not the 'legacy' API.
>> Currently there is no way to get the reason why the dma channel request fails
>> when using the _compat() version of the API, which is used by drivers which
>> can be used in DT or in legacy mode as well. Sure, they all could have local
>> if(){}else{} for handling this, but it is not a nice thing.
>>
>> As it was discussed instead of adding the _reason() version for the _compat
>> call, we should simplify the dmaengine API for getting the channel and at the
>> same time we will have ERR_PTR returned instead of NULL.
> 
> What I meant was that we don't need to handle them with the unified
> simple interface. The users of DMA_CYCLIC can just keep using
> an internal helper that only deals with the legacy case, or use
> dma_request_slave() or whatever is the new API for the DT case.

I think we can go with a single API, but I don't really like that:
dma_request_channel(dev, name, *mask, fn, fn_param);

This would cover all current uses being legacy, DT/ACPI, compat, etc:
dma_request_channel(NULL, NULL, &mask, fn, fn_param); /* Legacy slave */
dma_request_channel(NULL, NULL, &mask, NULL, NULL); /* memcpy. etc */
dma_request_channel(dev, name, NULL, NULL, NULL); /* DT/ACPI, current slave */
dma_request_channel(dev, name, &mask, fn, fn_param); /* current compat */

Note, that we need "const dma_cap_mask_t *mask" to be able to make the mask
optional.

If we have two main APIs, one to request slave channels and one to get any
channel with given capability
dma_request_slave_channel(NULL, NULL, &mask, fn, fn_param); /* Legacy slave */
dma_request_slave_channel(dev, name, NULL, NULL, NULL); /* DT/ACPI, current
   slave */
dma_request_slave_channel(dev, name, &mask, fn, fn_param); /* current compat*/

This way we can omit the mask also in cases when the client only want to get
DMA_SLAVE, we can just build up the mask within the function. If the mask is
provided we would copy the bits from the provided mask, so for example if you
want to have DMA_SLAVE+DMA_CYCLIC, the driver only needs to pass DMA_CYCLIC,
the DMA_SLAVE is going to be set anyways.

dma_request_channel(mask); /* memcpy. etc, non slave mostly */

Not sure how to name this as reusing existing (good, descriptive) function
names would mean changes all over the kernel to start off this.

Not used and
request_dma_channel(); /* as _irq/_mem_region/_resource, etc */
request_dma();
dma_channel_request();

All in all, not sure which way would be better...

-- 
Péter


[PATCH v3 1/4] lib/mpi: only require buffers as big as needed for the integer

2015-11-19 Thread Andrew Zaborowski
Since mpi_write_to_sgl and mpi_read_buffer explicitly left-align the
integers being written, it makes no sense to require a buffer big enough for
the number + the leading zero bytes which are not written.  The error
returned also doesn't convey any information.  So instead require only the
size actually needed and return -EOVERFLOW to signal when the buffer is too
short.
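
A caller that guessed too small a buffer can then retry with the exact size,
e.g. (sketch of a hypothetical caller):

        unsigned int nbytes;
        int err;

        err = mpi_read_buffer(a, buf, buf_len, &nbytes, NULL);
        if (err == -EOVERFLOW) {
                /* nbytes now holds the number of bytes actually required */
        }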

Signed-off-by: Andrew Zaborowski 
---
No changes since v1
---
 lib/mpi/mpicoder.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c
index c7e0a70..074d2df 100644
--- a/lib/mpi/mpicoder.c
+++ b/lib/mpi/mpicoder.c
@@ -135,7 +135,9 @@ EXPORT_SYMBOL_GPL(mpi_read_from_buffer);
  * @buf:   bufer to which the output will be written to. Needs to be at
  * leaset mpi_get_size(a) long.
  * @buf_len:   size of the buf.
- * @nbytes:receives the actual length of the data written.
+ * @nbytes:receives the actual length of the data written on success and
+ * the data to-be-written on -EOVERFLOW in case buf_len was too
+ * small.
  * @sign:  if not NULL, it will be set to the sign of a.
  *
  * Return: 0 on success or error code in case of error
@@ -148,7 +150,7 @@ int mpi_read_buffer(MPI a, uint8_t *buf, unsigned buf_len, 
unsigned *nbytes,
unsigned int n = mpi_get_size(a);
int i, lzeros = 0;
 
-   if (buf_len < n || !buf || !nbytes)
+   if (!buf || !nbytes)
return -EINVAL;
 
if (sign)
@@ -163,6 +165,11 @@ int mpi_read_buffer(MPI a, uint8_t *buf, unsigned buf_len, 
unsigned *nbytes,
break;
}
 
+   if (buf_len < n - lzeros) {
+   *nbytes = n - lzeros;
+   return -EOVERFLOW;
+   }
+
p = buf;
*nbytes = n - lzeros;
 
@@ -332,7 +339,8 @@ EXPORT_SYMBOL_GPL(mpi_set_buffer);
  * @nbytes:in/out param - it has the be set to the maximum number of
  * bytes that can be written to sgl. This has to be at least
  * the size of the integer a. On return it receives the actual
- * length of the data written.
+ * length of the data written on success or the data that would
+ * be written if buffer was too small.
  * @sign:  if not NULL, it will be set to the sign of a.
  *
  * Return: 0 on success or error code in case of error
@@ -345,7 +353,7 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, 
unsigned *nbytes,
unsigned int n = mpi_get_size(a);
int i, x, y = 0, lzeros = 0, buf_len;
 
-   if (!nbytes || *nbytes < n)
+   if (!nbytes)
return -EINVAL;
 
if (sign)
@@ -360,6 +368,11 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, 
unsigned *nbytes,
break;
}
 
+   if (*nbytes < n - lzeros) {
+   *nbytes = n - lzeros;
+   return -EOVERFLOW;
+   }
+
*nbytes = n - lzeros;
buf_len = sgl->length;
p2 = sg_virt(sgl);
-- 
2.1.4



[PATCH 4/4] crypto: RSA padding algorithm

2015-11-19 Thread Andrew Zaborowski
This patch adds PKCS#1 v1.5 standard RSA padding as a separate template.
This way an RSA cipher with padding can be obtained by instantiating
"pkcs1pad(rsa)".  The reason for adding this is that RSA is almost
never used without this padding (or OAEP), so it will be needed for
certificate work either in the kernel or in userspace.  I also hear that
it is likely implemented by hardware RSA, in which case hardware
implementations of the whole of pkcs1pad(rsa) can be provided.
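
A kernel user can then simply request the padded transform, e.g.:

        struct crypto_akcipher *tfm;

        tfm = crypto_alloc_akcipher("pkcs1pad(rsa)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);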

Signed-off-by: Andrew Zaborowski 
---
v2: rename rsa-padding.c to rsa-pkcs1pad.c,
use a memset instead of a loop,
add a key size check in pkcs1pad_sign,
add a general comment about pkcs1pad_verify
v3: rewrite the initialisation to avoid an obsolete and less flexible
mechanism, now following the aead template initialisation.
---
 crypto/Makefile   |   1 +
 crypto/rsa-pkcs1pad.c | 604 ++
 crypto/rsa.c  |  16 +-
 include/crypto/internal/rsa.h |   2 +
 4 files changed, 622 insertions(+), 1 deletion(-)
 create mode 100644 crypto/rsa-pkcs1pad.c

diff --git a/crypto/Makefile b/crypto/Makefile
index f7aba92..2acdbbd 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -40,6 +40,7 @@ rsa_generic-y := rsapubkey-asn1.o
 rsa_generic-y += rsaprivkey-asn1.o
 rsa_generic-y += rsa.o
 rsa_generic-y += rsa_helper.o
+rsa_generic-y += rsa-pkcs1pad.o
 obj-$(CONFIG_CRYPTO_RSA) += rsa_generic.o
 
 cryptomgr-y := algboss.o testmgr.o
diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
new file mode 100644
index 000..8ee22a2
--- /dev/null
+++ b/crypto/rsa-pkcs1pad.c
@@ -0,0 +1,604 @@
+/*
+ * RSA padding templates.
+ *
+ * Copyright (c) 2015  Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pkcs1pad_ctx {
+   struct crypto_akcipher *child;
+
+   unsigned int key_size;
+};
+
+struct pkcs1pad_request {
+   struct akcipher_request child_req;
+
+   struct scatterlist in_sg[3], out_sg[2];
+   uint8_t *in_buf, *out_buf;
+};
+
+static int pkcs1pad_set_pub_key(struct crypto_akcipher *tfm, const void *key,
+   unsigned int keylen)
+{
+   struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
+   int err, size;
+
+   err = crypto_akcipher_set_pub_key(ctx->child, key, keylen);
+
+   if (!err) {
+   /* Find out new modulus size from rsa implementation */
+   size = crypto_akcipher_maxsize(ctx->child);
+
+   ctx->key_size = size > 0 ? size : 0;
+   if (size <= 0)
+   err = size;
+   }
+
+   return err;
+}
+
+static int pkcs1pad_set_priv_key(struct crypto_akcipher *tfm, const void *key,
+   unsigned int keylen)
+{
+   struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
+   int err, size;
+
+   err = crypto_akcipher_set_priv_key(ctx->child, key, keylen);
+
+   if (!err) {
+   /* Find out new modulus size from rsa implementation */
+   size = crypto_akcipher_maxsize(ctx->child);
+
+   ctx->key_size = size > 0 ? size : 0;
+   if (size <= 0)
+   err = size;
+   }
+
+   return err;
+}
+
+static int pkcs1pad_get_max_size(struct crypto_akcipher *tfm)
+{
+   struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
+
+   /*
+* The maximum destination buffer size for the encrypt/sign operations
+* will be the same as for RSA, even though it's smaller for
+* decrypt/verify.
+*/
+
+   return ctx->key_size ?: -EINVAL;
+}
+
+static void pkcs1pad_sg_set_buf(struct scatterlist *sg, void *buf, size_t len,
+   struct scatterlist *next)
+{
+   int nsegs = next ? 1 : 0;
+
+   if (offset_in_page(buf) + len <= PAGE_SIZE) {
+   nsegs += 1;
+   sg_init_table(sg, nsegs);
+   sg_set_buf(sg, buf, len);
+   } else {
+   nsegs += 2;
+   sg_init_table(sg, nsegs);
+   sg_set_buf(sg + 0, buf, PAGE_SIZE - offset_in_page(buf));
+   sg_set_buf(sg + 1, buf + PAGE_SIZE - offset_in_page(buf),
+   offset_in_page(buf) + len - PAGE_SIZE);
+   }
+
+   if (next)
+   sg_chain(sg, nsegs, next);
+}
+
+static int pkcs1pad_encrypt_sign_complete(struct akcipher_request *req, int 
err)
+{
+   struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req);
+   struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
+   struct pkcs1pad_request *req_ctx = akcipher_request_ctx(req);
+   uint8_t zeros[ctx->key_size - req_ctx->child_req.dst_len];
+
+   if 

[PATCH v3 2/4] crypto: rsa: only require output buffers as big as needed.

2015-11-19 Thread Andrew Zaborowski
The RSA operations explicitly left-align the integers being written,
skipping any leading zero bytes, but still require the output buffers to
include just enough space for the integer + the leading zero bytes.
Since the size of the integer + the leading zero bytes (i.e. the key modulus
size) can now be obtained more easily through crypto_akcipher_maxsize,
change the operations to only require as big a buffer as actually needed
if the caller has that information.  The semantics for request->dst_len
don't change.
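
A caller that has that information can thus size the destination up front,
e.g. (sketch):

        /* key modulus size in bytes */
        req->dst_len = crypto_akcipher_maxsize(tfm);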

Signed-off-by: Andrew Zaborowski 
---
No changes since v1
---
 crypto/rsa.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/crypto/rsa.c b/crypto/rsa.c
index 1093e04..58aad69 100644
--- a/crypto/rsa.c
+++ b/crypto/rsa.c
@@ -91,12 +91,6 @@ static int rsa_enc(struct akcipher_request *req)
goto err_free_c;
}
 
-   if (req->dst_len < mpi_get_size(pkey->n)) {
-   req->dst_len = mpi_get_size(pkey->n);
-   ret = -EOVERFLOW;
-   goto err_free_c;
-   }
-
ret = -ENOMEM;
m = mpi_read_raw_from_sgl(req->src, req->src_len);
if (!m)
@@ -136,12 +130,6 @@ static int rsa_dec(struct akcipher_request *req)
goto err_free_m;
}
 
-   if (req->dst_len < mpi_get_size(pkey->n)) {
-   req->dst_len = mpi_get_size(pkey->n);
-   ret = -EOVERFLOW;
-   goto err_free_m;
-   }
-
ret = -ENOMEM;
c = mpi_read_raw_from_sgl(req->src, req->src_len);
if (!c)
@@ -180,12 +168,6 @@ static int rsa_sign(struct akcipher_request *req)
goto err_free_s;
}
 
-   if (req->dst_len < mpi_get_size(pkey->n)) {
-   req->dst_len = mpi_get_size(pkey->n);
-   ret = -EOVERFLOW;
-   goto err_free_s;
-   }
-
ret = -ENOMEM;
m = mpi_read_raw_from_sgl(req->src, req->src_len);
if (!m)
@@ -225,12 +207,6 @@ static int rsa_verify(struct akcipher_request *req)
goto err_free_m;
}
 
-   if (req->dst_len < mpi_get_size(pkey->n)) {
-   req->dst_len = mpi_get_size(pkey->n);
-   ret = -EOVERFLOW;
-   goto err_free_m;
-   }
-
ret = -ENOMEM;
s = mpi_read_raw_from_sgl(req->src, req->src_len);
if (!s) {
-- 
2.1.4



RE: [PATCH] crypto: add asynchronous compression support

2015-11-19 Thread Joonsoo Kim
Hello, Herbert.

> -Original Message-
> From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
> Sent: Thursday, November 19, 2015 6:43 PM
> To: Li, Weigang
> Cc: linux-crypto@vger.kernel.org; Struk, Tadeusz; Joonsoo Kim; Sergey
> Senozhatsky
> Subject: Re: [PATCH] crypto: add asynchronous compression support
> 
> On Thu, Nov 19, 2015 at 05:52:41AM +, Li, Weigang wrote:
> >
> > After sync-up with Joonsoo Kim, we think it may be not feasible for a
> > s/w implementation of the sg-list based asynchronous interface, we propose
> > separate interfaces (patches) for acomp & ccomp. The reasons are:
> > 1. to support sg-list in the ccomp (like what shash/ahash did), the
> > partial update is required, some algorithms do not support partial update
> > (i.e., lzo), that means:
> 
> No this is not true.  For the ones that don't support partial
> updates you can always linearise the input and then feed it in
> as one chunk.  Because the overall interface you're proposing
> does not allow partial updates the underlying implementation
> doesn't need to do it either.  Only linearisation is necessary.

Linearization would be enough to use sg-lists but it has a problem.
Linearization needs a sleepable function such as vmap() and it makes
sync (de)compression in atomic context impossible. Currently, zram
does sync (de)compression in atomic context. It uses map_vm_area(), which
isn't a sleepable function, to linearize two separate pages. This is possible
because zram already knows that the maximum number of spread pages is just two
and has allocated a vm area in advance. But if we implement linearization
in a general API, we can't be sure of the request input size, so we need
a sleepable function, vmap().

And, this sleep could degrade performance.

> > 2. the format of output buffer (sg-list) will be different, e.g., the
> > lzo need contain the "length" info for each block in the output sg-list in
> > order to de-compression, while zlib doesn't need, then it is difficult to
> > have a single async sg-list i/f.
> 
> I have no idea what you mean here.  Please explain.
> 
> > 3. to compress a sg-list buffer, the lzo also requires an intermediate
> > buffer to save the output of a block, and copy it back to the sg-list
> > output buffer, it will introduce the complexity and cost, we don't see
> > value for sg-list support in a s/w compression.
> 
> Such an intermediate buffer would only  be needed if the SG list is
> actually non-linear.  So I don't see this as an issue.

The intermediate buffer size could vary greatly, so it would be allocated and
freed whenever requested. This could affect performance.

I think that supporting a unified API has more loss than gain.
I'm not an expert in this area so please let me know what I missed.

Thanks.



Re: [PATCH] crypto: add asynchronous compression support

2015-11-19 Thread Herbert Xu
On Fri, Nov 20, 2015 at 03:04:47PM +0900, Joonsoo Kim wrote:
>
> Linearization would be enough to use sg-list but it has a problem.
> Linearization needs sleepable function such as vmap() and it makes
> sync (de)compression in atomic context impossible. Currently, zram
> did sync (de)compression in atomic context. It uses map_vm_area() which
> isn't sleepable function to linearize two separate pages. This is possible
> because zram already knows that maximum number of spread pages is just two
> and have allocated vm area in advance. But, if we implement linearization
> in general API, we can't be sure of request input size so we need
> sleepable function, vmap().
> 
> And, this sleep could degrade performance.

Obviously you would only perform linearisation where it's needed.
And if you are in an atomic context, then clearly linearisation
can only be attempted using kmalloc/alloc_pages with GFP_ATOMIC.

I don't understand your concern with zram because either zram is
already feeding us linear buffers in which case linearisation is
never needed.  Or it can only be used with algorithms that support
SG lists or partial updates, which we can easily mark using a flag.

> Intermediate buffer size could vary greatly so it would be allocated and
> Freed whenever requested. This could affect performance.

That's for the crypto user to figure out.  Either they should
supply a completely linear buffer if they want to be able to
support every algorithm in an efficient manner, or they will
either have to eat the cost of linearisation or only use algorithms
that can deal with SG lists efficiently.

We have the same problem with network drivers and it's dealt with
in exactly the same way.  An skb can be an SG list and will be
linearised when necessary.

> I think that supporting unified API has more loss than gain.

I disagree.  I have seen no valid reason so far for adding two
compression interfaces.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: add asynchronous compression support

2015-11-19 Thread Li, Weigang

On 11/20/2015 2:19 PM, Herbert Xu wrote:

On Fri, Nov 20, 2015 at 03:04:47PM +0900, Joonsoo Kim wrote:


Linearization would be enough to use sg-list but it has a problem.
Linearization needs sleepable function such as vmap() and it makes
sync (de)compression in atomic context impossible. Currently, zram
did sync (de)compression in atomic context. It uses map_vm_area() which
isn't sleepable function to linearize two separate pages. This is possible
because zram already knows that maximum number of spread pages is just two
and have allocated vm area in advance. But, if we implement linearization
in general API, we can't be sure of request input size so we need
sleepable function, vmap().

And, this sleep could degrade performance.


Obviously you would only perform linearisation where it's needed.
And if you are in an atomic context, then clearly linearisation
can only be attempted using kmalloc/alloc_pages with GFP_ATOMIC.

I don't understand your concern with zram because either zram is
already feeding us linear buffers in which case linearisation is
never needed.  Or it can only be used with algorithms that support
SG lists or partial updates, which we can easily mark using a flag.


Intermediate buffer size could vary greatly so it would be allocated and
Freed whenever requested. This could affect performance.


That's for the crypto user to figure out.  Either they should
supply a completely linear buffer if they want to be able to
support every algorithm in an efficient manner, or they will
either have to eat the cost of linearisation or only use algorithms
that can deal with SG lists efficiently.

We have the same problem with network drivers and it's dealt with
in exactly the same way.  An skb can be an SG list and will be
linearised when necessary.


I think that supporting unified API has more loss than gain.


I disagree.  I have seen no valid reason so far for adding two
compression interfaces.

Cheers,


Thanks Herbert.

If we assume the sg-list can be linearized - no "holes" in the sg-list and
all chunks in the middle of the list are of PAGE_SIZE - it seems OK to
support sg-lists in the s/w implementation: linearize the sg-list and
compress it as one chunk.



[PATCH v2 0/2] fix a possible NULL dereference

2015-11-19 Thread LABBE Corentin
Hello

The main goal of this patch series is to fix a possible NULL dereference.
Even if the probability of this case is very low, fixing it makes
static analyzers happy.
At the same time it permits removing a cast that drops const qualifiers.

Regards

Changes since v1
- Use of_device_get_match_data
- Add the missing patch for constify atmel_nand_caps structures

LABBE Corentin (2):
  mtd: nand: atmel_nand: constify atmel_nand_caps structures
  mtd: nand: atmel_nand: fix a possible NULL dereference

 drivers/mtd/nand/atmel_nand.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

-- 
2.4.10
