Re: [PATCH 0/3] padata cpu awareness fixes

2017-10-02 Thread Mathias Krause
On 12 September 2017 at 11:07, Steffen Klassert
<steffen.klass...@secunet.com> wrote:
> On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
>> Hi Steffen, Herbert,
>>
>> this series solves multiple issues of padata I ran into while trying to
>> make use of pcrypt for IPsec.
>>
>> The first issue is that the reorder timer callback may run on a CPU that
>> isn't part of the parallel set, as it runs on the CPU where the timer
>> interrupt gets handled. As a result, padata_reorder() may run on a CPU
>> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
>> padata_get_next() refers to an uninitialized cpu_index. However, as
>> per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
>> next CPU index we're waiting for is 0 too, the compare will succeed and
>> we'll return with -ENODATA, making padata_reorder() think there's
>> nothing to do, stop any pending timer and return immediately. This is
>> wrong as the pending timer might have been the one to trigger the needed
>> reordering, but, as it was canceled, will never happen -- if no other
>> parallel requests arrive afterwards.
>>
>> That issue is addressed with the first two patches, ensuring we're not
>> using a bogus cpu_index value for the compare and thereby don't wrongly
>> cancel a pending timer. The second patch then ensures the reorder timer
>> callback runs on the correct CPU or, at least, on a CPU from the
>> parallel set to generate forward progress.
>>
>> The third patch addresses a similar issue for the serialization
>> callback. We implicitly require padata_do_serial() to be called on the
>> very same CPU padata_do_parallel() was called on to ensure the correct
>> ordering of requests -- and correct functioning of padata, even. If the
>> algorithm we're parallelizing is asynchronous itself, that invariant
>> might not be true, as, e.g. crypto hardware may execute the callback on
>> the CPU its interrupt gets handled on which isn't necessarily the one
>> the request got issued on.
>>
>> To handle that issue we track the CPU we're supposed to handle the
>> request on and ensure we'll call padata_reorder() on that CPU when
>> padata_do_serial() gets called -- either by already running on the
>> correct CPU or by deferring the call to a worker.
>>
>> Please apply!
>>
>> Mathias Krause (3):
>>   padata: set cpu_index of unused CPUs to -1
>>   padata: ensure the reorder timer callback runs on the correct CPU
>>   padata: ensure padata_do_serial() runs on the correct CPU
>
> Looks good, thanks!
>
> Acked-by: Steffen Klassert <steffen.klass...@secunet.com>

Ping! Herbert, will these patches go through your tree or Steffen's?


[PATCH 3/3] padata: ensure padata_do_serial() runs on the correct CPU

2017-09-08 Thread Mathias Krause
If the algorithm we're parallelizing is asynchronous we might change
CPUs between padata_do_parallel() and padata_do_serial(). However, we
don't expect this to happen as we need to enqueue the padata object into
the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
queue.

Ensure we're not switching CPUs for a given padata object by tracking
the CPU within the padata object. If the serial callback gets called on
the wrong CPU, defer invoking padata_reorder() via a kernel worker on
the CPU we're expected to run on.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 include/linux/padata.h |2 ++
 kernel/padata.c|   20 +++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5c0175bbc179..5d13d25da2c8 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -37,6 +37,7 @@
  * @list: List entry, to attach to the padata lists.
  * @pd: Pointer to the internal control structure.
  * @cb_cpu: Callback cpu for serializatioon.
+ * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
  * @info: Used to pass information from the parallel to the serial function.
  * @parallel: Parallel execution function.
@@ -46,6 +47,7 @@ struct padata_priv {
	struct list_head	list;
	struct parallel_data	*pd;
	int			cb_cpu;
+	int			cpu;
	int			info;
	void			(*parallel)(struct padata_priv *padata);
	void			(*serial)(struct padata_priv *padata);
diff --git a/kernel/padata.c b/kernel/padata.c
index b4066147bce4..f262c9a4e70a 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -131,6 +131,7 @@ int padata_do_parallel(struct padata_instance *pinst,
padata->cb_cpu = cb_cpu;
 
target_cpu = padata_cpu_hash(pd);
+   padata->cpu = target_cpu;
queue = per_cpu_ptr(pd->pqueue, target_cpu);
 
spin_lock(&queue->parallel.lock);
@@ -363,10 +364,21 @@ void padata_do_serial(struct padata_priv *padata)
int cpu;
struct padata_parallel_queue *pqueue;
struct parallel_data *pd;
+   int reorder_via_wq = 0;
 
pd = padata->pd;
 
cpu = get_cpu();
+
+   /* We need to run on the same CPU padata_do_parallel(.., padata, ..)
+* was called on -- or, at least, enqueue the padata object into the
+* correct per-cpu queue.
+*/
+   if (cpu != padata->cpu) {
+   reorder_via_wq = 1;
+   cpu = padata->cpu;
+   }
+
pqueue = per_cpu_ptr(pd->pqueue, cpu);
 
spin_lock(&pqueue->reorder.lock);
@@ -376,7 +388,13 @@ void padata_do_serial(struct padata_priv *padata)
 
put_cpu();
 
-   padata_reorder(pd);
+   /* If we're running on the wrong CPU, call padata_reorder() via a
+* kernel worker.
+*/
+   if (reorder_via_wq)
+   queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
+   else
+   padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
 
-- 
1.7.10.4



[PATCH 2/3] padata: ensure the reorder timer callback runs on the correct CPU

2017-09-08 Thread Mathias Krause
The reorder timer function runs on the CPU where the timer interrupt was
handled which is not necessarily one of the CPUs of the 'pcpu' CPU mask
set.

Ensure the padata_reorder() callback runs on the correct CPU, which is
one in the 'pcpu' CPU mask set and, preferably, the next expected one.
Do so by comparing the current CPU with the expected target CPU. If they
match, call padata_reorder() right away. If they differ, schedule a work
item on the target CPU that does the padata_reorder() call for us.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 include/linux/padata.h |2 ++
 kernel/padata.c|   43 ++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 2f9c1f93b1ce..5c0175bbc179 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -85,6 +85,7 @@ struct padata_serial_queue {
  * @swork: work struct for serialization.
  * @pd: Backpointer to the internal control structure.
  * @work: work struct for parallelization.
+ * @reorder_work: work struct for reordering.
  * @num_obj: Number of objects that are processed by this cpu.
  * @cpu_index: Index of the cpu.
  */
@@ -93,6 +94,7 @@ struct padata_parallel_queue {
	struct padata_list	reorder;
	struct parallel_data	*pd;
	struct work_struct	work;
+	struct work_struct	reorder_work;
	atomic_t		num_obj;
	int			cpu_index;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 1b9b4bac4a9b..b4066147bce4 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -275,11 +275,51 @@ static void padata_reorder(struct parallel_data *pd)
return;
 }
 
+static void invoke_padata_reorder(struct work_struct *work)
+{
+   struct padata_parallel_queue *pqueue;
+   struct parallel_data *pd;
+
+   local_bh_disable();
+   pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
+   pd = pqueue->pd;
+   padata_reorder(pd);
+   local_bh_enable();
+}
+
 static void padata_reorder_timer(unsigned long arg)
 {
struct parallel_data *pd = (struct parallel_data *)arg;
+   unsigned int weight;
+   int target_cpu, cpu;
 
-   padata_reorder(pd);
+   cpu = get_cpu();
+
+   /* We don't lock pd here to not interfere with parallel processing
+* padata_reorder() calls on other CPUs. We just need any CPU out of
+* the cpumask.pcpu set. It would be nice if it's the right one but
+* it doesn't matter if we're off to the next one by using an outdated
+* pd->processed value.
+*/
+   weight = cpumask_weight(pd->cpumask.pcpu);
+   target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
+
+   /* ensure to call the reorder callback on the correct CPU */
+   if (cpu != target_cpu) {
+   struct padata_parallel_queue *pqueue;
+   struct padata_instance *pinst;
+
+   /* The timer function is serialized wrt itself -- no locking
+* needed.
+*/
+   pinst = pd->pinst;
+   pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
+   queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
+   } else {
+   padata_reorder(pd);
+   }
+
+   put_cpu();
 }
 
 static void padata_serial_worker(struct work_struct *serial_work)
@@ -399,6 +439,7 @@ static void padata_init_pqueues(struct parallel_data *pd)
__padata_list_init(&pqueue->reorder);
__padata_list_init(&pqueue->parallel);
INIT_WORK(&pqueue->work, padata_parallel_worker);
+   INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
atomic_set(&pqueue->num_obj, 0);
}
 }
-- 
1.7.10.4



[PATCH 1/3] padata: set cpu_index of unused CPUs to -1

2017-09-08 Thread Mathias Krause
The parallel queue per-cpu data structure gets initialized only for CPUs
in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
may run on a different CPU and might wrongly decide it's the target CPU
for the next reorder item as per-cpu memory gets memset(0) and we might
be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.

Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
compare in padata_get_next() fail in this case by initializing the
cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 kernel/padata.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 868f947166d7..1b9b4bac4a9b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -384,8 +384,14 @@ static void padata_init_pqueues(struct parallel_data *pd)
struct padata_parallel_queue *pqueue;
 
cpu_index = 0;
-   for_each_cpu(cpu, pd->cpumask.pcpu) {
+   for_each_possible_cpu(cpu) {
pqueue = per_cpu_ptr(pd->pqueue, cpu);
+
+   if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
+   pqueue->cpu_index = -1;
+   continue;
+   }
+
pqueue->pd = pd;
pqueue->cpu_index = cpu_index;
cpu_index++;
-- 
1.7.10.4



[PATCH 0/3] padata cpu awareness fixes

2017-09-08 Thread Mathias Krause
Hi Steffen, Herbert,

this series solves multiple issues of padata I ran into while trying to
make use of pcrypt for IPsec.

The first issue is that the reorder timer callback may run on a CPU that
isn't part of the parallel set, as it runs on the CPU where the timer
interrupt gets handled. As a result, padata_reorder() may run on a CPU
with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
padata_get_next() refers to an uninitialized cpu_index. However, as
per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
next CPU index we're waiting for is 0 too, the compare will succeed and
we'll return with -ENODATA, making padata_reorder() think there's
nothing to do, stop any pending timer and return immediately. This is
wrong as the pending timer might have been the one to trigger the needed
reordering, but, as it was canceled, will never happen -- if no other
parallel requests arrive afterwards.
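
A minimal stand-alone model of that bogus compare may help to illustrate
it (plain C, not the kernel code; CPU numbers and cpu_index values are
made up):

  /* Unused per-cpu parallel queues keep their zeroed cpu_index, so a CPU
   * outside the 'pcpu' set compares equal to a next cpu_index of 0 and
   * padata_reorder() wrongly sees -ENODATA. */
  #include <stdio.h>

  #define NR_CPUS 4
  #define ENODATA 61

  struct pqueue { int cpu_index; };

  int main(void)
  {
      /* only CPUs 1 and 2 are in the parallel set; 0 and 3 stay memset(0) */
      struct pqueue pq[NR_CPUS] = { { 0 }, { 0 }, { 1 }, { 0 } };
      int timer_cpu = 3;       /* timer interrupt handled outside the set */
      int next_cpu_index = 0;  /* we're actually waiting for CPU 1 */

      if (pq[timer_cpu].cpu_index == next_cpu_index)
          printf("-ENODATA (%d): reorder thinks nothing is pending\n",
                 -ENODATA);
      return 0;
  }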

That issue is addressed with the first two patches, ensuring we're not
using a bogus cpu_index value for the compare and thereby don't wrongly
cancel a pending timer. The second patch then ensures the reorder timer
callback runs on the correct CPU or, at least, on a CPU from the
parallel set to generate forward progress.

The third patch addresses a similar issue for the serialization
callback. We implicitly require padata_do_serial() to be called on the
very same CPU padata_do_parallel() was called on to ensure the correct
ordering of requests -- and correct functioning of padata, even. If the
algorithm we're parallelizing is asynchronous itself, that invariant
might not be true, as, e.g. crypto hardware may execute the callback on
the CPU its interrupt gets handled on which isn't necessarily the one
the request got issued on.

To handle that issue we track the CPU we're supposed to handle the
request on and ensure we'll call padata_reorder() on that CPU when
padata_do_serial() gets called -- either by already running on the
correct CPU or by deferring the call to a worker.

Please apply!

Mathias Krause (3):
  padata: set cpu_index of unused CPUs to -1
  padata: ensure the reorder timer callback runs on the correct CPU
  padata: ensure padata_do_serial() runs on the correct CPU

 include/linux/padata.h |4 +++
 kernel/padata.c|   71 ++--
 2 files changed, 72 insertions(+), 3 deletions(-)

-- 
1.7.10.4



echainiv not working as supposed to be?

2016-09-06 Thread Mathias Krause
Hi Herbert,

I'm a little bit confused about how echainiv is supposed to work. I think
it's doing a few things wrongly; I just can't decide what exactly is
wrong, as the intended mode of operation is unclear to me. At least,
as-is, the code isn't quite operating as advertised in the comment at
the top of the file.

So, in hope you could enlighten me, here's my understanding of the
current _implementation_ of echainiv (assuming aligned req->iv, for
simplicity):

For each request it XORs the salt into the provided IV (req->iv). For
ESP the provided IV is the sequence number or, at least, parts of it.
The thus modified IV gets written into the *destination* buffer
(req->dst) which might be a different buffer than the source buffer
(not in the ESP case, though, as it passes the same sg for src and
dst). Afterwards, the per-cpu IV gets copied over into req->iv,
effectively overwriting the generated XORed value. Then the inner
crypto request gets initiated which may finish synchronously or
asynchronously. Either way, the per-cpu IV gets updated with the new
value from subreq->iv, which should be equal to req->iv in the normal
case.
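
To make that flow concrete, here's a stand-alone model of the behaviour
described above (plain C, not the kernel code; IV size and values are
made up):

  /* Step 1: the salted IV ends up in the destination buffer.
   * Step 2: the per-cpu IV (initially zero) overwrites req->iv before the
   *         sub-request runs, so dst, req->iv and the original sequence
   *         number all end up differing. */
  #include <stdio.h>
  #include <string.h>

  #define IVSIZE 8

  int main(void)
  {
      unsigned char salt[IVSIZE]      = "SALTSALT";
      unsigned char seqno[IVSIZE]     = { 0, 0, 0, 0, 0, 0, 0, 1 }; /* req->iv */
      unsigned char percpu_iv[IVSIZE] = { 0 };   /* first use: null IV */
      unsigned char dst_iv[IVSIZE], req_iv[IVSIZE];
      int i;

      for (i = 0; i < IVSIZE; i++)        /* 1) XOR in the salt, write to dst */
          dst_iv[i] = seqno[i] ^ salt[i];

      memcpy(req_iv, percpu_iv, IVSIZE);  /* 2) per-cpu IV overwrites req->iv */

      printf("dst IV starts with %02x, sub-request IV starts with %02x\n",
             dst_iv[0], req_iv[0]);
      return 0;
  }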

Things that are unclear to me:

1/ Why is the XORed IV written to the destination buffer and not the
source buffer? Isn't the sub-request supposed to use the IV from
either req->src or req->iv -- and especially *not* from req->dst?

2/ Why does the XORed IV get overwritten with the per-cpu IV prior to
passing it on to the sub-request, making the possible IV locations in
req->src, req->dst and req->iv *all* differ?

Moreover, the behaviour of 2/ actually leads to the fact that the
very first IV on each CPU will be a null IV -- not a salted sequence
number. All following IVs will be the result of the (or better "some")
sub-request's IV update, stored into the per-cpu variable -- still no
salted sequence numbers, though.

That behaviour is a bug, IMHO, as this clearly differs from what is
described in the comment. However, I'm uncertain on how to fix it, as
the intended mode of operation is unclear to me... Should only the
first IV of each crypto transform be the salted sequence number and
all following the result of the sub-request's IV update, therefore not
stored in a single per-cpu variable but some echainiv context specific
one?

3/ Why is echainiv using per-cpu IVs in the first place? Shouldn't
those be crypto context specific? Currently we're happily mixing IVs
from different transforms (with possibly different IV sizes!) with
each other via the per-cpu variables. That might be an "issue" if one
echainiv user can maliciously mess with the IVs of another user, e.g.
via AF_ALG.

So, should echainiv use a per-context per-cpu array instead that --
for simplicity -- gets initialised with random bytes and will be
updated as it's now, just not with a single global per-cpu array, but
a per-context one?

That would allow us to still have a lock-less IV generator, but one
that cannot be accidentally or maliciously tampered with by other
echainiv users.
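
Something along these lines is what I have in mind -- just a rough,
untested sketch of the allocation/seeding side, with made-up names
(echainiv_pcpu_ctx, echainiv_pcpu_init), not an actual patch:

  #include <linux/errno.h>
  #include <linux/percpu.h>
  #include <linux/random.h>
  #include <linux/types.h>

  struct echainiv_pcpu_ctx {
      u8 __percpu *iv;    /* per-transform, per-cpu IV state */
  };

  static int echainiv_pcpu_init(struct echainiv_pcpu_ctx *ctx,
                                unsigned int ivsize)
  {
      int cpu;

      ctx->iv = __alloc_percpu(ivsize, __alignof__(u32));
      if (!ctx->iv)
          return -ENOMEM;

      /* seed each CPU's slot with random bytes, as suggested above */
      for_each_possible_cpu(cpu)
          get_random_bytes(per_cpu_ptr(ctx->iv, cpu), ivsize);

      return 0;
  }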


Thanks,
Mathias


Re: [PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG

2016-06-22 Thread Mathias Krause
On 22 June 2016 at 21:03, Stephan Mueller <smuel...@chronox.de> wrote:
> Am Mittwoch, 22. Juni 2016, 20:29:37 schrieb Mathias Krause:
>
> Hi Mathias,
>
>> Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
>> accidentally removed the minimum size check for CRYPTO_MSG_GETALG
>> netlink messages. This allows userland to send a truncated
>> CRYPTO_MSG_GETALG message as short as a netlink header only making
>> crypto_report() operate on uninitialized memory by accessing data
>> beyond the end of the netlink message.
>>
>> Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
>> messages to the crypto_msg_min[] array.
>
> I was playing with the adding of the GETALG value as you did to track down the
> issue fixed with eed1e1afd8d542d9644534c1b712599b5d680007 ("crypto: user - no
> parsing of CRYPTO_MSG_GETALG") in the cryptodev-2.6 tree.

Oh, I haven't noticed this commit. :D Just looked at Linus' master and
crypto-2.6/master.

> It did not occur to me that it fixes another bug. Yet, with this addition, it
> would be possible to revert the patch eed1e1afd8d542d9644534c1b712599b5d680007
> as your patch fixes the issue too. But my fix can also stay as it does not
> hurt either.

Well, it does. Commit eed1e1afd8d542d9644534c1b712599b5d680007 is
really just a workaround for the underlying issue. It does not fix the
bug of the missing minimal size check for CRYPTO_MSG_GETALG, it just
disables any further size checks for CRYPTO_MSG_GETALG. Putting my
patch on top of yours will still not fix the issue of the missing
minimal size check as your patch explicitly excludes CRYPTO_MSG_GETALG
from any size checks. So it needs to be reverted, sorry :/

> What is your take on that?

I'd say to revert eed1e1afd8d542d9644534c1b712599b5d680007 and fix the
issue for real with my patch in crypto-2.6/master, i.e. the upcoming
v4.7. That adds back the size check so userland can't play tricks with
us.

The patch already contains the Cc to stable, so the fix will
eventually end up in the LTS kernel releases used by distros; the
regression should then be fixed there in a few weeks.


Regards,
Mathias


[PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG

2016-06-22 Thread Mathias Krause
Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
accidentally removed the minimum size check for CRYPTO_MSG_GETALG
netlink messages. This allows userland to send a truncated
CRYPTO_MSG_GETALG message as short as a netlink header only making
crypto_report() operate on uninitialized memory by accessing data
beyond the end of the netlink message.

Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
messages to the crypto_msg_min[] array.

Fixes: 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
Cc: sta...@vger.kernel.org  # v4.2
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
---
This should go on top of crypto-2.6/master.
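
For reference, a rough userland sketch of the kind of truncated request
described above -- a header-only CRYPTO_MSG_GETALG message without a
crypto_user_alg payload (untested illustration, error handling omitted):

  #include <unistd.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <linux/cryptouser.h>

  int main(void)
  {
      struct sockaddr_nl nl = { .nl_family = AF_NETLINK }; /* pid 0 == kernel */
      struct nlmsghdr nlh = {
          .nlmsg_len   = NLMSG_HDRLEN,        /* header only, no payload */
          .nlmsg_type  = CRYPTO_MSG_GETALG,
          .nlmsg_flags = NLM_F_REQUEST,
      };
      int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_CRYPTO);

      sendto(fd, &nlh, sizeof(nlh), 0, (struct sockaddr *)&nl, sizeof(nl));
      close(fd);
      return 0;
  }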

 crypto/crypto_user.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 43fe85f20d57..7097a3395b25 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -455,6 +455,7 @@ static const int crypto_msg_min[CRYPTO_NR_MSGTYPES] = {
[CRYPTO_MSG_NEWALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_DELALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_UPDATEALG   - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
+   [CRYPTO_MSG_GETALG  - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
[CRYPTO_MSG_DELRNG  - CRYPTO_MSG_BASE] = 0,
 };
 
-- 
1.7.10.4



[PATCH] crypto: user - lock crypto_alg_list on alg dump

2016-02-01 Thread Mathias Krause
We fail to take the crypto_alg_sem semaphore when traversing the
crypto_alg_list for CRYPTO_MSG_GETALG dumps. This allows a race with
crypto_unregister_alg() removing algorithms from the list while we're
still traversing it, thereby leading to a use-after-free as shown below:

[ 3482.071639] general protection fault:  [#1] SMP
[ 3482.075639] Modules linked in: aes_x86_64 glue_helper lrw ablk_helper cryptd 
gf128mul ipv6 pcspkr serio_raw virtio_net microcode virtio_pci virtio_ring 
virtio sr_mod cdrom [last unloaded: aesni_intel]
[ 3482.075639] CPU: 1 PID: 11065 Comm: crconf Not tainted 4.3.4-grsec+ #126
[ 3482.075639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[ 3482.075639] task: 88001cd41a40 ti: 88001cd422c8 task.ti: 
88001cd422c8
[ 3482.075639] RIP: 0010:[]  [] 
strncpy+0x13/0x30
[ 3482.075639] RSP: 0018:88001f713b60  EFLAGS: 00010202
[ 3482.075639] RAX: 88001f6c4430 RBX: 88001f6c43a0 RCX: 88001f6c4430
[ 3482.075639] RDX: 0040 RSI: fefefefefefeff16 RDI: 88001f6c4430
[ 3482.075639] RBP: 88001f713b60 R08: 88001f6c4470 R09: 88001f6c4480
[ 3482.075639] R10: 0002 R11: 0246 R12: 88001ce2aa28
[ 3482.075639] R13: 88093700 R14: 88001f5e4bf8 R15: 3b20
[ 3482.075639] FS:  033826fa2700() GS:88001e90() 
knlGS:
[ 3482.075639] CS:  0010 DS:  ES:  CR0: 80050033
[ 3482.075639] CR2: ff600400 CR3: 139ec000 CR4: 001606f0
[ 3482.075639] Stack:
[ 3482.075639]  88001f713bd8 936ccd00 88001e5c4200 
88093700
[ 3482.075639]  88001f713bd0 938ef4bf  
3b20
[ 3482.075639]  88001f5e4bf8 88001f5e4848  
3b20
[ 3482.075639] Call Trace:
[ 3482.075639]  [] crypto_report_alg+0xc0/0x3e0
[ 3482.075639]  [] ? __alloc_skb+0x16f/0x300
[ 3482.075639]  [] crypto_dump_report+0x6a/0x90
[ 3482.075639]  [] netlink_dump+0x147/0x2e0
[ 3482.075639]  [] __netlink_dump_start+0x159/0x190
[ 3482.075639]  [] crypto_user_rcv_msg+0xc3/0x130
[ 3482.075639]  [] ? crypto_report_alg+0x3e0/0x3e0
[ 3482.075639]  [] ? alg_test_crc32c+0x120/0x120
[ 3482.075639]  [] ? __netlink_lookup+0xd5/0x120
[ 3482.075639]  [] ? crypto_add_alg+0x1d0/0x1d0
[ 3482.075639]  [] netlink_rcv_skb+0xe1/0x130
[ 3482.075639]  [] crypto_netlink_rcv+0x28/0x40
[ 3482.075639]  [] netlink_unicast+0x108/0x180
[ 3482.075639]  [] netlink_sendmsg+0x541/0x770
[ 3482.075639]  [] sock_sendmsg+0x21/0x40
[ 3482.075639]  [] SyS_sendto+0xf3/0x130
[ 3482.075639]  [] ? bad_area_nosemaphore+0x13/0x20
[ 3482.075639]  [] ? __do_page_fault+0x80/0x3a0
[ 3482.075639]  [] entry_SYSCALL_64_fastpath+0x12/0x6e
[ 3482.075639] Code: 88 4a ff 75 ed 5d 48 0f ba 2c 24 3f c3 66 66 2e 0f 1f 84 
00 00 00 00 00 55 48 85 d2 48 89 f8 48 89 f9 4c 8d 04 17 48 89 e5 74 15 <0f> b6 
16 80 fa 01 88 11 48 83 de ff 48 83 c1 01 4c 39 c1 75 eb
[ 3482.075639] RIP  [] strncpy+0x13/0x30

To trigger the race run the following loops simultaneously for a while:
  $ while : ; do modprobe aesni-intel; rmmod aesni-intel; done
  $ while : ; do crconf show all > /dev/null; done

Fix the race by taking the crypto_alg_sem read lock, thereby preventing
crypto_unregister_alg() from modifying the algorithm list during the
dump.

This bug has been detected by the PaX memory sanitize feature.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
Cc: PaX Team <pagee...@freemail.hu>
---
 crypto/crypto_user.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 237f3795cfaa..43fe85f20d57 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -499,6 +499,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
	if (link->dump == NULL)
		return -EINVAL;

+	down_read(&crypto_alg_sem);
	list_for_each_entry(alg, &crypto_alg_list, cra_list)
		dump_alloc += CRYPTO_REPORT_MAXSIZE;

@@ -508,8 +509,11 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
		.done = link->done,
		.min_dump_alloc = dump_alloc,
	};
-	return netlink_dump_start(crypto_nlsk, skb, nlh, &c);
+	err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
	}
+	up_read(&crypto_alg_sem);
+
+	return err;
}
 
err = nlmsg_parse(nlh, crypto_msg_min[type], attrs, CRYPTOCFGA_MAX,
-- 
1.7.10.4



Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi

2015-02-17 Thread Mathias Krause
On 18 February 2015 at 07:34, George Spelvin li...@horizon.com wrote:

 It's trying to load modules named:

 crypto-ccm(aes)
 crypto-ccm(aes)-all
 crypto-ctr(aes)
 crypto-ctr(aes)-all

 depmod -n doesn't show any aliases with parens in their names,

That's okay. It's also fine that it fails to load these, as it'll fall
back to trying to load modules for the templates in that case, as can be
seen in your log:

 Wed Feb 18 06:58:10 UTC 2015 /sbin/modprobe -q -- crypto-ctr

What's curious, however, is that it only tries to load the template for
ctr, not for ccm. :/
Are the logs complete? Could you please simplify your /sbin/x/modprobe
wrapper to just output the modprobe call, as Kees suggested?

Also, could you please provide the output of "depmod -n | grep
crypto-"? There should be lines for crypto-ccm and crypto-ctr if you
build them as modules.


Mathias


Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi

2015-02-16 Thread Mathias Krause
On 17 February 2015 at 04:09, George Spelvin li...@horizon.com wrote:
 I discovered when (belatedly) testing 3.19-rc7 the other week that
 my laptop wifi was broken and would no longer associate.

 Apparently this is causing some necessary crypto algorithms to fail to
 load, breaking my wifi.

 Perhaps I'm displaying my ignorance of what's supposed to happen,
 but shouldn't make install have installed some files with names like
 /lib/modules/`uname -r`/kernel/crypto/crypto-*.ko?

No, the module names do not change. They just got another module alias
with the crypto- prefix.

 Or is it something only I'm hitting because I have a lot of common
 crypto algorithms statically compiled into my kernel?

 CONFIG_CRYPTO_CBC=y
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MD5=y
 CONFIG_CRYPTO_SHA1=y
 CONFIG_CRYPTO_AES=y
 CONFIG_CRYPTO_AES_586=y
 CONFIG_CRYPTO_ARC4=y

 Trying this on kernel 4943ba16 produces instead an endless loop of:

 wlan1: SME: Trying to authenticate with aa:bb:cc:dd:ee:ff (SSID='FOO' 
 freq=24xx MHz)
 wlan1: Trying to associate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
 wlan1: Associated with aa:bb:cc:dd:ee:ff
 wlan1: WPA: Failed to set PTK to the driver (alg=3 keylen=16 
 bssid=aa:bb:cc:dd:ee:ff)
 wlan1: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:ff reason=1


 The kernel logs are not particularly informative.

 They just go through the usual successful series, but end with

 wlan1: RxAssocResp from aa:bb:cc:dd:ee:ff (capab=0x431 status=0 aid=1)
 wlan1: associated
 wlan1: deauthenticating from 11:bb:cc:dd:ee:ff by local choice (Reason: 
 1=UNSPECIFIED)

 While successful connection ends before that last line.

Commit 4943ba16bbc2 was incomplete and could have caused regressions
like the above. Those should have been fixed with commits 4943ba16bbc2
+ 3e14dcf7cb80. However, those should be in v3.19-rc7 already, so I'm
not much of a help here :(


Mathias


[PATCH 6/6] crypto: x86/des3_ede - drop bogus module aliases

2015-01-11 Thread Mathias Krause
This module implements variations of des3_ede only. Drop the bogus
module aliases for des.

Cc: Jussi Kivilinna jussi.kivili...@iki.fi
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/des3_ede_glue.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 38a14f818ef1..d6fc59df 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -504,6 +504,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm, asm optimized");
 MODULE_ALIAS_CRYPTO("des3_ede");
 MODULE_ALIAS_CRYPTO("des3_ede-asm");
-MODULE_ALIAS_CRYPTO("des");
-MODULE_ALIAS_CRYPTO("des-asm");
 MODULE_AUTHOR("Jussi Kivilinna <jussi.kivili...@iki.fi>");
-- 
1.7.10.4



[PATCH 5/6] crypto: sparc64/md5 - fix module description

2015-01-11 Thread Mathias Krause
MD5 is not SHA1.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/sparc/crypto/md5_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/md5_glue.c b/arch/sparc/crypto/md5_glue.c
index 64c7ff5f72a9..b688731d7ede 100644
--- a/arch/sparc/crypto/md5_glue.c
+++ b/arch/sparc/crypto/md5_glue.c
@@ -183,7 +183,7 @@ module_init(md5_sparc64_mod_init);
 module_exit(md5_sparc64_mod_fini);
 
 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("MD5 Secure Hash Algorithm, sparc64 md5 opcode accelerated");
+MODULE_DESCRIPTION("MD5 Message Digest Algorithm, sparc64 md5 opcode accelerated");
 
 MODULE_ALIAS_CRYPTO("md5");
 
-- 
1.7.10.4



[PATCH 3/6] crypto: sparc64/camellia - fix module alias

2015-01-11 Thread Mathias Krause
The module alias should be camellia, not aes.

Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/sparc/crypto/camellia_glue.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/camellia_glue.c b/arch/sparc/crypto/camellia_glue.c
index 641f55cb61c3..6bf2479a12fb 100644
--- a/arch/sparc/crypto/camellia_glue.c
+++ b/arch/sparc/crypto/camellia_glue.c
@@ -322,6 +322,6 @@ module_exit(camellia_sparc64_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Camellia Cipher Algorithm, sparc64 camellia opcode accelerated");
 
-MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("camellia");
 
 #include "crop_devid.c"
-- 
1.7.10.4



[PATCH 1/6] crypto: add missing crypto module aliases

2015-01-11 Thread Mathias Krause
Commit 5d26a105b5a7 ("crypto: prefix module autoloading with crypto-")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with crypto-. This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.

Even though commit 5d26a105b5a7 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.

Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.

Fixes: 5d26a105b5a7 ("crypto: prefix module autoloading with crypto-")
Cc: Kees Cook keesc...@chromium.org
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/powerpc/crypto/sha1.c   |1 +
 arch/x86/crypto/sha-mb/sha1_mb.c |2 +-
 crypto/aes_generic.c |1 +
 crypto/ansi_cprng.c  |1 +
 crypto/blowfish_generic.c|1 +
 crypto/camellia_generic.c|1 +
 crypto/cast5_generic.c   |1 +
 crypto/cast6_generic.c   |1 +
 crypto/crc32c_generic.c  |1 +
 crypto/crct10dif_generic.c   |1 +
 crypto/des_generic.c |7 ---
 crypto/ghash-generic.c   |1 +
 crypto/krng.c|1 +
 crypto/salsa20_generic.c |1 +
 crypto/serpent_generic.c |1 +
 crypto/sha1_generic.c|1 +
 crypto/sha256_generic.c  |2 ++
 crypto/sha512_generic.c  |2 ++
 crypto/tea.c |1 +
 crypto/tgr192.c  |1 +
 crypto/twofish_generic.c |1 +
 crypto/wp512.c   |1 +
 22 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/crypto/sha1.c b/arch/powerpc/crypto/sha1.c
index d3feba5a275f..c154cebc1041 100644
--- a/arch/powerpc/crypto/sha1.c
+++ b/arch/powerpc/crypto/sha1.c
@@ -154,4 +154,5 @@ module_exit(sha1_powerpc_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm");
 
+MODULE_ALIAS_CRYPTO("sha1");
 MODULE_ALIAS_CRYPTO("sha1-powerpc");
diff --git a/arch/x86/crypto/sha-mb/sha1_mb.c b/arch/x86/crypto/sha-mb/sha1_mb.c
index a225a5ca1037..fd9f6b035b16 100644
--- a/arch/x86/crypto/sha-mb/sha1_mb.c
+++ b/arch/x86/crypto/sha-mb/sha1_mb.c
@@ -931,4 +931,4 @@ module_exit(sha1_mb_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA1 Secure Hash Algorithm, multi buffer accelerated");
 
-MODULE_ALIAS("sha1");
+MODULE_ALIAS_CRYPTO("sha1");
diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
index 9b3c54c1cbe8..3dd101144a58 100644
--- a/crypto/aes_generic.c
+++ b/crypto/aes_generic.c
@@ -1475,3 +1475,4 @@ module_exit(aes_fini);
 MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("aes-generic");
diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c
index b4485a108389..6f5bebc9bf01 100644
--- a/crypto/ansi_cprng.c
+++ b/crypto/ansi_cprng.c
@@ -477,3 +477,4 @@ MODULE_PARM_DESC(dbg, "Boolean to enable debugging (0/1 == off/on)");
 module_init(prng_mod_init);
 module_exit(prng_mod_fini);
 MODULE_ALIAS_CRYPTO("stdrng");
+MODULE_ALIAS_CRYPTO("ansi_cprng");
diff --git a/crypto/blowfish_generic.c b/crypto/blowfish_generic.c
index 7bd71f02d0dd..87b392a77a93 100644
--- a/crypto/blowfish_generic.c
+++ b/crypto/blowfish_generic.c
@@ -139,3 +139,4 @@ module_exit(blowfish_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Blowfish Cipher Algorithm");
 MODULE_ALIAS_CRYPTO("blowfish");
+MODULE_ALIAS_CRYPTO("blowfish-generic");
diff --git a/crypto/camellia_generic.c b/crypto/camellia_generic.c
index 1b74c5a3e891..a02286bf319e 100644
--- a/crypto/camellia_generic.c
+++ b/crypto/camellia_generic.c
@@ -1099,3 +1099,4 @@ module_exit(camellia_fini);
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CRYPTO("camellia");
+MODULE_ALIAS_CRYPTO("camellia-generic");
diff --git a/crypto/cast5_generic.c b/crypto/cast5_generic.c
index 84c86db67ec7..df5c72629383 100644
--- a/crypto/cast5_generic.c
+++ b/crypto/cast5_generic.c
@@ -550,3 +550,4 @@ module_exit(cast5_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Cast5 Cipher Algorithm");
 MODULE_ALIAS_CRYPTO("cast5");
+MODULE_ALIAS_CRYPTO("cast5-generic");
diff --git a/crypto/cast6_generic.c b/crypto/cast6_generic.c
index f408f0bd8de2..058c8d755d03 100644
--- a/crypto/cast6_generic.c
+++ b/crypto/cast6_generic.c
@@ -292,3 +292,4 @@ module_exit(cast6_mod_fini);
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Cast6 Cipher Algorithm");
 MODULE_ALIAS_CRYPTO("cast6");
+MODULE_ALIAS_CRYPTO("cast6-generic");
diff --git a/crypto/crc32c_generic.c b/crypto/crc32c_generic.c
index 2a062025749d..06f1b60f02b2 100644
--- a/crypto/crc32c_generic.c
+++ b/crypto/crc32c_generic.c
@@ -171,4 +171,5

[PATCH] crypto: aesni - fix by8 variant for 128 bit keys

2014-12-30 Thread Mathias Krause
The by8 counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.

The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).

Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.

Kudos to James for reporting the issue and providing a test case
showing the discrepancies.

Reported-by: James Yonan ja...@openvpn.net
Cc: Chandramouli Narayanan mo...@linux.intel.com
Cc: sta...@vger.kernel.org # v3.18
Signed-off-by: Mathias Krause mini...@googlemail.com
---

James, this should fix the issue for you, too -- it did for me, using
your test case. Feel free to add your 'Tested-by' if it does.

This patch should go on top of crypto-2.6.git#master as it's a fix
that needs to go to stable as well.

 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   46 +++
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index 2df2a0298f5a..a916c4a61165 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -208,7 +208,7 @@ ddq_add_8:
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 3*16(p_keys), xkeyA
+   vmovdqa 3*16(p_keys), xkey4
.endif
.else
vmovdqa 3*16(p_keys), xkeyA
@@ -224,7 +224,7 @@ ddq_add_8:
add $(16*by), p_in
 
.if (klen == KEY_128)
-   vmovdqa 4*16(p_keys), xkey4
+   vmovdqa 4*16(p_keys), xkeyB
.else
.if (load_keys)
vmovdqa 4*16(p_keys), xkey4
@@ -234,7 +234,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyA, var_xdata, var_xdata /* key 3 */
+   /* key 3 */
+   .if (klen == KEY_128)
+   vaesenc xkey4, var_xdata, var_xdata
+   .else
+   vaesenc xkeyA, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
@@ -243,13 +248,18 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkey4, var_xdata, var_xdata /* key 4 */
+   /* key 4 */
+   .if (klen == KEY_128)
+   vaesenc xkeyB, var_xdata, var_xdata
+   .else
+   vaesenc xkey4, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 6*16(p_keys), xkeyB
+   vmovdqa 6*16(p_keys), xkey8
.endif
.else
vmovdqa 6*16(p_keys), xkeyB
@@ -267,12 +277,17 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyB, var_xdata, var_xdata /* key 6 */
+   /* key 6 */
+   .if (klen == KEY_128)
+   vaesenc xkey8, var_xdata, var_xdata
+   .else
+   vaesenc xkeyB, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
.if (klen == KEY_128)
-   vmovdqa 8*16(p_keys), xkey8
+   vmovdqa 8*16(p_keys), xkeyB
.else
.if (load_keys)
vmovdqa 8*16(p_keys), xkey8
@@ -288,7 +303,7 @@ ddq_add_8:
 
.if (klen == KEY_128)
.if (load_keys)
-   vmovdqa 9*16(p_keys), xkeyA
+   vmovdqa 9*16(p_keys), xkey12
.endif
.else
vmovdqa 9*16(p_keys), xkeyA
@@ -297,7 +312,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkey8, var_xdata, var_xdata /* key 8 */
+   /* key 8 */
+   .if (klen == KEY_128)
+   vaesenc xkeyB, var_xdata, var_xdata
+   .else
+   vaesenc xkey8, var_xdata, var_xdata
+   .endif
.set i, (i +1)
.endr
 
@@ -306,7 +326,12 @@ ddq_add_8:
.set i, 0
.rept by
club XDATA, i
-   vaesenc xkeyA, var_xdata, var_xdata /* key 9 */
+   /* key 9 */
+   .if (klen == KEY_128)
+   vaesenc xkey12, var_xdata, var_xdata
+   .else

Re: [PATCH] crypto: aesni - disable by8 AVX CTR optimization

2014-12-14 Thread Mathias Krause
Hi James,

On 11 December 2014 at 09:52, James Yonan ja...@openvpn.net wrote:
 I'm seeing some anomalous results with the by8 AVX CTR optimization in
 3.18.

the patch you're replying to actually *disabled* the by8 variant for
v3.17 as it had another bug related to wrong counter handling in GCM.
The fix for that particular issue only made it to v3.18, so the code
got re-enabled only for v3.18. But it looks like that there's yet
another bug :/

 In particular, crypto_aead_encrypt appears to produce different ciphertext
 from the same plaintext depending on whether or not the optimization is
 enabled.

 See the attached patch to tcrypt that demonstrates the discrepancy.

I can reproduce your observations, so I can confirm the difference
when using the by8 variant compared to other AES implementations.
When applying this very patch (commit 7da4b29d496b ("crypto: aesni -
disable by8 AVX CTR optimization")) -- the patch that disables the
by8 variant -- on top of v3.18, the discrepancies are gone. So the
behavior is bound to the by8 optimization only.
As it was Chandramouli who contributed the code, maybe he has a clue
what's wrong here. Chandramouli?


Mathias


 James


Re: [PATCH] crypto: include crypto- module prefix in template

2014-11-24 Thread Mathias Krause
On 24 November 2014 at 20:17, Kees Cook keesc...@chromium.org wrote:
 This adds the module loading prefix crypto- to the template lookup
 as well.

 For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
 includes the crypto- prefix at every level, correctly rejecting vfat:

 net-pf-38
 algif-hash
 crypto-vfat(blowfish)
 crypto-vfat(blowfish)-all
 crypto-vfat

 Reported-by: Mathias Krause mini...@googlemail.com
 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
  crypto/algapi.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

That commit will break the user API again as ciphers like 'cbc(aes)'
won't work any more -- as the cbc module won't be loaded.
You're missing the MODULE_ALIAS_CRYPTO() annotations for all the crypto
templates -- cbc, ctr, xts, hmac, ...


Regards,
Mathias


Re: [PATCH v2] crypto: prefix module autoloading with crypto-

2014-11-18 Thread Mathias Krause
On 18 November 2014 01:45, Kees Cook keesc...@chromium.org wrote:
 On Mon, Nov 17, 2014 at 3:20 PM, Mathias Krause mini...@googlemail.com 
 wrote:
 On 17 November 2014 21:02, Kees Cook keesc...@chromium.org wrote:
 This prefixes all crypto module loading with crypto- so we never run
 the risk of exposing module auto-loading to userspace via a crypto API,
 as demonstrated by Mathias Krause:

 https://lkml.org/lkml/2013/3/4/70

 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
 v2:
  - added missing #include, thanks to minipli
  - built with allmodconfig
 [...snip...]
 diff --git a/include/linux/crypto.h b/include/linux/crypto.h
 index d45e949699ea..d14230f6e977 100644
 --- a/include/linux/crypto.h
 +++ b/include/linux/crypto.h
 @@ -26,6 +26,13 @@
  #include linux/uaccess.h

  /*
 + * Autoloaded crypto modules should only use a prefixed name to avoid 
 allowing
 + * arbitrary modules to be loaded.
 + */
 +#define MODULE_ALIAS_CRYPTO(name)  \
 +   MODULE_ALIAS("crypto-" name)

 This would break userland relying on the old aliases, e.g. 'modprobe
 aes' no longer works.

 Why not have both aliases, one with the crypto- prefix for on-demand
 loading within the crypto API and one without for manual loading from
 userland? E.g., something like this:

 #define MODULE_ALIAS_CRYPTO(name)  \
MODULE_ALIAS(name); \
    MODULE_ALIAS("crypto-" name)

 That would prevent the userland breakage and still achieves the goal
 of restricting the request_module() call offered by the means of the
 AF_ALG API.

 That was my intention originally, and I should go back to it. The
 trouble is with the use of __UNIQUE_ID in the MODULE_ALIAS macro. It
 uses __LINE__ to produce the id, so the suggested macro expansion
 (which is what I started with) won't work on non-gcc compilers.

 I haven't found any solutions for C89 version of gcc's __COUNTER__,
 and I haven't found any C89 ways to force a macro to be expanded as
 being multi-line.

Well, clang should support it as well, according to [1]. But still, a
compiler independent solution would be nice.
Anyway, __COUNTER__ support is gcc >= 4.3 only. So, according to
Documentation/Changes, which states gcc 3.2 is the minimum supported
version for compiling the kernel, this would be a no-go, too.

[1] http://clang.llvm.org/docs/LanguageExtensions.html#builtin-macros


 I'd like to avoid having to open-code both MODULE_ALIAS and
 MODULE_ALIAS_CRYPTO in each module's source.

 Anyone see some sneaky way to accomplish this?

Unfortunately, I do not ... beside this, maybe:

#define MODULE_ALIAS_CRYPTO(name)  \
   __MODULE_INFO(alias, alias_userland, name); \
   __MODULE_INFO(alias, alias_crypto, "crypto-" name)

Looks ugly, but works. ;)


Regards,
Mathias


Re: [PATCH] crypto: prefix module autoloading with crypto-

2014-11-17 Thread Mathias Krause
On 17 November 2014 16:09, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Fri, Nov 14, 2014 at 01:34:59PM -0800, Kees Cook wrote:
 This prefixes all crypto module loading with crypto- so we never run
 the risk of exposing module auto-loading to userspace via a crypto API,
 as demonstrated by Mathias Krause:

 https://lkml.org/lkml/2013/3/4/70

 Signed-off-by: Kees Cook keesc...@chromium.org

 Sorry but this doesn't build for me:

   CC [M]  drivers/crypto/qat/qat_common/adf_ctl_drv.o
 drivers/crypto/qat/qat_common/adf_ctl_drv.c:490:21: error: expected 
 declaration specifiers or ‘...’ before string constant
 make[3]: *** [drivers/crypto/qat/qat_common/adf_ctl_drv.o] Error 1
 make[2]: *** [drivers/crypto/qat/qat_common] Error 2
 make[1]: *** [drivers/crypto/qat] Error 2
 make[1]: *** Waiting for unfinished jobs

Looks like drivers/crypto/qat/qat_common/adf_ctl_drv.c is missing
'#include <linux/crypto.h>'.

Mathias


Re: [PATCH v2] crypto: prefix module autoloading with crypto-

2014-11-17 Thread Mathias Krause
On 17 November 2014 21:02, Kees Cook keesc...@chromium.org wrote:
 This prefixes all crypto module loading with crypto- so we never run
 the risk of exposing module auto-loading to userspace via a crypto API,
 as demonstrated by Mathias Krause:

 https://lkml.org/lkml/2013/3/4/70

 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
 v2:
  - added missing #include, thanks to minipli
  - built with allmodconfig
 ---
  arch/arm/crypto/aes_glue.c  | 4 ++--
  arch/arm/crypto/sha1_glue.c | 2 +-
  arch/arm/crypto/sha1_neon_glue.c| 2 +-
  arch/arm/crypto/sha512_neon_glue.c  | 4 ++--
  arch/arm64/crypto/aes-ce-ccm-glue.c | 2 +-
  arch/arm64/crypto/aes-glue.c| 8 
  arch/powerpc/crypto/sha1.c  | 2 +-
  arch/s390/crypto/aes_s390.c | 2 +-
  arch/s390/crypto/des_s390.c | 4 ++--
  arch/s390/crypto/ghash_s390.c   | 2 +-
  arch/s390/crypto/sha1_s390.c| 2 +-
  arch/s390/crypto/sha256_s390.c  | 4 ++--
  arch/s390/crypto/sha512_s390.c  | 4 ++--
  arch/sparc/crypto/aes_glue.c| 2 +-
  arch/sparc/crypto/camellia_glue.c   | 2 +-
  arch/sparc/crypto/crc32c_glue.c | 2 +-
  arch/sparc/crypto/des_glue.c| 2 +-
  arch/sparc/crypto/md5_glue.c| 2 +-
  arch/sparc/crypto/sha1_glue.c   | 2 +-
  arch/sparc/crypto/sha256_glue.c | 4 ++--
  arch/sparc/crypto/sha512_glue.c | 4 ++--
  arch/x86/crypto/aes_glue.c  | 4 ++--
  arch/x86/crypto/aesni-intel_glue.c  | 2 +-
  arch/x86/crypto/blowfish_glue.c | 4 ++--
  arch/x86/crypto/camellia_aesni_avx2_glue.c  | 4 ++--
  arch/x86/crypto/camellia_aesni_avx_glue.c   | 4 ++--
  arch/x86/crypto/camellia_glue.c | 4 ++--
  arch/x86/crypto/cast5_avx_glue.c| 2 +-
  arch/x86/crypto/cast6_avx_glue.c| 2 +-
  arch/x86/crypto/crc32-pclmul_glue.c | 4 ++--
  arch/x86/crypto/crc32c-intel_glue.c | 4 ++--
  arch/x86/crypto/crct10dif-pclmul_glue.c | 4 ++--
  arch/x86/crypto/des3_ede_glue.c | 8 
  arch/x86/crypto/ghash-clmulni-intel_glue.c  | 2 +-
  arch/x86/crypto/salsa20_glue.c  | 4 ++--
  arch/x86/crypto/serpent_avx2_glue.c | 4 ++--
  arch/x86/crypto/serpent_avx_glue.c  | 2 +-
  arch/x86/crypto/serpent_sse2_glue.c | 2 +-
  arch/x86/crypto/sha1_ssse3_glue.c   | 2 +-
  arch/x86/crypto/sha256_ssse3_glue.c | 4 ++--
  arch/x86/crypto/sha512_ssse3_glue.c | 4 ++--
  arch/x86/crypto/twofish_avx_glue.c  | 2 +-
  arch/x86/crypto/twofish_glue.c  | 4 ++--
  arch/x86/crypto/twofish_glue_3way.c | 4 ++--
  crypto/842.c| 1 +
  crypto/aes_generic.c| 2 +-
  crypto/ansi_cprng.c | 2 +-
  crypto/anubis.c | 1 +
  crypto/api.c| 4 ++--
  crypto/arc4.c   | 1 +
  crypto/blowfish_generic.c   | 2 +-
  crypto/camellia_generic.c   | 2 +-
  crypto/cast5_generic.c  | 2 +-
  crypto/cast6_generic.c  | 2 +-
  crypto/ccm.c| 4 ++--
  crypto/crc32.c  | 1 +
  crypto/crc32c_generic.c | 2 +-
  crypto/crct10dif_generic.c  | 2 +-
  crypto/crypto_null.c| 6 +++---
  crypto/ctr.c| 2 +-
  crypto/deflate.c| 2 +-
  crypto/des_generic.c| 2 +-
  crypto/fcrypt.c | 1 +
  crypto/gcm.c| 6 +++---
  crypto/ghash-generic.c  | 2 +-
  crypto/khazad.c | 1 +
  crypto/krng.c   | 2 +-
  crypto/lz4.c| 1 +
  crypto/lz4hc.c  | 1 +
  crypto/lzo.c| 1 +
  crypto/md4.c| 2 +-
  crypto/md5.c| 1 +
  crypto/michael_mic.c| 1 +
  crypto/rmd128.c | 1 +
  crypto/rmd160.c | 1 +
  crypto/rmd256.c | 1 +
  crypto/rmd320.c | 1 +
  crypto/salsa20_generic.c| 2 +-
  crypto/seed.c   | 1 +
  crypto/serpent_generic.c| 4 ++--
  crypto/sha1_generic.c   | 2 +-
  crypto/sha256_generic.c | 4 ++--
  crypto/sha512_generic.c | 4 ++--
  crypto/tea.c| 4 ++--
  crypto/tgr192.c | 4 ++--
  crypto

[PATCH 0/3] crypto: aesni - fix and re-enable by8 CTR variant

2014-09-28 Thread Mathias Krause
This series fixes the counter overflow handling of the by8 CTR variant
which led to failing cryptomgr tests and, in turn, to disabling this
optimization with commit 7da4b29d496b.

Patch 1 fixes the bug, patch 2 removes some unused defines (leftovers
from the unification of the initial source files) and patch 3 re-enables
the code.

The fix was tested by me, doing tcrypt and dm-crypt tests. It was also
tested by Romain who initially reported the issue.

The patches should go on top of crypto-2.6.git.

In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed
to stable to propagate the fix.

Please apply!

Thanks,
Mathias


Mathias Krause (3):
  crypto: aesni - fix counter overflow handling in by8 variant
  crypto: aesni - remove unused defines in by8 variant
  Revert "crypto: aesni - disable by8 AVX CTR optimization"

 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   20 +++-
 arch/x86/crypto/aesni-intel_glue.c  |4 ++--
 2 files changed, 17 insertions(+), 7 deletions(-)

-- 
1.7.10.4



[PATCH 3/3] Revert "crypto: aesni - disable by8 AVX CTR optimization"

2014-09-28 Thread Mathias Krause
This reverts commit 7da4b29d496b1389d3a29b55d3668efecaa08ebd.

Now that the issue is fixed, we can re-enable the code.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Chandramouli Narayanan mo...@linux.intel.com
---
 arch/x86/crypto/aesni-intel_glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a7ccd57f19e4..888950f29fd9 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
crypto_inc(ctrblk, AES_BLOCK_SIZE);
 }
 
-#if 0  /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv)
 {
@@ -1522,7 +1522,7 @@ static int __init aesni_init(void)
aesni_gcm_dec_tfm = aesni_gcm_dec;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
-#if 0  /* temporary disabled due to failing crypto tests */
+#ifdef CONFIG_AS_AVX
if (cpu_has_avx) {
/* optimize performance of ctr mode encryption transform */
aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
-- 
1.7.10.4



[PATCH 1/3] crypto: aesni - fix counter overflow handling in by8 variant

2014-09-28 Thread Mathias Krause
The by8 CTR AVX implementation fails to properly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d496b
("crypto: aesni - disable by8 AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit value, and testing for overflows afterwards. We
need to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions  might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.

Tested-by: Romain Francoise rom...@orebokech.com
Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Chandramouli Narayanan mo...@linux.intel.com
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index f091f122ed24..a029bc744244 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -108,6 +108,10 @@
 
 byteswap_const:
.octa 0x000102030405060708090A0B0C0D0E0F
+ddq_low_msk:
+   .octa 0x0000000000000000FFFFFFFFFFFFFFFF
+ddq_high_add_1:
+   .octa 0x00000000000000010000000000000000
 ddq_add_1:
	.octa 0x00000000000000000000000000000001
 ddq_add_2:
@@ -169,7 +173,12 @@ ddq_add_8:
.rept (by - 1)
club DDQ_DATA, i
club XDATA, i
-   vpaddd  var_ddq_add(%rip), xcounter, var_xdata
+   vpaddq  var_ddq_add(%rip), xcounter, var_xdata
+   vptest  ddq_low_msk(%rip), var_xdata
+   jnz 1f
+   vpaddq  ddq_high_add_1(%rip), var_xdata, var_xdata
+   vpaddq  ddq_high_add_1(%rip), xcounter, xcounter
+   1:
vpshufb xbyteswap, var_xdata, var_xdata
.set i, (i +1)
.endr
@@ -178,7 +187,11 @@ ddq_add_8:
 
vpxor   xkey0, xdata0, xdata0
club DDQ_DATA, by
-   vpaddd  var_ddq_add(%rip), xcounter, xcounter
+   vpaddq  var_ddq_add(%rip), xcounter, xcounter
+   vptest  ddq_low_msk(%rip), xcounter
+   jnz 1f
+   vpaddq  ddq_high_add_1(%rip), xcounter, xcounter
+   1:
 
.set i, 1
.rept (by - 1)
-- 
1.7.10.4



[PATCH 2/3] crypto: aesni - remove unused defines in by8 variant

2014-09-28 Thread Mathias Krause
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably leftovers from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Chandramouli Narayanan mo...@linux.intel.com
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a029bc744244..2df2a0298f5a 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -79,9 +79,6 @@
 #define xcounter   %xmm8
 #define xbyteswap  %xmm9
 #define xkey0  %xmm10
-#define xkey3  %xmm11
-#define xkey6  %xmm12
-#define xkey9  %xmm13
 #define xkey4  %xmm11
 #define xkey8  %xmm12
 #define xkey12 %xmm13
-- 
1.7.10.4



Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-25 Thread Mathias Krause
On 25 September 2014 00:23, chandramouli narayanan
mo...@linux.intel.com wrote:
 On Mon, 2014-09-22 at 00:28 +0200, Mathias Krause wrote:
 What might be worth noting, the failing test uses an IV value of
 \xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD,
 or, in other words, a counter value that'll wrap after processing
 three blocks. The Crypto++ implementation explicitly states, it can
 handle the wrap around (see [1]). Dumping the IV before and after the
 cryptomgr tests shows, the by8 implementation only handles the lower
 32 bit as a counter. Looking at RFC 3686, it defines the counter
 block as a 128 bit combination of nonce, IV and a 32 bit counter
 value. It also defines the initial value of the counter part (1) and
 how it should be incremented (increment the whole counter block, i.e.
 the 128 bit value). However, it also states that the maximum number
 blocks per packet are (2^32)-1. So, according to the RFC, the wrap
 around cannot happen -- not even for the 32 bit part of the counter
 block. However the other aesni-backed implementation does handle the
 wrap around just fine. It does so by doing the increment on a integer
 register so it can use the carry flag to detect the wrap around.
 Changing the by8 variant would be straight forward, but I fear
 performance implications... :/


 It will take some time for me to debug this issue. Is there a list of
 test vectors to debug with?

The failing test is aes_ctr_enc_tv_template[3] from crypto/testmgr.h.
It fails because of a wrong handling of counter overflows.

I'm already working on a patch that fixes the counter overflow issue.
It passes the cryptomgr test, but I'd like to do some more thorough
tests -- especially some performance measurements, as we now have
branches in the hot path.

I don't know if we should still rush a fix for this into v3.17 or delay
it to the next merge window. The offending code was already disabled in
Linus' tree, and the fixes would be small enough to be backported if
there is interest.

Regards,
Mathias


 thanks
 -mouli


[PATCH] crypto: aesni - disable by8 AVX CTR optimization

2014-09-23 Thread Mathias Krause
The by8 implementation introduced in commit 22cddcc7df8f ("crypto: aes
- AES CTR x86_64 by8 AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It treats only the
rightmost 32 bits as a counter -- not the whole block, as all other
implementations do. This makes it fail cryptomgr test #4, which
specifically tests this corner case.

As we're quite late in the release cycle, just disable the by8 variant
for now.

Reported-by: Romain Francoise rom...@orebokech.com
Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Chandramouli Narayanan mo...@linux.intel.com

---
I'll try to create a real fix next week but I can't promise much. If
Linus releases v3.17 early, as he has mentioned in his -rc6
announcement, we should hot fix this by just disabling the by8
variant. The real fix would add the necessary counter block handling to
the asm code and revert this commit afterwards. Reverting the whole code
is not necessary, imho.

Would that be okay for you, Herbert?
---
 arch/x86/crypto/aesni-intel_glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index 888950f29fd9..a7ccd57f19e4 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
crypto_inc(ctrblk, AES_BLOCK_SIZE);
 }
 
-#ifdef CONFIG_AS_AVX
+#if 0  /* temporarily disabled due to failing crypto tests */
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv)
 {
@@ -1522,7 +1522,7 @@ static int __init aesni_init(void)
aesni_gcm_dec_tfm = aesni_gcm_dec;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
-#ifdef CONFIG_AS_AVX
+#if 0  /* temporarily disabled due to failing crypto tests */
if (cpu_has_avx) {
/* optimize performance of ctr mode encryption transform */
aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
-- 
1.7.10.4



Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-21 Thread Mathias Krause
On 17 September 2014 13:29, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Tue, Sep 16, 2014 at 10:01:07PM +0200, Romain Francoise wrote:
 On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
  I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
  fails, which makes my IPsec tunnels unhappy (see trace below). Before I
  start bisecting (2cddcc7df8fd3 is probably my first guess), is this
  already known?

  Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 
  failed on encryption for ctr-aes-aesni [...]

 Update: reverting 2cddcc7df8 (crypto: aes - AES CTR x86_64 by8 AVX
 optimization) fixes this. This machine is Sandy Bridge (Core i7-2600),
 and the problem doesn't seem to occur with the exact same kernel image
 on Ivy Bridge (Xeon E3-1240v2).

 Thanks for bisecting.  If we can't fix this quickly then we should
 just revert it for now.


[Adding James and Sean as they're stated as contact information]

I compared the implementation against the original code from Intel
referenced in the source file and found a few differences. But even
after removing those, the code still generates the same error. So if
Chandramouli does not come up with something, we should revert it, as
the reference implementation from Intel is a) either wrong or b) used
wrongly in the Linux kernel.

What might be worth noting: the failing test uses an IV value of
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD,
or, in other words, a counter value that'll wrap after processing
three blocks. The Crypto++ implementation explicitly states that it can
handle the wrap around (see [1]). Dumping the IV before and after the
cryptomgr tests shows that the by8 implementation only handles the lower
32 bits as a counter. Looking at RFC 3686, it defines the counter
block as a 128 bit combination of nonce, IV and a 32 bit counter
value. It also defines the initial value of the counter part (1) and
how it should be incremented (increment the whole counter block, i.e.
the 128 bit value). However, it also states that the maximum number of
blocks per packet is (2^32)-1. So, according to the RFC, the wrap
around cannot happen -- not even for the 32 bit part of the counter
block. The other aesni-backed implementation, however, does handle the
wrap around just fine. It does so by doing the increment on an integer
register so it can use the carry flag to detect the wrap around.
Changing the by8 variant would be straightforward, but I fear
performance implications... :/
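
For reference, the semantics a correct fix needs -- incrementing the
whole 128 bit counter block instead of only its low 32 bits -- can be
sketched in plain C like this (a minimal user-space sketch for clarity,
assuming a little-endian host; this is not the kernel code, which has to
do the equivalent in the asm):

#include <stdint.h>
#include <string.h>

/* AES-CTR counter block: 16 bytes, big-endian, as laid out in RFC 3686. */
struct ctr_block {
	uint8_t b[16];
};

/* Increment the whole 128 bit block by one, byte-wise from the right,
 * propagating the carry -- the semantics crypto_inc() provides. */
static void ctr128_inc(struct ctr_block *ctr)
{
	int i;

	for (i = 15; i >= 0; i--)
		if (++ctr->b[i] != 0)	/* no wrap in this byte, carry stops */
			break;
}

/* Same thing viewed as two big-endian 64 bit halves; a wrap of the low
 * half carries into the high half.  This mirrors the "increment in an
 * integer register and use the carry flag" approach mentioned above. */
static void ctr128_inc_u64(struct ctr_block *ctr)
{
	uint64_t hi, lo;

	memcpy(&hi, &ctr->b[0], 8);
	memcpy(&lo, &ctr->b[8], 8);
	hi = __builtin_bswap64(hi);	/* big-endian -> host (x86) order */
	lo = __builtin_bswap64(lo);

	if (++lo == 0)			/* low 64 bits wrapped around */
		hi++;

	hi = __builtin_bswap64(hi);
	lo = __builtin_bswap64(lo);
	memcpy(&ctr->b[0], &hi, 8);
	memcpy(&ctr->b[8], &lo, 8);
}

int main(void)
{
	struct ctr_block a, b;
	int i;

	/* The failing test's IV, 0xFF..FD: it wraps after three blocks. */
	memset(a.b, 0xff, sizeof(a.b));
	a.b[15] = 0xfd;
	b = a;

	for (i = 0; i < 4; i++) {
		ctr128_inc(&a);
		ctr128_inc_u64(&b);
	}
	return memcmp(a.b, b.b, sizeof(a.b));	/* 0: both variants agree */
}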

Looking back at the test vector, even if it might be inappropriate for
IPsec, it is still valid for AES-CTR in the general case. So IMHO the
by8 implementation is wrong and should be fixed -- or reverted, for
that matter.

James, Sean: Have you observed any interoperability problems with the
Intel reference implementation for the AVX by8 variant of AES-CTR?
Especially, have you tested the code for wrapping counter values?


Regards,
Mathias

[1] http://www.cryptopp.com/wiki/CTR_Mode


Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-17 Thread Mathias Krause
On 16 September 2014 22:01, Romain Francoise rom...@orebokech.com wrote:
 On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
 I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
 fails, which makes my IPsec tunnels unhappy (see trace below). Before I
 start bisecting (2cddcc7df8fd3 is probably my first guess), is this
 already known?

 Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 failed 
 on encryption for ctr-aes-aesni [...]

I do not get the above message on a SandyBridge i7-2620M, even though
the module makes use of the by8 variant on my system, too:

[0.340626] AVX version of gcm_enc/dec engaged.
[0.340627] AES CTR mode by8 optimization enabled
[0.341273] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)

 Update: reverting 2cddcc7df8 (crypto: aes - AES CTR x86_64 by8 AVX
 optimization) fixes this. This machine is Sandy Bridge (Core i7-2600),
 and the problem doesn't seem to occur with the exact same kernel image
 on Ivy Bridge (Xeon E3-1240v2).

Can you please provide the full kernel log and /proc/cpuinfo of those
machines? It would be interesting to know which variant was used on
those machines -- the new by8 or the old one.


Thanks,
Mathias


Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni

2014-09-17 Thread Mathias Krause
On 17 September 2014 22:10, Mathias Krause mini...@googlemail.com wrote:
 On 16 September 2014 22:01, Romain Francoise rom...@orebokech.com wrote:
 On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote:
 I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test
 fails, which makes my IPsec tunnels unhappy (see trace below). Before I
 start bisecting (2cddcc7df8fd3 is probably my first guess), is this
 already known?

 Sep 15 08:07:56 silenus kernel: [   35.137145] alg: skcipher: Test 4 failed 
 on encryption for ctr-aes-aesni [...]

 I do not get the above message on a SandyBridge i7-2620M, even though
 the module makes use of the by8 variant on my system, too:


Never mind! After playing a little with crconf to instantiate
'ctr(aes)', I can see the failing test now too.

Thanks,
Mathias


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote:
 Building current kernel with some old toolchain (gcc 4.1.2 and gas 2.17)
 chokes with:
 
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S: Assembler messages:
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S:128: Error: no such instruction: 
 `crc32b %bl,%r8d'
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S:204: Error: no such instruction: 
 `crc32q -i*8(%rcx),%r8'
 ...
 
 due to the fact that gas doesn't know the CRC32 instruction. Check that
 before building.
 
 Cc: linux-crypto@vger.kernel.org
 Signed-off-by: Borislav Petkov b...@suse.de
 ---
 
 So this is one possibility to address this. Code already
 has the ifdeffery around crc_pcl() which is implemented in
 crc32c-pcl-intel-asm_64.S so we can piggyback on that and not build that
 file if gas doesn't know CRC32.
 
 If no CRC32 support, it still builds fine silently, however it would be
 better to probably say that due to old toolchain, kernel doesn't include
 fast CRC32 stuff in crc32c-intel.ko.
 
 Hmmm.
 
 
  arch/x86/crypto/Makefile|  8 +++-
  arch/x86/crypto/crc32c-intel_glue.c |  7 ---
  scripts/gas-can-do-crc32.sh | 12 
  3 files changed, 23 insertions(+), 4 deletions(-)
  create mode 100755 scripts/gas-can-do-crc32.sh
 
 diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
 index 61d6e281898b..707bf7ecb903 100644
 --- a/arch/x86/crypto/Makefile
 +++ b/arch/x86/crypto/Makefile
 @@ -83,7 +83,13 @@ ifeq ($(avx2_supported),yes)
  sha1-ssse3-y += sha1_avx2_x86_64_asm.o
  endif
  crc32c-intel-y := crc32c-intel_glue.o
 -crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o
 +
 +gas_can_crc32 := $(srctree)/scripts/gas-can-do-crc32.sh
 +CONFIG_GAS_SUPPORTS_CRC32=$(shell $(CONFIG_SHELL) $(gas_can_crc32) $(AS))
 +

This looks too complicated. We do have as-instr for exactly this kind
of test. And, in fact, looking at arch/x86/Makefile we already have one
for crc32:

  asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1)

So you can just use CONFIG_AS_CRC32 for your tests and drop the shell
script.

 +ifdef CONFIG_64BIT
 +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o
 +endif

s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further
uses.

  crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
  sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o 
 sha256_ssse3_glue.o
  sha512-ssse3-y := sha512-ssse3-asm.o sha512-avx-asm.o sha512-avx2-asm.o 
 sha512_ssse3_glue.o
 diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
 b/arch/x86/crypto/crc32c-intel_glue.c
 index 6812ad98355c..4ce8e2db2984 100644
 --- a/arch/x86/crypto/crc32c-intel_glue.c
 +++ b/arch/x86/crypto/crc32c-intel_glue.c
 @@ -46,7 +46,7 @@
  #define REX_PRE
  #endif
  
 -#ifdef CONFIG_X86_64
 +#if defined(CONFIG_X86_64)  defined(CONFIG_GAS_SUPPORTS_CRC32)
  /*
   * use carryless multiply version of crc32c when buffer
   * size is = 512 (when eager fpu is enabled) or
 @@ -181,7 +181,7 @@ static int crc32c_intel_cra_init(struct crypto_tfm *tfm)
   return 0;
  }
  
 -#ifdef CONFIG_X86_64
 +#if defined(CONFIG_X86_64)  defined(CONFIG_GAS_SUPPORTS_CRC32)
  static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
  unsigned int len)
  {
 @@ -257,7 +257,8 @@ static int __init crc32c_intel_mod_init(void)
  {
   if (!x86_match_cpu(crc32c_cpu_id))
   return -ENODEV;
 -#ifdef CONFIG_X86_64
 +
 +#if defined(CONFIG_X86_64)  defined(CONFIG_GAS_SUPPORTS_CRC32)
   if (cpu_has_pclmulqdq) {
   alg.update = crc32c_pcl_intel_update;
   alg.finup = crc32c_pcl_intel_finup;
 diff --git a/scripts/gas-can-do-crc32.sh b/scripts/gas-can-do-crc32.sh
 new file mode 100755
 index ..88ff476bd3cc
 --- /dev/null
 +++ b/scripts/gas-can-do-crc32.sh
 @@ -0,0 +1,12 @@
 +#!/bin/sh
 +
 +TMP=$(mktemp)
 +
 +echo "crc32 %rax,%rbx" | $* --warn - -o $TMP 2>/dev/null
 +if [ $? -eq 0 ]; then
 + echo y
 +else
 + echo n
 +fi
 +
 +rm -f $TMP
 -- 
 2.0.0
 

Regards,
Mathias


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On Wed, Jul 30, 2014 at 11:19:37PM +0200, Mathias Krause wrote:
 On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote:
  [...]
 
 This looks too complicated. We do have as-instr for exactly those kind
 of tests. And, in fact, looking at arch/x86/Makefile we already have one
 for crc32:
 
   asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1)
 
 So you can just used CONFIG_AS_CRC32 for your tests and drop the shell
 script.
 
  +ifdef CONFIG_64BIT
  +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o
  +endif
 
 s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further
 uses.
 

Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only, so it
cannot be used in Makefiles. So crc32c-pcl-intel-asm_64.S needs an
#ifdef CONFIG_AS_CRC32 guard and should still be compiled for
CONFIG_64BIT, as it is now. It'll be an empty object for older binutils
versions that don't support the crc32 instruction.
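
The pattern can be demonstrated with a tiny compile-time example (an
assumption about the final shape, not the actual kernel change): since
CONFIG_AS_CRC32 arrives as a -D define from the as-instr test, only the
preprocessor can test it, and an "empty object" just means everything
interesting gets compiled out:

/* guard_demo.c -- illustration only.  Build with and without
 * -DCONFIG_X86_64 -DCONFIG_AS_CRC32 to see both outcomes. */
#include <stdio.h>

#if defined(CONFIG_X86_64) && defined(CONFIG_AS_CRC32)
static const char *crc32c_variant(void)
{
	return "fast crc32/PCL path built in";
}
#else
static const char *crc32c_variant(void)
{
	return "old binutils: generic path only";
}
#endif

int main(void)
{
	puts(crc32c_variant());
	return 0;
}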

Sorry for the confusion.


Regards,
Mathias


Re: [PATCH] x86, crypto: Check if gas supports CRC32

2014-07-30 Thread Mathias Krause
On 31 July 2014 00:11, Borislav Petkov b...@alien8.de wrote:

 On Wed, Jul 30, 2014 at 11:28:14PM +0200, Mathias Krause wrote:
  Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only so
  cannot be used in makefiles. So crc32c-pcl-intel-asm_64.S needs a
  #ifdef CONFIG_AS_CRC32 guard and still be compiled for CONFIG_64BIT,
  as it is now. It'll be an empty object for older binutils versions not
  supporting the crc32 instruction.

 Yeah, that makes it all even simpler, thanks!

 We're still b0rked though:

 arch/x86/crypto/crct10dif-pcl-asm_64.S: Assembler messages:
 arch/x86/crypto/crct10dif-pcl-asm_64.S:147: Error: no such instruction: 
 `pclmulqdq $0x0,%xmm10,%xmm0'
 arch/x86/crypto/crct10dif-pcl-asm_64.S:148: Error: no such instruction: 
 `pclmulqdq $0x11,%xmm10,%xmm8'
 arch/x86/crypto/crct10dif-pcl-asm_64.S:149: Error: no such instruction: 
 `pclmulqdq $0x0,%xmm10,%xmm1'
 ...

 and need checking for more instructions. I'll play with this more
 tomorrow.


You probably can reuse the AVX test for this -- either the
CONFIG_AS_AVX preprocessor one or the $(avx_supported) make one, local
to arch/x86/crypto/Makefile.
Even though the CLMUL feature has not much to do with AVX (it has a
dedicated CPUID feature bit), support for it in binutils was added
together with AVX support, see
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c0f3af977b0f28a0dc5a620110b8dcf9d8286f84

Regards,
Mathias

 Good night :-)

 --
 Regards/Gruss,
 Boris.

 Sent from a fat crate under my desk. Formatting is fine.
 --


Re: [PATCH v4 1/1] crypto: AES CTR x86_64 by8 AVX optimization

2014-06-10 Thread Mathias Krause
On 9 June 2014 19:41, chandramouli narayanan mo...@linux.intel.com wrote:
 [...]
 @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
 aesni_gcm_enc_tfm = aesni_gcm_enc;
 aesni_gcm_dec_tfm = aesni_gcm_dec;
 }
 +   aesni_ctr_enc_tfm = aesni_ctr_enc;
 +#ifdef CONFIG_AS_AVX
 +   if (cpu_has_avx) {
 +   /* optimize performance of ctr mode encryption trasform */

The trasform typo is also still there. :/

 +   aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
 +   pr_info(AES CTR mode by8 optimization enabled\n);
 +   }
 +#endif
  #endif

 err = crypto_fpu_init();

Mathias


Re: [PATCH v5 1/1] crypto: AES CTR x86_64 by8 AVX optimization

2014-06-10 Thread Mathias Krause
)
 +
 +/*
 + * routine to do AES256 CTR enc/decrypt by8
 + * XMM registers are clobbered.
 + * Saving/restoring must be done at a higher level
 + * aes_ctr_enc_256_avx_by8(void *in, void *iv, void *keys, void *out,
 + * unsigned int num_bytes)
 + */
 +ENTRY(aes_ctr_enc_256_avx_by8)
 +   /* call the aes main loop */
 +   do_aes_ctrmain KEY_256
 +
 +ENDPROC(aes_ctr_enc_256_avx_by8)
 diff --git a/arch/x86/crypto/aesni-intel_glue.c 
 b/arch/x86/crypto/aesni-intel_glue.c
 index 948ad0e..888950f 100644
 --- a/arch/x86/crypto/aesni-intel_glue.c
 +++ b/arch/x86/crypto/aesni-intel_glue.c
 @@ -105,6 +105,9 @@ void crypto_fpu_exit(void);
  #define AVX_GEN4_OPTSIZE 4096

  #ifdef CONFIG_X86_64
 +
 +static void (*aesni_ctr_enc_tfm)(struct crypto_aes_ctx *ctx, u8 *out,
 + const u8 *in, unsigned int len, u8 *iv);
  asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
   const u8 *in, unsigned int len, u8 *iv);

 @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,


  #ifdef CONFIG_AS_AVX
 +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
 +   void *keys, u8 *out, unsigned int num_bytes);
 +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
 +   void *keys, u8 *out, unsigned int num_bytes);
 +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
 +   void *keys, u8 *out, unsigned int num_bytes);
  /*
   * asmlinkage void aesni_gcm_precomp_avx_gen2()
   * gcm_data *my_ctx_data, context data
 @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
 crypto_inc(ctrblk, AES_BLOCK_SIZE);
  }

 +#ifdef CONFIG_AS_AVX
 +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
 + const u8 *in, unsigned int len, u8 *iv)
 +{
 +   /*
 +* based on key length, override with the by8 version
 +* of ctr mode encryption/decryption for improved performance
 +* aes_set_key_common() ensures that key length is one of
 +* {128,192,256}
 +*/
 +   if (ctx-key_length == AES_KEYSIZE_128)
 +   aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len);
 +   else if (ctx-key_length == AES_KEYSIZE_192)
 +   aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
 +   else
 +   aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
 +}
 +#endif
 +
  static int ctr_crypt(struct blkcipher_desc *desc,
  struct scatterlist *dst, struct scatterlist *src,
  unsigned int nbytes)
 @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc,

 kernel_fpu_begin();
 while ((nbytes = walk.nbytes) = AES_BLOCK_SIZE) {
 -   aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
 - nbytes  AES_BLOCK_MASK, walk.iv);
 +   aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
 + nbytes  AES_BLOCK_MASK, walk.iv);
 nbytes = AES_BLOCK_SIZE - 1;
 err = blkcipher_walk_done(desc, walk, nbytes);
 }
 @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
 aesni_gcm_enc_tfm = aesni_gcm_enc;
 aesni_gcm_dec_tfm = aesni_gcm_dec;
 }
 +   aesni_ctr_enc_tfm = aesni_ctr_enc;
 +#ifdef CONFIG_AS_AVX
 +   if (cpu_has_avx) {
 +   /* optimize performance of ctr mode encryption transform */
 +   aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
 +   pr_info(AES CTR mode by8 optimization enabled\n);
 +   }
 +#endif
  #endif

 err = crypto_fpu_init();
 --
 1.8.2.1



Patch is
Reviewed-by: Mathias Krause mini...@googlemail.com

Thanks, Chandramouli!

Regards,
Mathias


Re: [PATCH v3 1/1] crypto: AES CTR x86_64 by8 AVX optimization

2014-06-07 Thread Mathias Krause
,
 +   const u8 *in, unsigned int len, u8 *iv);
  asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
 const u8 *in, unsigned int len, u8 *iv);
  
 @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out,
  
 
  #ifdef CONFIG_AS_AVX
 +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
 + void *keys, u8 *out, unsigned int num_bytes);
 +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
 + void *keys, u8 *out, unsigned int num_bytes);
 +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
 + void *keys, u8 *out, unsigned int num_bytes);
  /*
   * asmlinkage void aesni_gcm_precomp_avx_gen2()
   * gcm_data *my_ctx_data, context data
 @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
   crypto_inc(ctrblk, AES_BLOCK_SIZE);
  }
  
 +#ifdef CONFIG_AS_AVX
 +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
 +   const u8 *in, unsigned int len, u8 *iv)
 +{
 + /*
 +  * based on key length, override with the by8 version
 +  * of ctr mode encryption/decryption for improved performance
 +  * aes_set_key_common() ensures that key length is one of
 +  * {128,192,256}
 +  */
 + if (ctx-key_length == AES_KEYSIZE_128)
 + aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len);
 + else if (ctx-key_length == AES_KEYSIZE_192)
 + aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
 + else if (ctx-key_length == AES_KEYSIZE_256)

You can make that last one an unconditional else, because it *has* to
be that key size. Otherwise someone managed to tamper with the crypto
context -- and worse than using the wrong key size for the encryption
algorithm is doing no encryption at all.
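
In other words, the suggested shape would roughly be (a sketch based on
the code quoted above, not the final submission):

static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
				  const u8 *in, unsigned int len, u8 *iv)
{
	/* aes_set_key_common() guarantees one of 128/192/256 bits, so the
	 * last case needs no test -- never fall through to "no crypto". */
	if (ctx->key_length == AES_KEYSIZE_128)
		aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len);
	else if (ctx->key_length == AES_KEYSIZE_192)
		aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len);
	else
		aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
}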

 + aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
 +}
 +#endif
 +
  static int ctr_crypt(struct blkcipher_desc *desc,
struct scatterlist *dst, struct scatterlist *src,
unsigned int nbytes)
 @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc,
  
   kernel_fpu_begin();
   while ((nbytes = walk.nbytes) = AES_BLOCK_SIZE) {
 - aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
 -   nbytes  AES_BLOCK_MASK, walk.iv);
 + aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr,
 +   nbytes  AES_BLOCK_MASK, walk.iv);
   nbytes = AES_BLOCK_SIZE - 1;
   err = blkcipher_walk_done(desc, walk, nbytes);
   }
 @@ -1493,6 +1521,14 @@ static int __init aesni_init(void)
   aesni_gcm_enc_tfm = aesni_gcm_enc;
   aesni_gcm_dec_tfm = aesni_gcm_dec;
   }
 + aesni_ctr_enc_tfm = aesni_ctr_enc;
 +#ifdef CONFIG_AS_AVX
 + if (cpu_has_aes  cpu_has_avx) {

The cpu_has_aes test is still redundant.

 + /* optimize performance of ctr mode encryption trasform */
 + aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
 + pr_info(AES CTR mode by8 optimization enabled\n);
 + }
 +#endif
  #endif
  
   err = crypto_fpu_init();
 -- 
 1.8.2.1
 
 

With the above nitpicks handled:

Reviewed-by: Mathias Krause mini...@googlemail.com


Thanks,
Mathias


Re: [PATCH v2 1/1] crypto: AES CTR x86_64 by8 AVX optimization

2014-06-04 Thread Mathias Krause
On Tue, Jun 03, 2014 at 05:41:14PM -0700, chandramouli narayanan wrote:
 This patch introduces by8 AES CTR mode AVX optimization inspired by
 Intel Optimized IPSEC Cryptographic library. For additional information,
 please see:
 http://downloadcenter.intel.com/Detail_Desc.aspx?agr=YDwnldID=22972
 
 The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and
 aes_ctr_enc_256_avx_by8() are adapted from
 Intel Optimized IPSEC Cryptographic library. When both AES and AVX features
 are enabled in a platform, the glue code in AESNI module overrides the
 existing by4 CTR mode en/decryption with the by8
 AES CTR mode en/decryption.
 
 On a Haswell desktop, with turbo disabled and all cpus running
 at maximum frequency, the by8 CTR mode optimization
 shows better performance results across data & key sizes
 as measured by tcrypt.
 
 The average performance improvement of the by8 version over the by4
 version is as follows:
 
 For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement.
 For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement.
 For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement.

Nice improvement :)

How does it perform on older processors that do have a penalty for
unaligned loads (vmovdqu), e.g. SandyBridge? If those perform worse, it
might be wise to extend the CPU feature test in the glue code with a
model test to enable the by8 variant only for Haswell and newer
processors that don't have such a penalty.
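
Purely as an illustration of what such a model gate could look like, here
is a stand-alone sketch (the model IDs are an assumption -- commonly
listed Haswell IDs -- and this is not a tested or proposed patch; the
actual glue code would use boot_cpu_data instead of raw CPUID):

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical check: Intel family 6 CPU with a known Haswell model ID.
 * Such lists need updating for every new generation, which is one
 * argument against gating the by8 code this way. */
static bool looks_like_haswell(void)
{
	unsigned int eax, ebx, ecx, edx, family, model;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;

	family = (eax >> 8) & 0xf;
	model = (eax >> 4) & 0xf;
	if (family == 6)
		model |= ((eax >> 16) & 0xf) << 4;	/* extended model bits */

	if (family != 6)
		return false;

	switch (model) {
	case 0x3c: case 0x3f: case 0x45: case 0x46:	/* assumed Haswell IDs */
		return true;
	default:
		return false;
	}
}

int main(void)
{
	printf("enable by8 CTR here (heuristic): %s\n",
	       looks_like_haswell() ? "yes" : "no");
	return 0;
}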

 
 A typical run of tcrypt with AES CTR mode encryption of the by4 and by8
 optimization shows the following results:
 
 tcrypt with by4 AES CTR mode encryption optimization on a Haswell Desktop:
 ---
 
 testing speed of __ctr-aes-aesni encryption
 test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes)
 test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes)
 test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes)
 test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 
 bytes)
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 
 bytes)
 test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes)
 test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes)
 test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes)
 test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 
 bytes)
 test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 
 bytes)
 test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes)
 test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes)
 test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes)
 test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 
 bytes)
 test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 
 bytes)
 
 testing speed of __ctr-aes-aesni decryption
 test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes)
 test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes)
 test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes)
 test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 
 bytes)
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 
 bytes)
 test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes)
 test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes)
 test 7 (192 bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes)
 test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 
 bytes)
 test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 
 bytes)
 test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes)
 test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes)
 test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes)
 test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 
 bytes)
 test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 
 bytes)
 
 tcrypt with by8 AES CTR mode encryption optimization on a Haswell Desktop:
 ---
 
 testing speed of __ctr-aes-aesni encryption
 test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes)
 test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes)
 test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes)
 test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 
 bytes)
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 
 bytes)
 test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes)
 test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles 

Re: [PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-25 Thread Mathias Krause
On 24 March 2014 18:29, chandramouli narayanan mo...@linux.intel.com wrote:
 On Mon, 2014-03-24 at 17:10 +0100, Mathias Krause wrote:
 The recent addition of the AVX2 variant of the SHA1 hash function wrongly
 disabled the AVX variant by introducing a flaw in the feature test. Fixed
 in patch 1.

 The alignment calculations of the AVX2 assembler implementation are
 questionable, too. Especially the page alignment of the stack pointer is
 broken in multiple ways. Fixed in patch 2. In patch 3 another issue for
 code alignment is fixed.

 Please apply!

 Mathias Krause (3):
   crypto: x86/sha1 - re-enable the AVX variant
   crypto: x86/sha1 - fix stack alignment of AVX2 variant
   crypto: x86/sha1 - reduce size of the AVX2 asm implementation

  arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++--
  arch/x86/crypto/sha1_ssse3_glue.c  |   26 --
  2 files changed, 18 insertions(+), 16 deletions(-)

 Your fixes are the right on mark. I went through your patches and tested
 them and found to be correct.

Thanks for double-checking!

 Sorry for causing regression and missing alignment issues in the patches
 I submitted.

No problem with that. But as I'm not subscribed to the linux-crypto
mailing list I haven't seen your earlier submissions. Otherwise I
would have objected earlier. ;)

Thanks,
Mathias


[PATCH 1/3] crypto: x86/sha1 - re-enable the AVX variant

2014-03-24 Thread Mathias Krause
Commit 7c1da8d0d0 ("crypto: sha - SHA1 transform x86_64 AVX2")
accidentally disabled the AVX variant by making the avx_usable() test
not only fail in case the CPU doesn't support AVX or OSXSAVE but also
if it doesn't support AVX2.

Fix that regression by splitting up the AVX/AVX2 test into two
functions. Also test for the BMI1 extension in the avx2_usable() test
as the AVX2 implementation not only makes use of BMI2 but also BMI1
instructions.

Cc: Chandramouli Narayanan mo...@linux.intel.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Marek Vasut ma...@denx.de
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/sha1_ssse3_glue.c |   26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/crypto/sha1_ssse3_glue.c 
b/arch/x86/crypto/sha1_ssse3_glue.c
index 139a55c04d..74d16ef707 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -208,11 +208,7 @@ static bool __init avx_usable(void)
 {
u64 xcr0;
 
-#if defined(CONFIG_AS_AVX2)
-   if (!cpu_has_avx || !cpu_has_avx2 || !cpu_has_osxsave)
-#else
if (!cpu_has_avx || !cpu_has_osxsave)
-#endif
return false;
 
xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
@@ -224,11 +220,23 @@ static bool __init avx_usable(void)
 
return true;
 }
+
+#ifdef CONFIG_AS_AVX2
+static bool __init avx2_usable(void)
+{
+	if (avx_usable() && cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI1) &&
+   boot_cpu_has(X86_FEATURE_BMI2))
+   return true;
+
+   return false;
+}
+#endif
 #endif
 
 static int __init sha1_ssse3_mod_init(void)
 {
char *algo_name;
+
/* test for SSSE3 first */
if (cpu_has_ssse3) {
sha1_transform_asm = sha1_transform_ssse3;
@@ -238,13 +246,11 @@ static int __init sha1_ssse3_mod_init(void)
 #ifdef CONFIG_AS_AVX
/* allow AVX to override SSSE3, it's a little faster */
if (avx_usable()) {
-   if (cpu_has_avx) {
-   sha1_transform_asm = sha1_transform_avx;
-   algo_name = AVX;
-   }
+   sha1_transform_asm = sha1_transform_avx;
+   algo_name = AVX;
 #ifdef CONFIG_AS_AVX2
-	if (cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI2)) {
-   /* allow AVX2 to override AVX, it's a little faster */
+   /* allow AVX2 to override AVX, it's a little faster */
+   if (avx2_usable()) {
sha1_transform_asm = sha1_apply_transform_avx2;
algo_name = AVX2;
}
-- 
1.7.10.4



[PATCH 2/3] crypto: x86/sha1 - fix stack alignment of AVX2 variant

2014-03-24 Thread Mathias Krause
The AVX2 implementation might waste up to a page of stack memory because
of a wrong alignment calculation. This will, in the worst case, increase
the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way too big
for a kernel function. Even worse, it might also allocate *fewer* bytes
than needed if the stack pointer is already aligned, because in that case
the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards,
not downwards.

Fix those issues by changing and simplifying the alignment calculation
to use a 32 byte alignment, the alignment really needed.
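
The arithmetic can be illustrated with a small user-space model
(illustration only -- the real code manipulates %rsp in assembly and
additionally reserves RESERVE_STACK bytes, which is left out here):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Old calculation: rbx = rsp & (0x1000-1); rbx -= (8+32); rsp -= rbx; */
static uint64_t old_align(uint64_t rsp)
{
	uint64_t rbx = rsp & (0x1000 - 1);	/* offset into the page */

	rbx -= 8 + 32;		/* underflows if the offset is < 40! */
	return rsp - rbx;
}

/* New calculation: simply round down to a 32 byte boundary. */
static uint64_t new_align(uint64_t rsp)
{
	return rsp & ~(uint64_t)(0x20 - 1);
}

int main(void)
{
	/* Nearly page-aligned stack pointer: the old formula moves it down
	 * by almost a whole page, the new one by at most 31 bytes. */
	uint64_t rsp = 0x7fffffff1ff8;

	printf("old: %#" PRIx64 " (down by %" PRIu64 " bytes)\n",
	       old_align(rsp), rsp - old_align(rsp));
	printf("new: %#" PRIx64 " (down by %" PRIu64 " bytes)\n",
	       new_align(rsp), rsp - new_align(rsp));

	/* Small page offset: the old formula even moves the pointer *up*. */
	rsp = 0x7fffffff1020;
	printf("old: %#" PRIx64 " vs. rsp %#" PRIx64 "\n", old_align(rsp), rsp);
	return 0;
}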

Cc: Chandramouli Narayanan mo...@linux.intel.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Marek Vasut ma...@denx.de
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S |7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S 
b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index 4f348544d1..bacac22b20 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -636,9 +636,7 @@ _loop3:
 
/* Align stack */
mov %rsp, %rbx
-   and $(0x1000-1), %rbx
-   sub $(8+32), %rbx
-   sub %rbx, %rsp
+   and $~(0x20-1), %rsp
push%rbx
sub $RESERVE_STACK, %rsp
 
@@ -665,8 +663,7 @@ _loop3:
avx2_zeroupper
 
add $RESERVE_STACK, %rsp
-   pop %rbx
-   add %rbx, %rsp
+   pop %rsp
 
pop %r15
pop %r14
-- 
1.7.10.4



[PATCH 3/3] crypto: x86/sha1 - reduce size of the AVX2 asm implementation

2014-03-24 Thread Mathias Krause
There is really no need to page align sha1_transform_avx2. The default
alignment is just fine. This is not the hot code but only the entry
point, after all.

Cc: Chandramouli Narayanan mo...@linux.intel.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Marek Vasut ma...@denx.de
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S |1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S 
b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
index bacac22b20..1cd792db15 100644
--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -623,7 +623,6 @@ _loop3:
  */
 .macro SHA1_VECTOR_ASM  name
ENTRY(\name)
-   .align 4096
 
push%rbx
push%rbp
-- 
1.7.10.4



[PATCH 0/3] crypto: x86/sha1 - regression and other fixes

2014-03-24 Thread Mathias Krause
The recent addition of the AVX2 variant of the SHA1 hash function wrongly
disabled the AVX variant by introducing a flaw in the feature test. Fixed
in patch 1.

The alignment calculations of the AVX2 assembler implementation are
questionable, too. Especially the page alignment of the stack pointer is
broken in multiple ways; fixed in patch 2. Patch 3 fixes another code
alignment issue.

Please apply!

Mathias Krause (3):
  crypto: x86/sha1 - re-enable the AVX variant
  crypto: x86/sha1 - fix stack alignment of AVX2 variant
  crypto: x86/sha1 - reduce size of the AVX2 asm implementation

 arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++--
 arch/x86/crypto/sha1_ssse3_glue.c  |   26 --
 2 files changed, 18 insertions(+), 16 deletions(-)

-- 
1.7.10.4



[PATCH 2/2] padata: Fix wrong usage of rcu_dereference()

2013-11-28 Thread Mathias Krause
A kernel with lockdep enabled complains about wrong usage of
rcu_dereference() under an rcu_read_lock_bh() protected region.

  ===
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  ---
  linux/kernel/padata.c:115 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0:  (rcu_read_lock_bh){.+}, at: [8115c235] 
padata_do_parallel+0x5/0x270

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Steffen Klassert steffen.klass...@secunet.com
---
 kernel/padata.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2abd25d79c..161402f0b5 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -112,7 +112,7 @@ int padata_do_parallel(struct padata_instance *pinst,
 
rcu_read_lock_bh();
 
-	pd = rcu_dereference(pinst->pd);
+	pd = rcu_dereference_bh(pinst->pd);
 
err = -EINVAL;
 	if (!(pinst->flags & PADATA_INIT) || pinst->flags & PADATA_INVALID)
-- 
1.7.10.4



[PATCH 1/2] crypto: pcrypt - Fix wrong usage of rcu_dereference()

2013-11-28 Thread Mathias Krause
A kernel with lockdep enabled complains about wrong usage of
rcu_dereference() under an rcu_read_lock_bh() protected region.

  ===
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  ---
  linux/crypto/pcrypt.c:81 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0:  (rcu_read_lock_bh){.+}, at: [812c8075] 
pcrypt_do_parallel.isra.2+0x5/0x200

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Steffen Klassert steffen.klass...@secunet.com
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
---
 crypto/pcrypt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index f8c920cafe..309d345ead 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -78,7 +78,7 @@ static int pcrypt_do_parallel(struct padata_priv *padata, 
unsigned int *cb_cpu,
cpu = *cb_cpu;
 
rcu_read_lock_bh();
-	cpumask = rcu_dereference(pcrypt->cb_cpumask);
+	cpumask = rcu_dereference_bh(pcrypt->cb_cpumask);
 	if (cpumask_test_cpu(cpu, cpumask->mask))
goto out;
 
-- 
1.7.10.4



[PATCH 0/2] pcrypt/padata rcu fixes

2013-11-28 Thread Mathias Krause
Two small RCU related fixes, lockdep complained about.

Please apply!


Mathias Krause (2):
  crypto: pcrypt - Fix wrong usage of rcu_dereference()
  padata: Fix wrong usage of rcu_dereference()

 crypto/pcrypt.c |2 +-
 kernel/padata.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
1.7.10.4



Re: [PATCH] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
On 08.10.2013 14:08, Steffen Klassert wrote:
 On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote:
 Using a spinlock to atomically increase a counter sounds wrong -- we've
 atomic_t for this!

 Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
 line trashing. This has the nice side effect of decreasing the size of
 struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g.
 occupying only two instead of three cache lines.

 Those changes results in a 5% performance increase on an IPsec test run
 using pcrypt.

 Btw. the seq_lock spinlock was never explicitly initialized -- one more
 reason to get rid of it.

 Signed-off-by: Mathias Krause mathias.kra...@secunet.com
 
 Acked-by: Steffen Klassert steffen.klass...@secunet.com
 
 Herbert can you take this one?

Ping, Herbert? Anything wrong with the patch?


Re: [PATCH] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
On 25.10.2013 11:26, Herbert Xu wrote:
 On Fri, Oct 25, 2013 at 10:20:48AM +0200, Mathias Krause wrote:
 On 08.10.2013 14:08, Steffen Klassert wrote:
 On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote:
 Using a spinlock to atomically increase a counter sounds wrong -- we've
 atomic_t for this!

 Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
 line trashing. This has the nice side effect of decreasing the size of
 struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g.
 occupying only two instead of three cache lines.

 Those changes results in a 5% performance increase on an IPsec test run
 using pcrypt.

 Btw. the seq_lock spinlock was never explicitly initialized -- one more
 reason to get rid of it.

 Signed-off-by: Mathias Krause mathias.kra...@secunet.com
 Acked-by: Steffen Klassert steffen.klass...@secunet.com

 Herbert can you take this one?
 Ping, Herbert? Anything wrong with the patch?
 
 Sorry I don't seem to have this patch in my mail box.  Can you
 resend it please?

I sent it to linux-crypto and Steffen only. Will resend it directed to
you now.

 
 Thanks!


[PATCH RESEND] padata: make the sequence counter an atomic_t

2013-10-25 Thread Mathias Krause
Using a spinlock to atomically increase a counter sounds wrong -- we have
atomic_t for this!

Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
line thrashing. This has the nice side effect of decreasing the size of
struct parallel_data from 192 to 128 bytes for an x86-64 build, e.g.
occupying only two instead of three cache lines.

These changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause mathias.kra...@secunet.com
Acked-by: Steffen Klassert steffen.klass...@secunet.com
---
 include/linux/padata.h |3 +--
 kernel/padata.c|9 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
struct padata_serial_queue  __percpu *squeue;
 	atomic_t			reorder_objects;
 	atomic_t			refcnt;
+	atomic_t			seq_nr;
 	struct padata_cpumask		cpumask;
 	spinlock_t			lock ____cacheline_aligned;
-	spinlock_t			seq_lock;
-	unsigned int			seq_nr;
 	unsigned int			processed;
struct timer_list   timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int 
cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+   unsigned int seq_nr;
int cpu_index;
 
/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 * seq_nr mod. number of cpus in use.
 */
 
-	spin_lock(&pd->seq_lock);
-	cpu_index =  pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-	pd->seq_nr++;
-	spin_unlock(&pd->seq_lock);
+	seq_nr = atomic_inc_return(&pd->seq_nr);
+	cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct 
padata_instance *pinst,
padata_init_pqueues(pd);
padata_init_squeues(pd);
 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-	pd->seq_nr = 0;
+	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
-- 
1.7.2.5



[PATCH 1/5] crypto: authenc - Export key parsing helper function

2013-10-15 Thread Mathias Krause
AEAD key parsing is duplicated in multiple places in the kernel. Add a
common helper function to consolidate that functionality.

Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
 crypto/authenc.c |   48 -
 include/crypto/authenc.h |   12 ++-
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..0fda5ba 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -52,40 +52,52 @@ static void authenc_request_complete(struct aead_request 
*req, int err)
aead_request_complete(req, err);
 }
 
-static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
-unsigned int keylen)
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+  unsigned int keylen)
 {
-   unsigned int authkeylen;
-   unsigned int enckeylen;
-   struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
-   struct crypto_ahash *auth = ctx-auth;
-   struct crypto_ablkcipher *enc = ctx-enc;
-   struct rtattr *rta = (void *)key;
+   struct rtattr *rta = (struct rtattr *)key;
struct crypto_authenc_key_param *param;
-   int err = -EINVAL;
 
if (!RTA_OK(rta, keylen))
-   goto badkey;
+   return -EINVAL;
 	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-		goto badkey;
+		return -EINVAL;
 	if (RTA_PAYLOAD(rta) < sizeof(*param))
-		goto badkey;
+		return -EINVAL;
 
 	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
+	keys->enckeylen = be32_to_cpu(param->enckeylen);
 
 	key += RTA_ALIGN(rta->rta_len);
 	keylen -= RTA_ALIGN(rta->rta_len);
 
-	if (keylen < enckeylen)
-		goto badkey;
+	if (keylen < keys->enckeylen)
+		return -EINVAL;
 
-	authkeylen = keylen - enckeylen;
+	keys->authkeylen = keylen - keys->enckeylen;
+	keys->authkey = key;
+	keys->enckey = key + keys->authkeylen;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_authenc_extractkeys);
+
+static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
+unsigned int keylen)
+{
+   struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
+   struct crypto_ahash *auth = ctx-auth;
+   struct crypto_ablkcipher *enc = ctx-enc;
+   struct crypto_authenc_keys keys;
+   int err = -EINVAL;
+
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
+   goto badkey;
 
crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc) 
CRYPTO_TFM_REQ_MASK);
-   err = crypto_ahash_setkey(auth, key, authkeylen);
+   err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
crypto_aead_set_flags(authenc, crypto_ahash_get_flags(auth) 
   CRYPTO_TFM_RES_MASK);
 
@@ -95,7 +107,7 @@ static int crypto_authenc_setkey(struct crypto_aead 
*authenc, const u8 *key,
crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc) 
 CRYPTO_TFM_REQ_MASK);
-   err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+   err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
crypto_aead_set_flags(authenc, crypto_ablkcipher_get_flags(enc) 
   CRYPTO_TFM_RES_MASK);
 
diff --git a/include/crypto/authenc.h b/include/crypto/authenc.h
index e47b044..6775059 100644
--- a/include/crypto/authenc.h
+++ b/include/crypto/authenc.h
@@ -23,5 +23,15 @@ struct crypto_authenc_key_param {
__be32 enckeylen;
 };
 
-#endif /* _CRYPTO_AUTHENC_H */
+struct crypto_authenc_keys {
+   const u8 *authkey;
+   const u8 *enckey;
+
+   unsigned int authkeylen;
+   unsigned int enckeylen;
+};
 
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+  unsigned int keylen);
+
+#endif /* _CRYPTO_AUTHENC_H */
-- 
1.7.2.5



[PATCH 4/5] crypto: picoxcell - Simplify and harden key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the auth key won't overflow the hash_ctx buffer.

Cc: Jamie Iles ja...@jamieiles.com
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/picoxcell_crypto.c |   32 
 1 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/crypto/picoxcell_crypto.c 
b/drivers/crypto/picoxcell_crypto.c
index 888f7f4..a6175ba 100644
--- a/drivers/crypto/picoxcell_crypto.c
+++ b/drivers/crypto/picoxcell_crypto.c
@@ -495,45 +495,29 @@ static int spacc_aead_setkey(struct crypto_aead *tfm, 
const u8 *key,
 {
struct spacc_aead_ctx *ctx = crypto_aead_ctx(tfm);
struct spacc_alg *alg = to_spacc_alg(tfm-base.__crt_alg);
-   struct rtattr *rta = (void *)key;
-   struct crypto_authenc_key_param *param;
-   unsigned int authkeylen, enckeylen;
+   struct crypto_authenc_keys keys;
int err = -EINVAL;
 
-   if (!RTA_OK(rta, keylen))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
 
-   if (rta-rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+	if (keys.enckeylen > AES_MAX_KEY_SIZE)
goto badkey;
 
-   if (RTA_PAYLOAD(rta)  sizeof(*param))
-   goto badkey;
-
-   param = RTA_DATA(rta);
-   enckeylen = be32_to_cpu(param-enckeylen);
-
-   key += RTA_ALIGN(rta-rta_len);
-   keylen -= RTA_ALIGN(rta-rta_len);
-
-   if (keylen  enckeylen)
-   goto badkey;
-
-   authkeylen = keylen - enckeylen;
-
-   if (enckeylen  AES_MAX_KEY_SIZE)
+	if (keys.authkeylen > sizeof(ctx->hash_ctx))
goto badkey;
 
 	if ((alg->ctrl_default & SPACC_CRYPTO_ALG_MASK) ==
SPA_CTRL_CIPH_ALG_AES)
-   err = spacc_aead_aes_setkey(tfm, key + authkeylen, enckeylen);
+   err = spacc_aead_aes_setkey(tfm, keys.enckey, keys.enckeylen);
else
-   err = spacc_aead_des_setkey(tfm, key + authkeylen, enckeylen);
+   err = spacc_aead_des_setkey(tfm, keys.enckey, keys.enckeylen);
 
if (err)
goto badkey;
 
-	memcpy(ctx->hash_ctx, key, authkeylen);
-	ctx->hash_key_len = authkeylen;
+	memcpy(ctx->hash_ctx, keys.authkey, keys.authkeylen);
+	ctx->hash_key_len = keys.authkeylen;
 
return 0;
 
-- 
1.7.2.5



[PATCH 3/5] crypto: ixp4xx - Simplify and harden key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the keys do fit into the corresponding buffers.
Otherwise memory corruption might occur.

Cc: Christian Hohnstaedt chohnsta...@innominate.com
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/ixp4xx_crypto.c |   26 +-
 1 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/ixp4xx_crypto.c b/drivers/crypto/ixp4xx_crypto.c
index 21180d6..153f73c 100644
--- a/drivers/crypto/ixp4xx_crypto.c
+++ b/drivers/crypto/ixp4xx_crypto.c
@@ -1159,32 +1159,24 @@ static int aead_setkey(struct crypto_aead *tfm, const 
u8 *key,
unsigned int keylen)
 {
struct ixp_ctx *ctx = crypto_aead_ctx(tfm);
-   struct rtattr *rta = (struct rtattr *)key;
-   struct crypto_authenc_key_param *param;
+   struct crypto_authenc_keys keys;
 
-   if (!RTA_OK(rta, keylen))
-   goto badkey;
-   if (rta-rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-   goto badkey;
-   if (RTA_PAYLOAD(rta)  sizeof(*param))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
goto badkey;
 
-   param = RTA_DATA(rta);
-   ctx-enckey_len = be32_to_cpu(param-enckeylen);
-
-   key += RTA_ALIGN(rta-rta_len);
-   keylen -= RTA_ALIGN(rta-rta_len);
+	if (keys.authkeylen > sizeof(ctx->authkey))
+		goto badkey;
 
-	if (keylen < ctx->enckey_len)
+	if (keys.enckeylen > sizeof(ctx->enckey))
 		goto badkey;
 
-	ctx->authkey_len = keylen - ctx->enckey_len;
-	memcpy(ctx->enckey, key + ctx->authkey_len, ctx->enckey_len);
-	memcpy(ctx->authkey, key, ctx->authkey_len);
+	memcpy(ctx->authkey, keys.authkey, keys.authkeylen);
+	memcpy(ctx->enckey, keys.enckey, keys.enckeylen);
+	ctx->authkey_len = keys.authkeylen;
+	ctx->enckey_len = keys.enckeylen;
 
return aead_setup(tfm, crypto_aead_authsize(tfm));
 badkey:
-   ctx-enckey_len = 0;
crypto_aead_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
 }
-- 
1.7.2.5



[PATCH 2/5] crypto: authencesn - Simplify key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
 crypto/authencesn.c |   26 --
 1 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index ab53762..8ed5b47 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -59,37 +59,19 @@ static void authenc_esn_request_complete(struct aead_request *req, int err)
 static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *key,
 				     unsigned int keylen)
 {
-	unsigned int authkeylen;
-	unsigned int enckeylen;
 	struct crypto_authenc_esn_ctx *ctx = crypto_aead_ctx(authenc_esn);
 	struct crypto_ahash *auth = ctx->auth;
 	struct crypto_ablkcipher *enc = ctx->enc;
-	struct rtattr *rta = (void *)key;
-	struct crypto_authenc_key_param *param;
+	struct crypto_authenc_keys keys;
 	int err = -EINVAL;
 
-	if (!RTA_OK(rta, keylen))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-		goto badkey;
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
-		goto badkey;
-
-	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
-
-	if (keylen < enckeylen)
-		goto badkey;
-
-	authkeylen = keylen - enckeylen;
 
 	crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
 	crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc_esn) &
 				     CRYPTO_TFM_REQ_MASK);
-	err = crypto_ahash_setkey(auth, key, authkeylen);
+	err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
 	crypto_aead_set_flags(authenc_esn, crypto_ahash_get_flags(auth) &
 					   CRYPTO_TFM_RES_MASK);
 
@@ -99,7 +81,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *
 	crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
 	crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc_esn) &
 					 CRYPTO_TFM_REQ_MASK);
-	err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+	err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
 	crypto_aead_set_flags(authenc_esn, crypto_ablkcipher_get_flags(enc) &
 					   CRYPTO_TFM_RES_MASK);
 
-- 
1.7.2.5



[PATCH 0/5] Authenc key parsing consolidation

2013-10-15 Thread Mathias Krause
This series removes the code duplication of authenc key parsing by
introducing a common helper function crypto_authenc_extractkeys() in
patch 1. Patches 2 to 5 change all remaining places to use the new
helper. Patches 3 and 4 also fix potential memory corruptions by
ensuring the supplied keys won't overflow their respective buffers.
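
For readers not familiar with the authenc key format: the blob passed to
setkey() is an rtattr carrying the encryption key length, followed by the
authentication key and then the encryption key. The sketch below shows what
the shared helper from patch 1 roughly does; it is reconstructed from the
per-driver parsers it replaces, so treat it as an illustration rather than
the exact exported code:

    int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
                                   unsigned int keylen)
    {
            struct rtattr *rta = (struct rtattr *)key;
            struct crypto_authenc_key_param *param;

            /* validate the leading rtattr and its payload */
            if (!RTA_OK(rta, keylen))
                    return -EINVAL;
            if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
                    return -EINVAL;
            if (RTA_PAYLOAD(rta) < sizeof(*param))
                    return -EINVAL;

            param = RTA_DATA(rta);
            keys->enckeylen = be32_to_cpu(param->enckeylen);

            /* skip the attribute; the two keys follow back to back */
            key += RTA_ALIGN(rta->rta_len);
            keylen -= RTA_ALIGN(rta->rta_len);

            if (keylen < keys->enckeylen)
                    return -EINVAL;

            keys->authkeylen = keylen - keys->enckeylen;
            keys->authkey = key;
            keys->enckey = key + keys->authkeylen;

            return 0;
    }

Callers then only have to range-check keys.authkeylen and keys.enckeylen
against their own buffers -- which is exactly the hardening patches 3 and 4
add on top.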

I was unable to test patches 3 to 5 as I don't have the needed hardware
for these devices -- not even a cross compiler for those architectures.

In case patches 3 and 4 are enqueued for stable, patch 1 needs to be as
well, as it's a prerequisite for those.

Please apply!

Mathias Krause (5):
  crypto: authenc - Export key parsing helper function
  crypto: authencesn - Simplify key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: talitos - Simplify key parsing

 crypto/authenc.c  |   48 +++--
 crypto/authencesn.c   |   26 +++-
 drivers/crypto/ixp4xx_crypto.c|   26 +++-
 drivers/crypto/picoxcell_crypto.c |   32 ++--
 drivers/crypto/talitos.c  |   35 ++
 include/crypto/authenc.h  |   12 -
 6 files changed, 70 insertions(+), 109 deletions(-)

-- 
1.7.2.5



[PATCH 5/5] crypto: talitos - Simplify key parsing

2013-10-15 Thread Mathias Krause
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Kim Phillips kim.phill...@freescale.com
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/talitos.c |   35 ---
 1 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 661dc3e..f6f7c68 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -671,39 +671,20 @@ static int aead_setkey(struct crypto_aead *authenc,
 		       const u8 *key, unsigned int keylen)
 {
 	struct talitos_ctx *ctx = crypto_aead_ctx(authenc);
-	struct rtattr *rta = (void *)key;
-	struct crypto_authenc_key_param *param;
-	unsigned int authkeylen;
-	unsigned int enckeylen;
-
-	if (!RTA_OK(rta, keylen))
-		goto badkey;
+	struct crypto_authenc_keys keys;
 
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
 
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
+	if (keys.authkeylen + keys.enckeylen > TALITOS_MAX_KEY_SIZE)
 		goto badkey;
 
-	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
-
-	if (keylen < enckeylen)
-		goto badkey;
+	memcpy(ctx->key, keys.authkey, keys.authkeylen);
+	memcpy(&ctx->key[keys.authkeylen], keys.enckey, keys.enckeylen);
 
-	authkeylen = keylen - enckeylen;
-
-	if (keylen > TALITOS_MAX_KEY_SIZE)
-		goto badkey;
-
-	memcpy(&ctx->key, key, keylen);
-
-	ctx->keylen = keylen;
-	ctx->enckeylen = enckeylen;
-	ctx->authkeylen = authkeylen;
+	ctx->keylen = keys.authkeylen + keys.enckeylen;
+	ctx->enckeylen = keys.enckeylen;
+	ctx->authkeylen = keys.authkeylen;
 
return 0;
 
-- 
1.7.2.5



[PATCH] padata: make the sequence counter an atomic_t

2013-10-02 Thread Mathias Krause
Using a spinlock to atomically increase a counter sounds wrong -- we've
atomic_t for this!

Also move 'seq_nr' to a different cache line than 'lock' to reduce cache
line thrashing. This has the nice side effect of decreasing the size of
struct parallel_data from 192 to 128 bytes for an x86-64 build, e.g.
occupying only two instead of three cache lines.

Those changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.
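
For reference, a condensed before/after of the hot path (sketch only, names
as in kernel/padata.c). The counter is seeded with -1 because
atomic_inc_return() hands back the value *after* the increment, so the very
first object still hashes to CPU index 0, just like the locked variant did:

    /* old: counter read under a spinlock, incremented afterwards */
    spin_lock(&pd->seq_lock);
    cpu_index = pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
    pd->seq_nr++;
    spin_unlock(&pd->seq_lock);

    /* new: lock-free; uses the post-increment value, hence the -1 seed */
    seq_nr = atomic_inc_return(&pd->seq_nr);
    cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);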

Signed-off-by: Mathias Krause mathias.kra...@secunet.com
---
 include/linux/padata.h |3 +--
 kernel/padata.c|9 -
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
struct padata_serial_queue  __percpu *squeue;
 	atomic_t		reorder_objects;
 	atomic_t		refcnt;
+	atomic_t		seq_nr;
 	struct padata_cpumask	cpumask;
 	spinlock_t		lock ____cacheline_aligned;
-	spinlock_t		seq_lock;
-	unsigned int		seq_nr;
 	unsigned int		processed;
struct timer_list   timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+   unsigned int seq_nr;
int cpu_index;
 
/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 * seq_nr mod. number of cpus in use.
 */
 
-	spin_lock(&pd->seq_lock);
-	cpu_index =  pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-	pd->seq_nr++;
-	spin_unlock(&pd->seq_lock);
+	seq_nr = atomic_inc_return(&pd->seq_nr);
+	cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
padata_init_pqueues(pd);
padata_init_squeues(pd);
 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-	pd->seq_nr = 0;
+	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
-- 
1.7.2.5



Re: [PATCH] crypto: algif - suppress sending source address information in recvmsg

2013-04-21 Thread Mathias Krause
On Wed, Apr 10, 2013 at 8:26 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Wed, Apr 10, 2013 at 08:21:51AM +0200, Mathias Krause wrote:
 On Wed, Apr 10, 2013 at 5:31 AM, Herbert Xu herb...@gondor.apana.org.au 
 wrote:
  On Sun, Apr 07, 2013 at 02:05:39PM +0200, Mathias Krause wrote:
  The current code does not set the msg_namelen member to 0 and therefore
  makes net/socket.c leak the local sockaddr_storage variable to userland
  -- 128 bytes of kernel stack memory. Fix that.
 
  Signed-off-by: Mathias Krause mini...@googlemail.com
 
  Patch applied.  Thanks!

 Thanks, but that patch shouldn't have been applied to cryptodev but
 crypto instead, and probably queued up for stable as well.
 I missed the 'Cc: stable # v2.6.38'. My bad.

 OK, I'll move it across and add the stable Cc.

Any specific reason you're not pushing it to Linus for inclusion in v3.9?

Mathias


[PATCH] crypto: user - constify netlink dispatch table

2013-02-24 Thread Mathias Krause
There is no need to modify the netlink dispatch table at runtime and
making it const even makes the resulting object file slightly smaller.

Cc: Steffen Klassert steffen.klass...@secunet.com
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/crypto_user.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 35d700a..77f6c4e 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -426,7 +426,7 @@ static const struct nla_policy crypto_policy[CRYPTOCFGA_MAX+1] = {
 
 #undef MSGSIZE
 
-static struct crypto_link {
+static const struct crypto_link {
int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
int (*dump)(struct sk_buff *, struct netlink_callback *);
int (*done)(struct netlink_callback *);
@@ -442,7 +442,7 @@ static struct crypto_link {
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
struct nlattr *attrs[CRYPTOCFGA_MAX+1];
-   struct crypto_link *link;
+   const struct crypto_link *link;
int type, err;
 
 	type = nlh->nlmsg_type;
-- 
1.7.10.4



Re: [PATCH 2/2] crypto: user - fix empty string test in report API

2013-02-05 Thread Mathias Krause
On Mon, Feb 4, 2013 at 2:15 PM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Sun, Feb 03, 2013 at 12:09:01PM +0100, Mathias Krause wrote:
 The current test for empty strings fails because it is testing the
 address of a field, not a pointer. So the test will always be true.
 Test for the string length instead.

 Signed-off-by: Mathias Krause mini...@googlemail.com
 Cc: Steffen Klassert steffen.klass...@secunet.com

 Good catch.  However, what if cru_driver_name isn't NUL-terminated?

Your objection is totally valid, sure. And my initial idea wouldn't
have that problem as it would just test for the first character to be
'\0', i.e. do something like that:

-   if (!p->cru_driver_name)
+   if (!p->cru_driver_name[0])

But then I looked how the other code in the crypto user API does refer
to string lengths related to cru_driver_name and switched to strlen().
So the other code is (potentially) vulnerable to non-NUL-terminated
strings, too.

So, I think we need another patch that adds sanity checks for
non-NUL-terminated strings. I can do this, maybe this evening, and
send out a new version of the patch series if you like me to.
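
For what it's worth, such a sanity check can be as small as the sketch below
(a hypothetical helper, not the patch that was eventually posted): strnlen()
never reads past the buffer and only returns a value smaller than the buffer
size if a terminating NUL is actually present.

    static bool crypto_name_terminated(const char *name, size_t size)
    {
            return strnlen(name, size) < size;  /* a NUL exists in the buffer */
    }

    /* e.g. in crypto_report(), before any strlen()/strcmp() on the names:
     *
     *      if (!crypto_name_terminated(p->cru_driver_name,
     *                                  sizeof(p->cru_driver_name)))
     *              return -EINVAL;
     */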

Regards,
Mathias


[PATCH 2/2] crypto: user - fix empty string test in report API

2013-02-03 Thread Mathias Krause
The current test for empty strings fails because it is testing the
address of a field, not a pointer. So the test will always be true.
Test for the string length instead.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Steffen Klassert steffen.klass...@secunet.com
---
 crypto/crypto_user.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index f6d9baf..1b9a9a1 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -196,7 +196,7 @@ static int crypto_report(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
struct crypto_dump_info info;
int err;
 
-	if (!p->cru_driver_name)
+	if (!strlen(p->cru_driver_name))
return -EINVAL;
 
alg = crypto_alg_match(p, 1);
-- 
1.7.10.4



[PATCH 1/2] crypto: user - fix info leaks in report API

2013-02-03 Thread Mathias Krause
Three errors resulting in kernel memory disclosure:

1/ The structures used for the netlink based crypto algorithm report API
are located on the stack. As snprintf() does not fill the remainder of
the buffer with null bytes, those stack bytes will be disclosed to users
of the API. Switch to strncpy() to fix this.

2/ crypto_report_one() does not initialize all field of struct
crypto_user_alg. Fix this to fix the heap info leak.

3/ For the module name we should copy only as many bytes as
module_name() returns -- not as much as the destination buffer could
hold. But the current code does not and therefore copies random data
from behind the end of the module name, as the module name is always
shorter than CRYPTO_MAX_ALG_NAME.

Also switch to use strncpy() to copy the algorithm's name and
driver_name. They are strings, after all.
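
A userspace toy program illustrating point 1 (plain C, nothing
kernel-specific, names made up for the example): snprintf() writes the string
plus a single NUL and leaves the tail of the buffer untouched, while
strncpy() pads the remainder with NUL bytes -- so only the latter is safe
when the whole structure is later copied out to userland.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            char a[16], b[16];

            memset(a, 'X', sizeof(a));  /* stand-in for stale stack data */
            memset(b, 'X', sizeof(b));

            snprintf(a, sizeof(a), "%s", "aead"); /* a[5..15] still 'X'  */
            strncpy(b, "aead", sizeof(b));        /* b[4..15] now '\0'   */

            printf("snprintf tail: %.11s\n", a + 5); /* prints XXXXXXXXXXX */
            printf("strncpy  tail: %.11s\n", b + 5); /* prints nothing     */
            return 0;
    }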

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Steffen Klassert steffen.klass...@secunet.com
---
 crypto/ablkcipher.c  |   12 ++--
 crypto/aead.c|9 -
 crypto/ahash.c   |2 +-
 crypto/blkcipher.c   |6 +++---
 crypto/crypto_user.c |   22 +++---
 crypto/pcompress.c   |3 +--
 crypto/rng.c |2 +-
 crypto/shash.c   |3 ++-
 8 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
index 533de95..7d4a8d2 100644
--- a/crypto/ablkcipher.c
+++ b/crypto/ablkcipher.c
@@ -388,9 +388,9 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "ablkcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_ablkcipher.geniv ?: "<default>");
+	strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
+	strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<default>",
+		sizeof(rblkcipher.geniv));
 
 	rblkcipher.blocksize = alg->cra_blocksize;
 	rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
@@ -469,9 +469,9 @@ static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "givcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_ablkcipher.geniv ?: "<built-in>");
+	strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
+	strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<built-in>",
+		sizeof(rblkcipher.geniv));
 
 	rblkcipher.blocksize = alg->cra_blocksize;
 	rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
diff --git a/crypto/aead.c b/crypto/aead.c
index 0b8121e..27bc487 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -117,9 +117,8 @@ static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)
 	struct crypto_report_aead raead;
 	struct aead_alg *aead = alg->cra_aead;
 
-	snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "aead");
-	snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 aead->geniv ?: "<built-in>");
+	strncpy(raead.type, "aead", sizeof(raead.type));
+	strncpy(raead.geniv, aead->geniv ?: "<built-in>", sizeof(raead.geniv));
 
 	raead.blocksize = alg->cra_blocksize;
 	raead.maxauthsize = aead->maxauthsize;
@@ -203,8 +202,8 @@ static int crypto_nivaead_report(struct sk_buff *skb, struct crypto_alg *alg)
 	struct crypto_report_aead raead;
 	struct aead_alg *aead = alg->cra_aead;
 
-	snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "nivaead");
-	snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s", aead->geniv);
+	strncpy(raead.type, "nivaead", sizeof(raead.type));
+	strncpy(raead.geniv, aead->geniv, sizeof(raead.geniv));
 
 	raead.blocksize = alg->cra_blocksize;
 	raead.maxauthsize = aead->maxauthsize;
diff --git a/crypto/ahash.c b/crypto/ahash.c
index 3887856..793a27f 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -404,7 +404,7 @@ static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_report_hash rhash;
 
-	snprintf(rhash.type, CRYPTO_MAX_ALG_NAME, "%s", "ahash");
+	strncpy(rhash.type, "ahash", sizeof(rhash.type));
 
 	rhash.blocksize = alg->cra_blocksize;
 	rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index a8d85a1..c44e014 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -499,9 +499,9 @@ static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "blkcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_blkcipher.geniv ?: "<default>");
+	strncpy(rblkcipher.type, "blkcipher", sizeof(rblkcipher.type));
+   strncpy

Re: Linux 3.6-rc5

2012-09-09 Thread Mathias Krause
On Sun, Sep 09, 2012 at 12:19:58PM -0700, Herbert Xu wrote:
 On Sun, Sep 09, 2012 at 11:13:02AM +0200, Romain Francoise wrote:
  Still seeing this BUG with -rc5, that I originally reported here:
  http://marc.info/?l=linux-crypto-vgerm=134653220530264w=2
  
  [   26.362567] [ cut here ]
  [   26.362583] kernel BUG at crypto/scatterwalk.c:37!
  [   26.362606] invalid opcode:  [#1] SMP 
 
 Can you try blacklisting/not loading sha1_ssse3 and aesni_intel
 to see which one of them is causing this crash? Of course if you
 can still reproduce this without loading either of them that would
 also be interesting to know.

It happens with the C variants of SHA1 and AES, too. You can easily
trigger the bug with Steffen's crconf[1]:

$ crconf add alg "authenc(hmac(sha1-generic),cbc(aes-generic))" type 3

So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.


Regards,
Mathias

[1] http://sourceforge.net/projects/crconf/


Re: Linux 3.6-rc5

2012-09-09 Thread Mathias Krause
On Sun, Sep 09, 2012 at 02:00:00PM -0700, Herbert Xu wrote:
 On Sun, Sep 09, 2012 at 10:09:10PM +0200, Mathias Krause wrote:
 
  It happens with the C variants of SHA1 and AES, too. You can easily
  trigger the bug with Steffen's crconf[1]:
  
  $ crconf add alg authenc(hmac(sha1-generic),cbc(aes-generic)) type 3
  
  So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.
 
 Thanks! I think this patch should fix the problem.  Can someone
 please confirm this?
 
 crypto: authenc - Fix crash with zero-length assoc data
 
 The authenc code doesn't deal with zero-length associated data
 correctly and ends up constructing a zero-length sg entry which
 causes a crash when it's fed into the crypto system.
 
 This patch fixes this by avoiding the code-path that triggers
 the SG construction if we have no associated data.
 
 This isn't the most optimal fix as it means that we'll end up
 using the fallback code-path even when we could still execute
 the digest function.  However, this isn't a big deal as nobody
 but the test path would supply zero-length associated data.
 
 Reported-by: Romain Francoise rom...@orebokech.com
 Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

Looks good to me.

 
 diff --git a/crypto/authenc.c b/crypto/authenc.c
 index 5ef7ba6..d0583a4 100644
 --- a/crypto/authenc.c
 +++ b/crypto/authenc.c
  @@ -336,7 +336,7 @@ static int crypto_authenc_genicv(struct aead_request *req, u8 *iv,
   cryptlen += ivsize;
   }
  
  - if (sg_is_last(assoc)) {
  + if (req->assoclen && sg_is_last(assoc)) {
   authenc_ahash_fn = crypto_authenc_ahash;
   sg_init_table(asg, 2);
    sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);
  @@ -490,7 +490,7 @@ static int crypto_authenc_iverify(struct aead_request *req, u8 *iv,
   cryptlen += ivsize;
   }
  
  - if (sg_is_last(assoc)) {
  + if (req->assoclen && sg_is_last(assoc)) {
   authenc_ahash_fn = crypto_authenc_ahash;
   sg_init_table(asg, 2);
    sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);
 

Tested-by: Mathias Krause mini...@googlemail.com



Re: [PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32

2012-05-31 Thread Mathias Krause
On Thu, May 31, 2012 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Wed, May 30, 2012 at 01:43:08AM +0200, Mathias Krause wrote:
 The 32 bit variant of cbc(aes) decrypt is using instructions requiring
 128 bit aligned memory locations but fails to ensure this constraint in
 the code. Fix this by loading the data into intermediate registers with
 load unaligned instructions.

 This fixes reported general protection faults related to aesni.

 References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
 Reported-by: Daniel gark...@mailueberfall.de
 Cc: sta...@kernel.org [v2.6.39+]
 Signed-off-by: Mathias Krause mini...@googlemail.com

 Have measured this against increasing alignmask to 15?

No, but the latter will likely be much slower as it would need to
memmove the data if it's not aligned, right?

My patch essentially just breaks the combined XOR a memory operand
with a register operation into two -- load memory into register, then
XOR with registers. It shouldn't be much slower compared to the
current version. But it fixes a bug the current version exposes when
working on unaligned data.

That said, I did some micro benchmark on "pxor (%edx), %xmm0" vs.
"movups (%edx), %xmm1; pxor %xmm1, %xmm0" and observed the latter
might be even slightly faster! But changing the code to perform better
is out of scope for this patch as it should just fix the bug in the
code. We can increase performance in a follow up patch.

Mathias


[PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32

2012-05-29 Thread Mathias Krause
The 32 bit variant of cbc(aes) decrypt is using instructions requiring
128 bit aligned memory locations but fails to ensure this constraint in
the code. Fix this by loading the data into intermediate registers with
load unaligned instructions.

This fixes reported general protection faults related to aesni.

References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
Reported-by: Daniel gark...@mailueberfall.de
Cc: sta...@kernel.org [v2.6.39+]
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index be6d9e3..3470624 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2460,10 +2460,12 @@ ENTRY(aesni_cbc_dec)
pxor IN3, STATE4
movaps IN4, IV
 #else
-   pxor (INP), STATE2
-   pxor 0x10(INP), STATE3
pxor IN1, STATE4
movaps IN2, IV
+   movups (INP), IN1
+   pxor IN1, STATE2
+   movups 0x10(INP), IN2
+   pxor IN2, STATE3
 #endif
movups STATE1, (OUTP)
movups STATE2, 0x10(OUTP)
-- 
1.7.10



[PATCH] crypto: sha1 - use Kbuild supplied flags for AVX test

2012-05-24 Thread Mathias Krause
Commit ea4d26ae ("raid5: add AVX optimized RAID5 checksumming")
introduced x86/ arch wide defines for AFLAGS and CFLAGS indicating AVX
support in binutils based on the same test we have in x86/crypto/ right
now. To minimize duplication drop our implementation in favour to the
one in x86/.

Signed-off-by: Mathias Krause mini...@googlemail.com
---

This should be applied to cryptodev-2.6.git after it contains the above
mentioned commit, e.g. after cryptodev-2.6.git rebased to/merged v3.5-rc1.

 arch/x86/crypto/Makefile  |7 ---
 arch/x86/crypto/sha1_ssse3_asm.S  |2 +-
 arch/x86/crypto/sha1_ssse3_glue.c |6 +++---
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index e191ac0..479f95a 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -34,12 +34,5 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
 
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
-
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
-
-# enable AVX support only when $(AS) can actually assemble the instructions
-ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
-AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
-CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
-endif
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
index b2c2f57..49d6987 100644
--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -468,7 +468,7 @@ W_PRECALC_SSSE3
  */
 SHA1_VECTOR_ASM sha1_transform_ssse3
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 
 .macro W_PRECALC_AVX
 
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index f916499..4a11a9d 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -35,7 +35,7 @@
 
 asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
 unsigned int rounds);
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 asmlinkage void sha1_transform_avx(u32 *digest, const char *data,
   unsigned int rounds);
 #endif
@@ -184,7 +184,7 @@ static struct shash_alg alg = {
}
 };
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
 static bool __init avx_usable(void)
 {
u64 xcr0;
@@ -209,7 +209,7 @@ static int __init sha1_ssse3_mod_init(void)
if (cpu_has_ssse3)
sha1_transform_asm = sha1_transform_ssse3;
 
-#ifdef SHA1_ENABLE_AVX_SUPPORT
+#ifdef CONFIG_AS_AVX
/* allow AVX to override SSSE3, it's a little faster */
if (avx_usable())
sha1_transform_asm = sha1_transform_avx;
-- 
1.5.6.5



Re: [PATCH v3 0/2] crypto, x86: assembler implementation of SHA1

2011-08-17 Thread Mathias Krause
On Fri, Aug 5, 2011 at 9:49 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Thu, Aug 04, 2011 at 08:19:23PM +0200, Mathias Krause wrote:
 This patch series adds an assembler implementation for the SHA1 hash 
 algorithm
 for the x86-64 architecture. Its raw hash performance can be more than 2 
 times
 faster than the generic C implementation. This gives a real world benefit for
 IPsec with a throughput increase of up to +35%. For concrete numbers have a
 look at the second patch.

 I'll apply this as soon as rc1 is out.  Thanks!

 PS if you have time please look into the asynchronous version.

I've converted the implementation to ahash. It was pretty easy in the
end as I had a reference to look at -- the ghash implementation. But
I'm pretty unhappy how many layers of indirection are in there now
even for the irq_fpu_usable() case -- not to mention that the
!irq_fpu_usable() case is much more heavyweight but, at least, it can now
always make use of the SSE3 variant. (See the attached patch, but
please don't apply it. It's compile tested only! Also I'm not
comfortable with the ahash version right now.)

I haven't made any benchmarks yet because I've no access to the test
machine right now. But I suspect it'll be much slower for small chunks
and *maybe* faster for the IPsec case I'm interested in because it can
now always use the SSE3 variant.

What I'm thinking about is having both, a shash and an ahash variant
so the user can actually chose which interface fits its needs best --
e.g., dm-crypt might want to use the shash interface and xfrm the
ahash one. Do you think something like that is possible? And if so,
how would dm-crypt automatically chose the shash and xfrm the ahash
variant?


Regards,
Mathias


0001-crypto-sha1-convert-to-ahash-interface.patch
Description: Binary data


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
Hi Max,

2011/8/8 Locktyukhin, Maxim maxim.locktyuk...@intel.com:
 I'd like to note that at Intel we very much appreciate Mathias effort to 
 port/integrate this implementation into Linux kernel!


 $0.02 re tcrypt perf numbers below: I believe something must be terribly 
 broken with the tcrypt measurements

 20 (and more) cycles per byte shown below are not reasonable numbers for 
 SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is the 
 expected results ... so, while relative improvement seen is sort of 
 consistent, the absolute performance numbers are very much off (and yes Sandy 
 Bridge on AVX code is expected to be faster than Core2/SSSE3 - ~5.2 c/b vs. 
 ~5.8 c/b on the level of the sha1_update() call to me more precise)

 this does not affect the proposed patch in any way, it looks like tcrypt's 
 timing problem to me - I'd even venture a guess that it may be due to the use 
 of RDTSC (that gets affected significantly by Turbo/EIST, TSC is isotropic in 
 time but not with the core clock domain, i.e. RDTSC cannot be used to measure 
 core cycles without at least disabling EIST and Turbo, or doing runtime 
 adjustment of actual bus/core clock ratio vs. the standard ratio always used 
 by TSC - I could elaborate more if someone is interested)

I found the Sandy Bridge numbers odd too but suspected, it might be
because of the laptop platform. The SSSE3 numbers on this platform
were slightly lower than the AVX numbers and that for still way off
the ones for the Core2 system. But your explanation fits well, too. It
might be EIST or Turbo mode that tampered with the numbers. Another,
maybe more likely point might be the overhead Andy mentioned.

 thanks again,
 -Max


Mathias


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
On Thu, Aug 11, 2011 at 4:50 PM, Andy Lutomirski l...@mit.edu wrote:
 I have vague plans to clean up extended state handling and make
 kernel_fpu_begin work efficiently from any context.  (i.e. the first
 kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy
 Bridge, but further calls to kernel_fpu_begin would be a single branch.)

 The current code that handles context switches when user code is using
 extended state is terrible and will almost certainly become faster in the
 near future.

Sounds good! This would not only improve the performance of sha1_ssse3
but of aesni as well.

 Hopefully I'll have patches for 3.2 or 3.3.

 IOW, please don't introduce another thing like the fpu crypto module quite
 yet unless there's a good reason.  I'm looking forward to deleting the fpu
 module entirely.

I've no intention to. So please go ahead and do so.


Mathias


Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote:

 With this algorithm I was able to increase the throughput of a single
 IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
 the SSSE3 variant -- a speedup of +34.8%.

 Were you testing this on the transmit side or the receive side?

I was running an iperf test on two directly connected systems. Both sides
showed me those numbers (iperf server and client).

 As the IPsec receive code path usually runs in a softirq context,
 does this code have any effect there at all?

It does. Just have a look at how fpu_available() is implemented:

,-[ arch/x86/include/asm/i387.h ]
| static inline bool irq_fpu_usable(void)
| {
| struct pt_regs *regs;
|
| return !in_interrupt() || !(regs = get_irq_regs()) || \
| user_mode(regs) || (read_cr0()  X86_CR0_TS);
| }
`

So, it'll fail in softirq context when the softirq interrupted a kernel thread
or TS in CR0 is set. When it interrupted a userland thread that hasn't the TS
flag set in CR0, i.e. the CPU won't generate an exception when we use the FPU,
it'll work in softirq context, too.

With a busy userland making extensive use of the FPU it'll almost always have
to fall back to the generic implementation, right. However, using this module
on an IPsec gateway with no real userland at all, you get a nice performance
gain.
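
In other words, the glue code follows the usual fallback pattern sketched
below (condensed; the real sha1_ssse3_glue.c additionally buffers partial
blocks and only hands complete 64-byte blocks to the assembler routine):

    static int sha1_ssse3_update(struct shash_desc *desc, const u8 *data,
                                 unsigned int len)
    {
            if (!irq_fpu_usable())
                    /* e.g. a softirq hit a kernel thread: use the C code */
                    return crypto_sha1_update(desc, data, len);

            kernel_fpu_begin();
            /* ... run sha1_transform_ssse3()/sha1_transform_avx() here ... */
            kernel_fpu_end();

            return 0;
    }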

 This is pretty similar to the situation with the Intel AES code.
 Over there they solved it by using the asynchronous interface and
 deferring the processing to a work queue.

 This also avoids the situation where you have an FPU/SSE using
 process that also tries to transmit over IPsec thrashing the
 FPU state.

Interesting. I'll look into this.

 Now I'm still happy to take this because hashing is very different
 from ciphers in that some users tend to hash small amounts of data
 all the time.  Those users will typically use the shash interface
 that you provide here.

 So I'm interested to know how much of an improvement this is for
 those users ( 64 bytes).

Anything below 64 bytes will (and has to) be padded to a full block, i.e. 64
bytes.

 If you run the tcrypt speed tests that should provide some useful info.

I've summarized the mean values of five consecutive tcrypt runs from two
different systems. The first system is an Intel Core i7 2620M based notebook
running at 2.70 GHz. It's a Sandy Bridge processor so could make use of the
AVX variant. The second system was an Intel Core 2 Quad Xeon system running at
2.40 GHz -- no AVX, but SSSE3.

Since the output of tcrypt is a little awkward to read, I've condensed it
slightly to make it (hopefully) more readable. Please interpret the table as
follow: The triple in the first column is (byte blocks | bytes per update |
updates), c/B is cycles per byte.

Here are the numbers for the first system:

                          sha1-generic              sha1-ssse3 (AVX)
 (  16 |   16 |   1):    9.65 MiB/s, 266.2 c/B     12.93 MiB/s, 200.0 c/B
 (  64 |   16 |   4):   19.05 MiB/s, 140.2 c/B     25.27 MiB/s, 105.6 c/B
 (  64 |   64 |   1):   21.35 MiB/s, 119.2 c/B     29.29 MiB/s,  87.0 c/B
 ( 256 |   16 |  16):   28.81 MiB/s,  88.8 c/B     37.70 MiB/s,  68.4 c/B
 ( 256 |   64 |   4):   34.58 MiB/s,  74.0 c/B     47.16 MiB/s,  54.8 c/B
 ( 256 |  256 |   1):   37.44 MiB/s,  68.0 c/B     69.01 MiB/s,  36.8 c/B
 (1024 |   16 |  64):   33.55 MiB/s,  76.2 c/B     43.77 MiB/s,  59.0 c/B
 (1024 |  256 |   4):   45.12 MiB/s,  58.0 c/B     88.90 MiB/s,  28.8 c/B
 (1024 | 1024 |   1):   46.69 MiB/s,  54.0 c/B    104.39 MiB/s,  25.6 c/B
 (2048 |   16 | 128):   34.66 MiB/s,  74.0 c/B     44.93 MiB/s,  57.2 c/B
 (2048 |  256 |   8):   46.81 MiB/s,  54.0 c/B     93.83 MiB/s,  27.0 c/B
 (2048 | 1024 |   2):   48.28 MiB/s,  52.4 c/B    110.98 MiB/s,  23.0 c/B
 (2048 | 2048 |   1):   48.69 MiB/s,  52.0 c/B    114.26 MiB/s,  22.0 c/B
 (4096 |   16 | 256):   35.15 MiB/s,  72.6 c/B     45.53 MiB/s,  56.0 c/B
 (4096 |  256 |  16):   47.69 MiB/s,  53.0 c/B     96.46 MiB/s,  26.0 c/B
 (4096 | 1024 |   4):   49.24 MiB/s,  51.0 c/B    114.36 MiB/s,  22.0 c/B
 (4096 | 4096 |   1):   49.77 MiB/s,  51.0 c/B    119.80 MiB/s,  21.0 c/B
 (8192 |   16 | 512):   35.46 MiB/s,  72.2 c/B     45.84 MiB/s,  55.8 c/B
 (8192 |  256 |  32):   48.15 MiB/s,  53.0 c/B     97.83 MiB/s,  26.0 c/B
 (8192 | 1024 |   8):   49.73 MiB/s,  51.0 c/B    116.35 MiB/s,  22.0 c/B
 (8192 | 4096 |   2):   50.10 MiB/s,  50.8 c/B    121.66 MiB/s,  21.0 c/B
 (8192 | 8192 |   1):   50.25 MiB/s,  50.8 c/B    121.87 MiB/s,  21.0 c/B

For the second system I got the following numbers:

                          sha1-generic              sha1-ssse3 (SSSE3)
 (  16 |   16 |   1):   27.23 MiB/s, 106.6 c/B     32.86 MiB/s,  73.8 c/B
 (  64 |   16 |   4):   51.67 MiB/s,  54.0 c/B     61.90 MiB/s,  37.8 c/B
 (  64 |   64 |   1):   62.44 MiB/s,  44.2 c/B     74.16 MiB/s,  31.6 c/B

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 7:05 PM, Mathias Krause mini...@googlemail.com wrote:
 It does. Just have a look at how fpu_available() is implemented:

read: irq_fpu_usable()


[PATCH v3 0/2] crypto, x86: assembler implementation of SHA1

2011-08-04 Thread Mathias Krause
This patch series adds an assembler implementation for the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real world benefit for
IPsec with an throughput increase of up to +35%. For concrete numbers have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with
some effort in a follow up patch. (I had no time to do this yet.)

Note: The SSSE3 is no typo, it's Supplemental SSE3.

v3 changes:
- removed #ifdef in first patch
v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |3 +
 7 files changed, 827 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



[PATCH v3 1/2] crypto, sha1: export sha1_update for reuse

2011-08-04 Thread Mathias Krause
Export the update function as crypto_sha1_update() to not have the need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
 	index = sctx->count & 0x3f;
 	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	sha1_update(desc, padding, padlen);
+	crypto_sha1_update(desc, padding, padlen);
 
 	/* Append length */
-	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
 	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..83e6be5 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,7 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+
 #endif
-- 
1.5.6.5



[PATCH v3 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with a EXFO Packet Blazer
FTB-8510:

   frame size   sha1-generic    sha1-ssse3      delta
      64 byte    37.5 MBit/s    37.5 MBit/s      0.0%
     128 byte    56.3 MBit/s    62.5 MBit/s    +11.0%
     256 byte    87.5 MBit/s   100.0 MBit/s    +14.3%
     512 byte   131.3 MBit/s   150.0 MBit/s    +14.2%
    1024 byte   162.5 MBit/s   193.8 MBit/s    +19.3%
    1280 byte   175.0 MBit/s   212.5 MBit/s    +21.4%
    1420 byte   175.0 MBit/s   218.7 MBit/s    +25.0%
    1518 byte   150.0 MBit/s   181.2 MBit/s    +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com
+ *Ronen Zohar ronen.zo...@intel.com
+ *
+ * Converted to ATT syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause mini...@googlemail.com
+ *
+ * This program is free

Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-31 Thread Mathias Krause
On Sat, Jul 30, 2011 at 3:46 PM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Thu, Jul 28, 2011 at 05:29:35PM +0200, Mathias Krause wrote:
 On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu herb...@gondor.apana.org.au 
 wrote:
  On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote:
 
  diff --git a/include/crypto/sha.h b/include/crypto/sha.h
  index 069e85b..7c46d0c 100644
  --- a/include/crypto/sha.h
  +++ b/include/crypto/sha.h
  @@ -82,4 +82,9 @@ struct sha512_state {
        u8 buf[SHA512_BLOCK_SIZE];
   };
 
  +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
  +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
  +                           unsigned int len);
  +#endif
 
  Please remove the unnecessary #if.

 The function will only be available when crypto/sha1_generic.o will
 either be build into the kernel or build as a module. Without the #if
 a potential wrong user of this function might not be detected as early
 as at compilation time but as late as at link time, or even worse, as
 late as on module load time -- which is pretty bad. IMHO it's better
 to detect errors early, as in reading error: implicit declaration of
 function ‘crypto_sha1_update’ when trying to compile the code in
 question instead of Unknown symbol crypto_sha1_update in dmesg when
 trying to load the module. That for I would like to keep the #if.

 In order to be consistent please remove the ifdef.  In most
 similar cases in the crypto subsystem we don't do this.  As
 adding such ifdefs all over the place would gain very little,
 I'd much rather you left it out.

Noting that this function wasn't exported before and the only user
(sha-ssse3) ensures its availability by other means, it should be okay
to remove the #if. I'll update the patch accordingly.

Any objections to the second patch?

Thanks,
Mathias


Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-28 Thread Mathias Krause
On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu herb...@gondor.apana.org.au wrote:
 On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote:

 diff --git a/include/crypto/sha.h b/include/crypto/sha.h
 index 069e85b..7c46d0c 100644
 --- a/include/crypto/sha.h
 +++ b/include/crypto/sha.h
 @@ -82,4 +82,9 @@ struct sha512_state {
       u8 buf[SHA512_BLOCK_SIZE];
  };

 +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
 +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 +                           unsigned int len);
 +#endif

 Please remove the unnecessary #if.

The function will only be available when crypto/sha1_generic.o will
either be build into the kernel or build as a module. Without the #if
a potential wrong user of this function might not be detected as early
as at compilation time but as late as at link time, or even worse, as
late as on module load time -- which is pretty bad. IMHO it's better
to detect errors early, as in reading "error: implicit declaration of
function 'crypto_sha1_update'" when trying to compile the code in
question instead of "Unknown symbol crypto_sha1_update" in dmesg when
trying to load the module. Therefore I would like to keep the #if.

Thanks for the review!

Mathias


Re: [PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
On Sat, Jul 16, 2011 at 2:44 PM, Mathias Krause mini...@googlemail.com wrote:
 diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
 index c04f1b7..a80be92 100644
 --- a/arch/x86/crypto/Makefile
 +++ b/arch/x86/crypto/Makefile
 @@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
  obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o

  obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
 +obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o

  aes-i586-y := aes-i586-asm_32.o aes_glue.o
  twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
 @@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
  aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o

  ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 +
 +# enable AVX support only when $(AS) can actually assemble the instructions
 +ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
 +AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT

This should have been

AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT

instead. Sorry, a missing adjustment for a last minute file rename.
I'll post a new version of the series with a wider target audience
since there have been no reply so far for a week.

 +CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
 +endif
 +sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
 diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
 new file mode 100644
 index 000..8fb0ba6


Thanks,
Mathias


[PATCH v2 0/2] crypto, x86: assembler implementation of SHA1

2011-07-24 Thread Mathias Krause
This patch series adds an assembler implementation for the SHA1 hash algorithm
for the x86-64 architecture. Its raw hash performance can be more than 2 times
faster than the generic C implementation. This gives a real world benefit for
IPsec with an throughput increase of up to +35%. For concrete numbers have a
look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with
some effort in a follow up patch. (I had no time to do this yet.)

Note: The SSSE3 is no typo, it's Supplemental SSE3.

v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards,
Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 crypto/sha1_generic.c |9 +-
 include/crypto/sha.h  |5 +
 7 files changed, 829 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c



[PATCH v2 1/2] crypto, sha1: export sha1_update for reuse

2011-07-24 Thread Mathias Krause
Export the update function as crypto_sha1_update() to not have the need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
 	index = sctx->count & 0x3f;
 	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	sha1_update(desc, padding, padlen);
+	crypto_sha1_update(desc, padding, padlen);
 
 	/* Append length */
-	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
 	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+#endif
+
 #endif
-- 
1.5.6.5



[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

   frame size   sha1-generic    sha1-ssse3     delta
      64 byte    37.5 MBit/s    37.5 MBit/s     0.0%
     128 byte    56.3 MBit/s    62.5 MBit/s   +11.0%
     256 byte    87.5 MBit/s   100.0 MBit/s   +14.3%
     512 byte   131.3 MBit/s   150.0 MBit/s   +14.2%
    1024 byte   162.5 MBit/s   193.8 MBit/s   +19.3%
    1280 byte   175.0 MBit/s   212.5 MBit/s   +21.4%
    1420 byte   175.0 MBit/s   218.7 MBit/s   +25.0%
    1518 byte   150.0 MBit/s   181.2 MBit/s   +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com
+ *Ronen Zohar ronen.zo...@intel.com
+ *
+ * Converted to ATT syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause mini...@googlemail.com
+ *
+ * This program is free
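
(The archived copy of this mail breaks off inside the assembler file's license
header; the glue code and Kconfig hunks are not shown.) As a rough sketch of
what the description implies for module load time, a CPU-feature check along
the following lines would pick the fastest usable transform. The identifiers
below are assumptions for illustration only, not taken from the (truncated)
patch; 'alg' stands for the shash_alg the glue code registers.

#include <linux/module.h>
#include <crypto/internal/hash.h>
#include <asm/cpufeature.h>

static int __init sha1_ssse3_mod_init(void)
{
	const char *impl = NULL;

#ifdef SHA1_ENABLE_AVX_SUPPORT		/* set by the Makefile as-instr test */
	if (boot_cpu_has(X86_FEATURE_AVX))
		impl = "AVX";
#endif
	if (!impl && boot_cpu_has(X86_FEATURE_SSSE3))
		impl = "SSSE3";

	if (!impl)
		return -ENODEV;		/* neither SSSE3 nor AVX available */

	pr_info("Using %s optimized SHA-1 implementation\n", impl);
	return crypto_register_shash(&alg);
}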

[PATCH 1/2] crypto, sha1: export sha1_update for reuse

2011-07-16 Thread Mathias Krause
Export the update function as crypto_sha1_update() so there is no need
to reimplement the same algorithm for each SHA-1 implementation. This
way the generic SHA-1 implementation can be used as a fallback for other
implementations that fail to run under certain circumstances, like the
need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |9 +
 include/crypto/sha.h  |5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
 {
struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 
*data,
 
return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
/* Pad out to 56 mod 64 */
	index = sctx->count & 0x3f;
	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-   sha1_update(desc, padding, padlen);
+   crypto_sha1_update(desc, padding, padlen);
 
/* Append length */
-	sha1_update(desc, (const u8 *)&bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)&bits, sizeof(bits));
 
/* Store state in digest */
	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void 
*in)
 static struct shash_alg alg = {
.digestsize =   SHA1_DIGEST_SIZE,
.init   =   sha1_init,
-   .update =   sha1_update,
+   .update =   crypto_sha1_update,
.final  =   sha1_final,
.export =   sha1_export,
.import =   sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len);
+#endif
+
 #endif
-- 
1.5.6.5



[PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-16 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the
Supplemental SSE3 (SSSE3) instructions or, when available, the
Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to
2.3 times faster than the C implementation, using 8k data blocks on a
Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25%
faster.

Since this implementation uses SSE/YMM registers it cannot safely be
used in every situation, e.g. while an IRQ interrupts a kernel thread.
The implementation falls back to the generic SHA1 variant, if using
the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single
IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using
the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput
fluctuate when there are FPU intensive userland applications running.
For example, measuring the performance using iperf2 directly on the
machine under test gives wobbling numbers because iperf2 uses the FPU
for each packet to check if the reporting interval has expired (in the
above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and
stable numbers, albeit not as high as in the directly connected case.
Here is the result from an RFC 2544 test run with an EXFO Packet Blazer
FTB-8510:

   frame size   sha1-generic    sha1-ssse3     delta
      64 byte    37.5 MBit/s    37.5 MBit/s     0.0%
     128 byte    56.3 MBit/s    62.5 MBit/s   +11.0%
     256 byte    87.5 MBit/s   100.0 MBit/s   +14.3%
     512 byte   131.3 MBit/s   150.0 MBit/s   +14.2%
    1024 byte   162.5 MBit/s   193.8 MBit/s   +19.3%
    1280 byte   175.0 MBit/s   212.5 MBit/s   +21.4%
    1420 byte   175.0 MBit/s   218.7 MBit/s   +25.0%
    1518 byte   150.0 MBit/s   181.2 MBit/s   +20.8%

The throughput for the largest frame size is lower than for the
previous size because the IP packets need to be fragmented in this
case to make their way through the IPsec tunnel.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com
---
 arch/x86/crypto/Makefile  |8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |3 +
 crypto/Kconfig|   10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..a80be92 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..8fb0ba6
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   
http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com
+ *Ronen Zohar ronen.zo...@intel.com
+ *
+ * Converted to ATT syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause mini...@googlemail.com
+ *
+ * This program is free

[PATCH] crypto, gf128: fix call to memset()

2011-07-07 Thread Mathias Krause
In gf128mul_lle() and gf128mul_bbe() r isn't completely initialized with
zero because the size argument passed to memset() is the size of the
pointer, not the structure it points to.

Luckily there are no in-kernel users of those functions so the ABI
change implied by this fix should break no existing code.

Based on a patch by the PaX Team.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: PaX Team pagee...@freemail.hu
---
 crypto/gf128mul.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index df35e4c..5276607 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -182,7 +182,7 @@ void gf128mul_lle(be128 *r, const be128 *b)
for (i = 0; i  7; ++i)
gf128mul_x_lle(p[i + 1], p[i]);
 
-   memset(r, 0, sizeof(r));
+   memset(r, 0, sizeof(*r));
for (i = 0;;) {
u8 ch = ((u8 *)b)[15 - i];
 
@@ -220,7 +220,7 @@ void gf128mul_bbe(be128 *r, const be128 *b)
for (i = 0; i  7; ++i)
gf128mul_x_bbe(p[i + 1], p[i]);
 
-   memset(r, 0, sizeof(r));
+   memset(r, 0, sizeof(*r));
for (i = 0;;) {
u8 ch = ((u8 *)b)[i];
 
-- 
1.5.6.5
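
The bug class is easy to reproduce outside the kernel. The following
stand-alone example (plain user-space C, not kernel code) shows how little
memset() actually clears when it is handed the size of the pointer instead
of the size of the structure:

#include <stdio.h>
#include <string.h>

struct be128_demo { unsigned long long a, b; };	/* stand-in for be128 */

static void clear_wrong(struct be128_demo *r)
{
	memset(r, 0, sizeof(r));	/* BUG: size of the pointer (4 or 8 bytes) */
}

static void clear_right(struct be128_demo *r)
{
	memset(r, 0, sizeof(*r));	/* OK: size of the structure (16 bytes) */
}

int main(void)
{
	struct be128_demo x = { ~0ULL, ~0ULL };

	clear_wrong(&x);
	printf("after sizeof(r):  b = %llx\n", x.b);	/* still all ones */

	clear_right(&x);
	printf("after sizeof(*r): b = %llx\n", x.b);	/* now zero */
	return 0;
}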



Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?

Well, __x86_64__ should be a built-in define of the compiler, while
CONFIG_X86_64 is defined for 64-bit builds in include/generated/autoconf.h.
So by using the latter we should be on the safe side, but if your compiler
defines __x86_64__ for 32-bit builds it's simply broken. Also, git grep
shows quite a few more places using __x86_64__, so those would miscompile on
your toolchain, too.

But it looks like linux-next is just missing
559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
That should fix the build issue.

Kind Regards,
Mathias
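
To make the difference concrete: CONFIG_X86_64 comes from the kernel
configuration (emitted into include/generated/autoconf.h and passed to every
compilation unit by the build system), while __x86_64__ is a macro the
compiler itself predefines for 64-bit targets. The tiny stand-alone program
below -- an illustration, not kernel code -- mirrors the cc -dD -E check used
later in this thread:

#include <stdio.h>

int main(void)
{
#ifdef __x86_64__
	puts("__x86_64__ defined (compiler built-in, 64-bit target)");
#else
	puts("__x86_64__ not defined (32-bit target)");
#endif
#ifdef CONFIG_X86_64
	puts("CONFIG_X86_64 defined (normally only inside a kernel build)");
#else
	puts("CONFIG_X86_64 not defined outside a kernel build");
#endif
	return 0;
}

Building it with 'gcc -m32' and 'gcc -m64' should flip only the first pair of
messages; CONFIG_X86_64 stays undefined because it is not a compiler symbol.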


Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:

 Hi all,

 Changes since 20101126:


 on i386 builds, I get tons of these (and more) errors:

 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'

 even though the kernel .config file says:

 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m

 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?

 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
 showed quite a few more places using __x86_64__ so those would miscompile on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.

 The build problem still happens when that patch is applied.

That's weird. So it must be something with your toolchain.
Can you please post the output of the following commands?:

$ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
$ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'

Beside that, the patch below should fix the issue with your toolchain by using
CONFIG_X86_64 instead of __x86_64__.

Sorry for the inconvenience,
Mathias

[PATCH] crypto: aesni-intel - Fixed another build error on x86-32

It looks like not all compilers undef __x86_64__ for 32-bit builds so
switch to CONFIG_X86_64 to test if we're building for 64 or 32 bit.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S |   40 ++--
 1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index d528fde..de0ec32 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,7 +32,7 @@
 #include <linux/linkage.h>
 #include <asm/inst.h>
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 .data
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
@@ -105,7 +105,7 @@ enc:.octa 0x2
 #define CTR%xmm11
 #define INC%xmm12
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 #define AREG   %rax
 #define KEYP   %rdi
 #define OUTP   %rsi
@@ -132,7 +132,7 @@ enc:.octa 0x2
 #endif
 
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
 *
@@ -1333,7 +1333,7 @@ _key_expansion_256b:
  *   unsigned int key_len)
  */
 ENTRY(aesni_set_key)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
pushl KEYP
movl 8(%esp), KEYP  # ctx
movl 12(%esp), UKEYP# in_key
@@ -1435,7 +1435,7 @@ ENTRY(aesni_set_key)
cmp TKEYP, KEYP
jb .Ldec_key_loop
xor AREG, AREG
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
popl KEYP
 #endif
ret
@@ -1444,7 +1444,7 @@ ENTRY(aesni_set_key)
  * void aesni_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
  */
 ENTRY(aesni_enc)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
pushl KEYP
pushl KLEN
movl 12(%esp), KEYP
@@ -1455,7 +1455,7 @@ ENTRY(aesni_enc)
movups (INP), STATE # input
call _aesni_enc1
movups STATE, (OUTP)# output
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
popl KLEN
popl KEYP
 #endif
@@ -1630,7 +1630,7 @@ _aesni_enc4:
  * void aesni_dec (struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
  */
 ENTRY(aesni_dec)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
pushl KEYP
pushl KLEN
movl 12(%esp), KEYP
@@ -1642,7 +1642,7 @@ ENTRY(aesni_dec)
movups (INP), STATE # input
call _aesni_dec1
movups STATE, (OUTP)#output
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
popl KLEN
popl KEYP
 #endif
@@ -1818,7 +1818,7 @@ _aesni_dec4:
  *   size_t len)
  */
 ENTRY(aesni_ecb_enc)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
pushl LEN
pushl KEYP
pushl KLEN
@@ -1863,7 +1863,7

Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 20:31 Randy Dunlap wrote:
 On 11/29/10 11:21, Mathias Krause wrote:
 On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
 showed quite a few more places using __x86_64__ so those would miscompile 
 on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.
 
 The build problem still happens when that patch is applied.
 
 That's weird. So it must be something with your toolchain.
 Can you please post the output of the following commands?:
 
 $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __i386 1
 #define __i386__ 1
 #define i386 1
 #define __i586 1
 #define __i586__ 1
 
 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __x86_64 1
 #define __x86_64__ 1
 
 So that's not the problem... and the patch below didn't help.

That's odd. The output of the commands looks good so the x86-64 specific code
should be left out for 32-bit builds. :/

 Sorry that I even asked about that.  What next?

Can you please post the full error message? Meanwhile I'm checking out a
linux-next tree, trying to reproduce your problem.



Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 20:31 Randy Dunlap wrote:
 On 11/29/10 11:21, Mathias Krause wrote:
 On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
 showed quite a few more places using __x86_64__ so those would miscompile 
 on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.
 
 The build problem still happens when that patch is applied.
 
 That's weird. So it must be something with your toolchain.
 Can you please post the output of the following commands?:
 
 $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __i386 1
 #define __i386__ 1
 #define i386 1
 #define __i586 1
 #define __i586__ 1
 
 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __x86_64 1
 #define __x86_64__ 1
 
 So that's not the problem... and the patch below didn't help.
 Sorry that I even asked about that.  What next?

Sorry, I cannot reproduce the problem with the latest linux-next and commit
559ad0ff1368baea14dbc3207d55b02bd69bda4b from cryptodev-2.6 applied. Please
ensure you've applied that patch.

Regards,
Mathias


Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 20:54 Randy Dunlap wrote:
 On 11/29/10 11:45, Mathias Krause wrote:
 On 29.11.2010, 20:31 Randy Dunlap wrote:
 On 11/29/10 11:21, Mathias Krause wrote:
 On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in 
 include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
 showed quite a few more places using __x86_64__ so those would 
 miscompile on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.
 
 The build problem still happens when that patch is applied.
 
 That's weird. So it must be something with your toolchain.
 Can you please post the output of the following commands?:
 
 $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __i386 1
 #define __i386__ 1
 #define i386 1
 #define __i586 1
 #define __i586__ 1
 
 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __x86_64 1
 #define __x86_64__ 1
 
 So that's not the problem... and the patch below didn't help.
 
 That's odd. The output of the commands looks good so the x86-64 specific code
 should be left out for 32-bit builds. :/
 
 Sorry that I even asked about that.  What next?
 
 Can you please post the full error message. Meanwhile I'm checking out a
 linux-next tree, trying to reproduce your problem.
 
 
 I just built with make V=1 to see the full commands that are used, but
 that didn't help me either:
 
  gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d  -nostdinc -isystem 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include 
 -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude  
 -I/lnx/src/NEXT/linux-next-20101129/include -include 
 include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 
 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE  -c -o 
 arch/x86/crypto/aesni-intel_asm.o 
 /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S
 
 
 There are 2945 lines like this:
 
 linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad 
 register name `%r12'

Well, in my tree (linux-next + 559ad0ff) line 841 is a comment, whereas without
559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once
more to be sure. ;)

 It's around 311 KB, so I'll just put it here instead of emailing it:
 http://oss.oracle.com/~rdunlap/doc/cry32.out
 
 -- 
 ~Randy
 *** Remember to use Documentation/SubmitChecklist when testing your code ***



Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 21:11 Randy Dunlap wrote:
 On 11/29/10 12:02, Mathias Krause wrote:
 On 29.11.2010, 20:54 Randy Dunlap wrote:
 On 11/29/10 11:45, Mathias Krause wrote:
 On 29.11.2010, 20:31 Randy Dunlap wrote:
 On 11/29/10 11:21, Mathias Krause wrote:
 On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in 
 include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your 
 compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git grep
 showed quite a few more places using __x86_64__ so those would 
 miscompile on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.
 
 The build problem still happens when that patch is applied.
 
 That's weird. So it must be something with your toolchain.
 Can you please post the output of the following commands?:
 
 $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __i386 1
 #define __i386__ 1
 #define i386 1
 #define __i586 1
 #define __i586__ 1
 
 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __x86_64 1
 #define __x86_64__ 1
 
 So that's not the problem... and the patch below didn't help.
 
 That's odd. The output of the commands looks good so the x86-64 specific 
 code
 should be left out for 32-bit builds. :/
 
 Sorry that I even asked about that.  What next?
 
 Can you please post the full error message. Meanwhile I'm checking out a
 linux-next tree, trying to reproduce your problem.
 
 
 I just built with make V=1 to see the full commands that are used, but
 that didn't help me either:
 
 gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d  -nostdinc -isystem 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include 
 -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude  
 -I/lnx/src/NEXT/linux-next-20101129/include -include 
 include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 
 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE  -c -o 
 arch/x86/crypto/aesni-intel_asm.o 
 /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S
 
 
 There are 2945 lines like this:
 
 linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad 
 register name `%r12'
 
 Well, in my tree (linux-next + 559ad0ff) line 841 is a comment. Albeit 
 without
 559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once
 more to be sure. ;)
 
 Touche.
 What does that patch have to do with aesni-intel??

The description should be clear enough: crypto: aesni-intel - Fixed build error
on x86-32.
Here is the link to the patch: 
http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=patch;h=559ad0ff1368baea14dbc3207d55b02bd69bda4b.
 Please apply it on
top of your linux-next build.

 I'm using the linux-next tarball of 2029.
 However, your s/__x86_64__/CONFIG_X86_64/ patch was applied, so I dropped it.

Well, I doubt it. The patch was made on top of 559ad0ff, so it should have failed
to apply in your tree since obviously 559ad0ff is missing.

 new output file:
 http://oss.oracle.com/~rdunlap/doc/cry4.out

Same bug: 559ad0ff is still missing. Please apply the patch from the link above.



Re: linux-next: Tree for November 29 (aesni-intel)

2010-11-29 Thread Mathias Krause
On 29.11.2010, 21:37 Randy Dunlap wrote:
 On 11/29/10 12:21, Mathias Krause wrote:
 On 29.11.2010, 21:11 Randy Dunlap wrote:
 On 11/29/10 12:02, Mathias Krause wrote:
 On 29.11.2010, 20:54 Randy Dunlap wrote:
 On 11/29/10 11:45, Mathias Krause wrote:
 On 29.11.2010, 20:31 Randy Dunlap wrote:
 On 11/29/10 11:21, Mathias Krause wrote:
 On 29.11.2010, 19:54 Randy Dunlap wrote:
 On 11/29/10 10:26, Mathias Krause wrote:
 On 29.11.2010, 17:31 Randy Dunlap wrote:
 On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote:
 
 Hi all,
 
 Changes since 20101126:
 
 
 on i386 builds, I get tons of these (and more) errors:
 
 arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name 
 `%r12'
 arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name 
 `%r13'
 arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name 
 `%r14'
 arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name 
 `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name 
 `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name 
 `%rsp'
 arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name 
 `%r9'
 
 even though the kernel .config file says:
 
 CONFIG_CRYPTO_AES=m
 CONFIG_CRYPTO_AES_586=m
 CONFIG_CRYPTO_AES_NI_INTEL=m
 
 Should arch/x86/crypto/aesni-intel_asm.S be testing
 #ifdef CONFIG_X86_64
 instead of
 #ifdef __x86_64__
 or does that not matter?
 
 or is this a toolchain issue?
 
 Well, __x86_64__ should be a build-in define of the compiler while
 CONFIG_X86_64 is defined for 64 bit builds in 
 include/generated/autoconf.h.
 So by using the latter we should be on the safe side but if your 
 compiler
 defines __x86_64__ for 32-bit builds it's simply broken. Also git 
 grep
 showed quite a few more places using __x86_64__ so those would 
 miscompile on
 your toolchain, too.
 
 But it looks like linux-next is just missing
 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at
 git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git.
 That should fix the build issue.
 
 The build problem still happens when that patch is applied.
 
 That's weird. So it must be something with your toolchain.
 Can you please post the output of the following commands?:
 
 $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __i386 1
 #define __i386__ 1
 #define i386 1
 #define __i586 1
 #define __i586__ 1
 
 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
 
 #define __x86_64 1
 #define __x86_64__ 1
 
 So that's not the problem... and the patch below didn't help.
 
 That's odd. The output of the commands looks good so the x86-64 specific 
 code
 should be left out for 32-bit builds. :/
 
 Sorry that I even asked about that.  What next?
 
 Can you please post the full error message. Meanwhile I'm checking out a
 linux-next tree, trying to reproduce your problem.
 
 
 I just built with make V=1 to see the full commands that are used, but
 that didn't help me either:
 
 gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d  -nostdinc -isystem 
 /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include 
 -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude  
 -I/lnx/src/NEXT/linux-next-20101129/include -include 
 include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 
 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE  -c -o 
 arch/x86/crypto/aesni-intel_asm.o 
 /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S
 
 
 There are 2945 lines like this:
 
 linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad 
 register name `%r12'
 
 Well, in my tree (linux-next + 559ad0ff) line 841 is a comment. Albeit 
 without
 559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once
 more to be sure. ;)
 
 Touche.
 What does that patch have to do with aesni-intel??
 
 The description should be clear enough: crypto: aesni-intel - Fixed build 
 error
 on x86-32.
 Here is the link to the patch: 
 http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=patch;h=559ad0ff1368baea14dbc3207d55b02bd69bda4b.
  Please apply it on
 top of your linux-next build.
 
 I'm using the linux-next tarball of 2029.
 However, your s/__x86_64__/CONFIG_X86_64/ patch was applied, so I dropped 
 it.
 
 Well I doubt it. The patch was made on top of 559ad0ff so it should have 
 failed
 to apply in your tree since obviously 559ad0ff is missing.
 
 new output file:
 http://oss.oracle.com/~rdunlap/doc/cry4.out
 
 Same bug: 559ad0ff is still missing. Please apply the patch from the link 
 above.
 
 Thanks for persisting/continuing with me.
 I apologize, I had applied the most recent patch in Herbert's cryptodev repo,
 not the one that you referred me to.
 
 Yes, the build is now fixed.

Great! Have fun with your new AESNI-accelerated crypto :)


[PATCH] crypto: aesni-intel - Fixed build error on x86-32

2010-11-28 Thread Mathias Krause
Herbert, thanks for the merge, but the AES-GCM code merged in the meantime broke
the build of the x86-32 version. The following patch fixes this:

Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers
not available on x86-32.

While at it, fixed unregister order in aesni_exit().

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S  |5 -
 arch/x86/crypto/aesni-intel_glue.c |   26 +-
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index f592e03..d528fde 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,6 +32,7 @@
 #include <linux/linkage.h>
 #include <asm/inst.h>
 
+#ifdef __x86_64__
 .data
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
@@ -84,6 +85,7 @@ enc:.octa 0x2
 #define arg8 STACK_OFFSET+16(%r14)
 #define arg9 STACK_OFFSET+24(%r14)
 #define arg10 STACK_OFFSET+32(%r14)
+#endif
 
 
 #define STATE1 %xmm0
@@ -130,6 +132,7 @@ enc:.octa 0x2
 #endif
 
 
+#ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
 *
@@ -1255,7 +1258,7 @@ _return_T_done_encrypt:
pop %r13
pop %r12
ret
-
+#endif
 
 
 _key_expansion_128:
diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index 8a3b800..0f2c3c6 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -97,7 +97,6 @@ asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 
*out,
 #ifdef CONFIG_X86_64
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv);
-#endif
 
 /* asmlinkage void aesni_gcm_enc()
  * void *ctx,  AES Key schedule. Starts on a 16 byte boundary.
@@ -149,6 +148,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct 
crypto_aead *tfm)
PTR_ALIGN((u8 *)
crypto_tfm_ctx(crypto_aead_tfm(tfm)), AESNI_ALIGN);
 }
+#endif
 
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
@@ -822,6 +822,7 @@ static struct crypto_alg ablk_xts_alg = {
 };
 #endif
 
+#ifdef CONFIG_X86_64
 static int rfc4106_init(struct crypto_tfm *tfm)
 {
struct cryptd_aead *cryptd_tfm;
@@ -1237,6 +1238,7 @@ static struct crypto_alg __rfc4106_alg = {
},
},
 };
+#endif
 
 static int __init aesni_init(void)
 {
@@ -1264,6 +1266,10 @@ static int __init aesni_init(void)
goto blk_ctr_err;
if ((err = crypto_register_alg(ablk_ctr_alg)))
goto ablk_ctr_err;
+   if ((err = crypto_register_alg(__rfc4106_alg))
+   goto __aead_gcm_err;
+   if ((err = crypto_register_alg(rfc4106_alg))
+   goto aead_gcm_err;
 #ifdef HAS_CTR
if ((err = crypto_register_alg(ablk_rfc3686_ctr_alg)))
goto ablk_rfc3686_ctr_err;
@@ -1281,19 +1287,9 @@ static int __init aesni_init(void)
if ((err = crypto_register_alg(ablk_xts_alg)))
goto ablk_xts_err;
 #endif
-   err = crypto_register_alg(__rfc4106_alg);
-   if (err)
-   goto __aead_gcm_err;
-   err = crypto_register_alg(rfc4106_alg);
-   if (err)
-   goto aead_gcm_err;
return err;
 
-aead_gcm_err:
-   crypto_unregister_alg(__rfc4106_alg);
-__aead_gcm_err:
 #ifdef HAS_XTS
-   crypto_unregister_alg(ablk_xts_alg);
 ablk_xts_err:
 #endif
 #ifdef HAS_PCBC
@@ -1309,6 +1305,10 @@ ablk_lrw_err:
crypto_unregister_alg(ablk_rfc3686_ctr_alg);
 ablk_rfc3686_ctr_err:
 #endif
+   crypto_unregister_alg(rfc4106_alg);
+aead_gcm_err:
+   crypto_unregister_alg(__rfc4106_alg);
+__aead_gcm_err:
crypto_unregister_alg(ablk_ctr_alg);
 ablk_ctr_err:
crypto_unregister_alg(blk_ctr_alg);
@@ -1331,8 +1331,6 @@ aes_err:
 
 static void __exit aesni_exit(void)
 {
-   crypto_unregister_alg(__rfc4106_alg);
-   crypto_unregister_alg(rfc4106_alg);
 #ifdef HAS_XTS
crypto_unregister_alg(ablk_xts_alg);
 #endif
@@ -1346,6 +1344,8 @@ static void __exit aesni_exit(void)
 #ifdef HAS_CTR
crypto_unregister_alg(ablk_rfc3686_ctr_alg);
 #endif
+   crypto_unregister_alg(rfc4106_alg);
+   crypto_unregister_alg(__rfc4106_alg);
crypto_unregister_alg(ablk_ctr_alg);
crypto_unregister_alg(blk_ctr_alg);
 #endif
-- 
1.5.6.5



[PATCH v2] crypto: aesni-intel - Fixed build error on x86-32

2010-11-28 Thread Mathias Krause
Herbert, thanks for the merge, but the AES-GCM code merged in the meantime broke
the build of the x86-32 version. The following patch fixes this (now
compile-tested on x86-64, too):

Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers
not available on x86-32.

While at it, fixed unregister order in aesni_exit().

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S  |5 -
 arch/x86/crypto/aesni-intel_glue.c |   26 +-
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index f592e03..d528fde 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,6 +32,7 @@
 #include <linux/linkage.h>
 #include <asm/inst.h>
 
+#ifdef __x86_64__
 .data
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
@@ -84,6 +85,7 @@ enc:.octa 0x2
 #define arg8 STACK_OFFSET+16(%r14)
 #define arg9 STACK_OFFSET+24(%r14)
 #define arg10 STACK_OFFSET+32(%r14)
+#endif
 
 
 #define STATE1 %xmm0
@@ -130,6 +132,7 @@ enc:.octa 0x2
 #endif
 
 
+#ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
 *
@@ -1255,7 +1258,7 @@ _return_T_done_encrypt:
pop %r13
pop %r12
ret
-
+#endif
 
 
 _key_expansion_128:
diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index 8a3b800..e1e60c7 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -97,7 +97,6 @@ asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 
*out,
 #ifdef CONFIG_X86_64
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv);
-#endif
 
 /* asmlinkage void aesni_gcm_enc()
  * void *ctx,  AES Key schedule. Starts on a 16 byte boundary.
@@ -149,6 +148,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct 
crypto_aead *tfm)
PTR_ALIGN((u8 *)
crypto_tfm_ctx(crypto_aead_tfm(tfm)), AESNI_ALIGN);
 }
+#endif
 
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
@@ -822,6 +822,7 @@ static struct crypto_alg ablk_xts_alg = {
 };
 #endif
 
+#ifdef CONFIG_X86_64
 static int rfc4106_init(struct crypto_tfm *tfm)
 {
struct cryptd_aead *cryptd_tfm;
@@ -1237,6 +1238,7 @@ static struct crypto_alg __rfc4106_alg = {
},
},
 };
+#endif
 
 static int __init aesni_init(void)
 {
@@ -1264,6 +1266,10 @@ static int __init aesni_init(void)
goto blk_ctr_err;
if ((err = crypto_register_alg(ablk_ctr_alg)))
goto ablk_ctr_err;
+   if ((err = crypto_register_alg(__rfc4106_alg)))
+   goto __aead_gcm_err;
+   if ((err = crypto_register_alg(rfc4106_alg)))
+   goto aead_gcm_err;
 #ifdef HAS_CTR
if ((err = crypto_register_alg(ablk_rfc3686_ctr_alg)))
goto ablk_rfc3686_ctr_err;
@@ -1281,19 +1287,9 @@ static int __init aesni_init(void)
if ((err = crypto_register_alg(ablk_xts_alg)))
goto ablk_xts_err;
 #endif
-   err = crypto_register_alg(__rfc4106_alg);
-   if (err)
-   goto __aead_gcm_err;
-   err = crypto_register_alg(rfc4106_alg);
-   if (err)
-   goto aead_gcm_err;
return err;
 
-aead_gcm_err:
-   crypto_unregister_alg(__rfc4106_alg);
-__aead_gcm_err:
 #ifdef HAS_XTS
-   crypto_unregister_alg(ablk_xts_alg);
 ablk_xts_err:
 #endif
 #ifdef HAS_PCBC
@@ -1309,6 +1305,10 @@ ablk_lrw_err:
crypto_unregister_alg(ablk_rfc3686_ctr_alg);
 ablk_rfc3686_ctr_err:
 #endif
+   crypto_unregister_alg(rfc4106_alg);
+aead_gcm_err:
+   crypto_unregister_alg(__rfc4106_alg);
+__aead_gcm_err:
crypto_unregister_alg(ablk_ctr_alg);
 ablk_ctr_err:
crypto_unregister_alg(blk_ctr_alg);
@@ -1331,8 +1331,6 @@ aes_err:
 
 static void __exit aesni_exit(void)
 {
-   crypto_unregister_alg(__rfc4106_alg);
-   crypto_unregister_alg(rfc4106_alg);
 #ifdef HAS_XTS
crypto_unregister_alg(ablk_xts_alg);
 #endif
@@ -1346,6 +1344,8 @@ static void __exit aesni_exit(void)
 #ifdef HAS_CTR
crypto_unregister_alg(ablk_rfc3686_ctr_alg);
 #endif
+   crypto_unregister_alg(rfc4106_alg);
+   crypto_unregister_alg(__rfc4106_alg);
crypto_unregister_alg(ablk_ctr_alg);
crypto_unregister_alg(blk_ctr_alg);
 #endif
-- 
1.5.6.5
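
The "fixed unregister order" remark refers to the general rule that teardown
must mirror setup: algorithms are unregistered in the reverse order of their
registration, both in the init function's error unwind and in the exit
function. A generic sketch of the pattern (the *_alg names are placeholders,
not the real aesni-intel symbols):

#include <linux/module.h>
#include <linux/crypto.h>

static struct crypto_alg first_alg, second_alg, third_alg;	/* placeholders */

static int __init example_init(void)
{
	int err;

	err = crypto_register_alg(&first_alg);
	if (err)
		goto out;
	err = crypto_register_alg(&second_alg);
	if (err)
		goto out_first;
	err = crypto_register_alg(&third_alg);
	if (err)
		goto out_second;
	return 0;

out_second:
	crypto_unregister_alg(&second_alg);
out_first:
	crypto_unregister_alg(&first_alg);
out:
	return err;
}

static void __exit example_exit(void)
{
	/* mirror image of example_init() */
	crypto_unregister_alg(&third_alg);
	crypto_unregister_alg(&second_alg);
	crypto_unregister_alg(&first_alg);
}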



Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-17 Thread Mathias Krause
On 13.11.2010, 00:25 Mathias Krause wrote:
 On 12.11.2010, 08:34 Huang Ying wrote:
 On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Why the improvement of ECB is so small? I can not understand it. It
 should be as big as CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But it is so even for the current x86-64 version. See the above values for
 x86-64 (old). I setup dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What where the numbers you measured in your tests while developing the
 x86-64 version?
 
 Can't remember the number. Do you have interest to dig into the issue?
 
 I looked at /proc/crypto while doing the tests again and noticed that ECB
 isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are.
 The reason for that seems to be that for ECB, and only for ECB, the kernel
 is using the synchronous block algorithm instead of the asynchronous one.
 So the question is: Why is the ECB variant handled using the synchronous
 cipher -- because of the missing iv handling in this mode?

Herbert, any idea why this is the case?

Regards,
Mathias



Re: [PATCH v4] x86, crypto: ported aes-ni implementation to x86

2010-11-17 Thread Mathias Krause
On 11.11.2010, 23:20 Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain here's a short summary of a dm-crypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:            i586          aes-ni      delta
 ECB, 256 bit:   93.8 MB/s    123.3 MB/s   +31.4%
 CBC, 256 bit:   84.8 MB/s    262.3 MB/s  +209.3%
 LRW, 256 bit:  108.6 MB/s    222.1 MB/s  +104.5%
 XTS, 256 bit:  105.0 MB/s    205.5 MB/s   +95.7%
 
 Additionally, due to some minor optimizations, the 64-bit version also
 got a minor performance gain as seen below:
 
 x86-64:         old impl.    new impl.    delta
 ECB, 256 bit:   121.1 MB/s   123.0 MB/s   +1.5%
 CBC, 256 bit:   285.3 MB/s   290.8 MB/s   +1.9%
 LRW, 256 bit:   263.7 MB/s   265.3 MB/s   +0.6%
 XTS, 256 bit:   251.1 MB/s   255.3 MB/s   +1.7%
 
 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
 v4 changes:
 * adapted CBC implementation to be useable on x86, too
 * redo the measurement using dm-crypt
 
 v3 changes:
 * fixed 32-bit implementation of aesni_ecb_enc (a hunk somehow moved to the 
 end
  of another function)
 
 v2 changes:
 * hide almost all register names in macros so the same code base can be shared
  between x86 and x86_64
 * unified Kconfig documentation again
 * added alignment constraints for internal functions.
 
 
 arch/x86/crypto/aesni-intel_asm.S  |  197 ++--
 arch/x86/crypto/aesni-intel_glue.c |   22 +++-
 crypto/Kconfig |   12 ++-
 3 files changed, 191 insertions(+), 40 deletions(-)

No comments so far? :(
What's wrong with the patch?

Regards,
Mathias


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-12 Thread Mathias Krause
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Why the improvement of ECB is so small? I can not understand it. It
 should be as big as CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But it is so even for the current x86-64 version. See the above values for
 x86-64 (old). I setup dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What where the numbers you measured in your tests while developing the
 x86-64 version?
 
 Can't remember the number. Do you have interest to dig into the issue?

I looked at /proc/crypto while doing the tests again and noticed that ECB
isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are.
The reason for that seems to be that for ECB, and only for ECB, the kernel
is using the synchronous block algorithm instead of the asynchronous one.
So the question is: Why is the ECB variant handled using the synchronous
cipher -- because of the missing iv handling in this mode?

Best regards,
Mathias
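
One way to see from inside the kernel which implementation ends up serving a
given mode is to allocate the asynchronous interface explicitly and print the
driver name, roughly as in the sketch below (2.6.3x-era API, purely
illustrative):

#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/kernel.h>

static int try_async_ecb_aes(void)
{
	struct crypto_ablkcipher *tfm;

	/* "ecb(aes)" through the async interface; cryptd/aesni may answer */
	tfm = crypto_alloc_ablkcipher("ecb(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	pr_info("ecb(aes) driver: %s\n",
		crypto_tfm_alg_driver_name(crypto_ablkcipher_tfm(tfm)));

	crypto_free_ablkcipher(tfm);
	return 0;
}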



Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
Hello Huang Ying,

On 03.11.2010, 23:27 Huang Ying wrote:
 On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain here's a short summary of the tcrypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:                          i586         aes-ni   delta
 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
 
 Which method do you used for speed testing?
 
 modprobe tcrypt mode=200 sec=?
 
 That actually does not work very well for AES-NI. Because AES-NI
 blkcipher is tested in synchronous mode, and in that mode,
 kernel_fpu_begin/end() must be called for every block, and
 kernel_fpu_begin/end() is quite slow. At the same time, some further
 optimization for AES-NI can not be tested (such as ecb-aes-aesni
 driver) in that mode, because they are only available in asynchronous
 mode.
 
 When developing AES-NI for x86_64, I uses dm-crypt + AES-NI for speed
 testing, where AES-NI blkcipher will be tested in asynchronous mode, and
 kernel_fpu_begin/end() is called for every page. Can you use that to
 test?
 
 Or you can add test_acipher_speed (similar with test_ahash_speed) to
 test cipher in asynchronous mode.

here are the numbers for dm-crypt. I ran the test again on the Core i7
M620, 2.67GHz. During the test I noticed that not porting the CBC
variant to x86 was a bad idea, so I did that too and got pretty nice
numbers (see v3 vs. v4 of the patch).

All tests were run five times in a row using a 256-bit key and doing I/O
to the block device in chunks of 1MB. The numbers are MB/s.

x86 (i586 variant):
         1. run  2. run  3. run  4. run  5. run    mean
ECB:       93.9    93.9    94.0    93.5    93.8    93.8
CBC:       84.9    84.8    84.9    84.9    84.8    84.8
XTS:      108.2   108.3   109.6   108.3   108.9   108.6
LRW:      105.0   105.0   105.1   105.1   105.1   105.0

x86 (AES-NI), v3 of the patch:
         1. run  2. run  3. run  4. run  5. run    mean
ECB: 124.8   120.8   124.5   120.6   124.5   123.0
CBC: 112.6   109.6   112.6   110.7   109.4   110.9 
XTS: 221.6   221.1   220.9   223.5   224.4   222.3
LRW: 206.2   209.7   207.4   203.7   209.3   207.2

x86 (AES-NI), v4 of the patch:
         1. run  2. run  3. run  4. run  5. run    mean
ECB: 122.5   121.2   121.6   125.7   125.5   123.3
CBC: 259.5   259.2   261.2   264.0   267.6   262.3 
XTS: 225.1   230.7   220.6   217.9   216.3   222.1
LRW: 202.7   202.8   210.6   208.9   202.7   205.5

Comparing the values for the CBC variant between v3 and v4 of the patch
shows that porting the CBC variant to x86 more than doubled the
performance, so the slightly ugly #ifdef'ed code is worth the effort.

x86-64 (old):
         1. run  2. run  3. run  4. run  5. run    mean
ECB: 121.4   120.9   121.1   121.2   120.9   121.1
CBC: 282.5   286.3   281.5   282.0   294.5   285.3
XTS: 263.6   260.3   263.0   267.0   264.6   263.7
LRW: 249.6   249.8   250.5   253.4   252.2   251.1

x86-64 (new):
         1. run  2. run  3. run  4. run  5. run    mean
ECB: 122.1   122.0   122.0   127.0   121.9   123.0
CBC: 291.2   286.2   295.6   291.4   289.9   290.8
XTS: 263.3   264.4   264.5   264.2   270.4   265.3
LRW: 254.9   252.3   253.6   258.2   257.5   255.3

Comparing the mean values gives us:

x86:         i586     aes-ni      delta
ECB:         93.8      123.3     +31.4%
CBC:         84.8      262.3    +209.3%
LRW:        108.6      222.1    +104.5%
XTS:        105.0      205.5     +95.7%

x86-64:        old        new     delta
ECB:         121.1      123.0     +1.5%
CBC:         285.3      290.8     +1.9%
LRW:         263.7      265.3     +0.6%
XTS:         251.1      255.3     +1.7%

The improvement of the new x86-64 version over the old one is not as
drastic as for the synchronous variant (see the tcrypt tests in the
previous email), but it is nevertheless an improvement. The gain for
the x86 case, however, should be clearly noticeable: it's almost as
fast as the x86-64 version.

I'll post the new version of the patch in a follow-up email.


Regards,
Mathias



[PATCH v4] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
The AES-NI instructions are also available in legacy mode so the 32-bit
architecture may profit from those, too.

To illustrate the performance gain here's a short summary of a dm-crypt
speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
implementations:

x86:              i586          aes-ni       delta
ECB, 256 bit:     93.8 MB/s    123.3 MB/s   +31.4%
CBC, 256 bit:     84.8 MB/s    262.3 MB/s  +209.3%
XTS, 256 bit:    108.6 MB/s    222.1 MB/s  +104.5%
LRW, 256 bit:    105.0 MB/s    205.5 MB/s   +95.7%

Additionally, due to some minor optimizations, the 64-bit version also
got a minor performance gain as seen below:

x86-64:           old impl.     new impl.    delta
ECB, 256 bit:    121.1 MB/s    123.0 MB/s   +1.5%
CBC, 256 bit:    285.3 MB/s    290.8 MB/s   +1.9%
XTS, 256 bit:    263.7 MB/s    265.3 MB/s   +0.6%
LRW, 256 bit:    251.1 MB/s    255.3 MB/s   +1.7%

Signed-off-by: Mathias Krause mini...@googlemail.com
---
v4 changes:
* adapted CBC implementation to be useable on x86, too
* redo the measurement using dm-crypt

v3 changes:
* fixed 32-bit implementation of aesni_ecb_enc (a hunk somehow moved to the end
  of another function)

v2 changes:
* hide almost all register names in macros so the same code base can be shared
  between x86 and x86_64
* unified Kconfig documentation again
* added alignment constraints for internal functions.


 arch/x86/crypto/aesni-intel_asm.S  |  197 ++--
 arch/x86/crypto/aesni-intel_glue.c |   22 +++-
 crypto/Kconfig |   12 ++-
 3 files changed, 191 insertions(+), 40 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index ff16756..74626fa 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -9,6 +9,9 @@
  *Vinodh Gopal vinodh.go...@intel.com
  *Kahraman Akdemir
  *
+ * Ported x86_64 version to x86:
+ *Author: Mathias Krause mini...@googlemail.com
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -32,12 +35,16 @@
 #define IN IN1
 #define KEY%xmm2
 #define IV %xmm3
+
 #define BSWAP_MASK %xmm10
 #define CTR%xmm11
 #define INC%xmm12
 
+#ifdef __x86_64__
+#define AREG   %rax
 #define KEYP   %rdi
 #define OUTP   %rsi
+#define UKEYP  OUTP
 #define INP%rdx
 #define LEN%rcx
 #define IVP%r8
@@ -46,6 +53,18 @@
 #define TKEYP  T1
 #define T2 %r11
 #define TCTR_LOW T2
+#else
+#define AREG   %eax
+#define KEYP   %edi
+#define OUTP   AREG
+#define UKEYP  OUTP
+#define INP%edx
+#define LEN%esi
+#define IVP%ebp
+#define KLEN   %ebx
+#define T1 %ecx
+#define TKEYP  T1
+#endif
 
 _key_expansion_128:
 _key_expansion_256a:
@@ -55,10 +74,11 @@ _key_expansion_256a:
shufps $0b10001100, %xmm0, %xmm4
pxor %xmm4, %xmm0
pxor %xmm1, %xmm0
-   movaps %xmm0, (%rcx)
-   add $0x10, %rcx
+   movaps %xmm0, (TKEYP)
+   add $0x10, TKEYP
ret
 
+.align 4
 _key_expansion_192a:
pshufd $0b01010101, %xmm1, %xmm1
shufps $0b0001, %xmm0, %xmm4
@@ -76,12 +96,13 @@ _key_expansion_192a:
 
movaps %xmm0, %xmm1
shufps $0b01000100, %xmm0, %xmm6
-   movaps %xmm6, (%rcx)
+   movaps %xmm6, (TKEYP)
shufps $0b01001110, %xmm2, %xmm1
-   movaps %xmm1, 16(%rcx)
-   add $0x20, %rcx
+   movaps %xmm1, 0x10(TKEYP)
+   add $0x20, TKEYP
ret
 
+.align 4
 _key_expansion_192b:
pshufd $0b01010101, %xmm1, %xmm1
shufps $0b0001, %xmm0, %xmm4
@@ -96,10 +117,11 @@ _key_expansion_192b:
pxor %xmm3, %xmm2
pxor %xmm5, %xmm2
 
-   movaps %xmm0, (%rcx)
-   add $0x10, %rcx
+   movaps %xmm0, (TKEYP)
+   add $0x10, TKEYP
ret
 
+.align 4
 _key_expansion_256b:
pshufd $0b10101010, %xmm1, %xmm1
shufps $0b0001, %xmm2, %xmm4
@@ -107,8 +129,8 @@ _key_expansion_256b:
shufps $0b10001100, %xmm2, %xmm4
pxor %xmm4, %xmm2
pxor %xmm1, %xmm2
-   movaps %xmm2, (%rcx)
-   add $0x10, %rcx
+   movaps %xmm2, (TKEYP)
+   add $0x10, TKEYP
ret
 
 /*
@@ -116,17 +138,23 @@ _key_expansion_256b:
  *   unsigned int key_len)
  */
 ENTRY(aesni_set_key)
-   movups (%rsi), %xmm0# user key (first 16 bytes)
-   movaps %xmm0, (%rdi)
-   lea 0x10(%rdi), %rcx# key addr
-   movl %edx, 480(%rdi)
+#ifndef __x86_64__
+   pushl KEYP
+   movl 8(%esp), KEYP  # ctx
+   movl 12(%esp), UKEYP# in_key
+   movl 16(%esp), %edx # key_len
+#endif
+   movups (UKEYP), %xmm0   # user key (first 16 bytes)
+   movaps %xmm0, (KEYP)
+   lea 0x10(KEYP), TKEYP   # key addr
+   movl %edx, 480(KEYP)
pxor %xmm4, %xmm4

Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
On 12.11.2010, 01:33 Huang Ying wrote:
 Hi, Mathias,
 
 On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
 All test were run five times in a row using a 256 bit key and doing i/o
 to the block device in chunks of 1MB. The numbers are MB/s.
 
 x86 (i586 variant):
1. run  2. run  3. run  4. run  5. runmean
 ECB:  93.993.994.093.593.893.8
 CBC:  84.984.884.984.984.884.8
 XTS: 108.2   108.3   109.6   108.3   108.9   108.6
 LRW: 105.0   105.0   105.1   105.1   105.1   105.0
 
 x86 (AES-NI), v3 of the patch:
1. run  2. run  3. run  4. run  5. runmean
 ECB: 124.8   120.8   124.5   120.6   124.5   123.0
 CBC: 112.6   109.6   112.6   110.7   109.4   110.9 
 XTS: 221.6   221.1   220.9   223.5   224.4   222.3
 LRW: 206.2   209.7   207.4   203.7   209.3   207.2
 
 x86 (AES-NI), v4 of the patch:
1. run  2. run  3. run  4. run  5. runmean
 ECB: 122.5   121.2   121.6   125.7   125.5   123.3
 CBC: 259.5   259.2   261.2   264.0   267.6   262.3 
 XTS: 225.1   230.7   220.6   217.9   216.3   222.1
 LRW: 202.7   202.8   210.6   208.9   202.7   205.5
 
 Comparing the values for the CBC variant between v3 and v4 of the patch
 shows that porting the CBC variant to x86 more then doubled the
 performance so the little bit ugly #ifdefed code is worth the effort.
 
 x86-64 (old):
1. run  2. run  3. run  4. run  5. runmean
 ECB: 121.4   120.9   121.1   121.2   120.9   121.1
 CBC: 282.5   286.3   281.5   282.0   294.5   285.3
 XTS: 263.6   260.3   263.0   267.0   264.6   263.7
 LRW: 249.6   249.8   250.5   253.4   252.2   251.1
 
 x86-64 (new):
1. run  2. run  3. run  4. run  5. runmean
 ECB: 122.1   122.0   122.0   127.0   121.9   123.0
 CBC: 291.2   286.2   295.6   291.4   289.9   290.8
 XTS: 263.3   264.4   264.5   264.2   270.4   265.3
 LRW: 254.9   252.3   253.6   258.2   257.5   255.3
 
 Comparing the mean values gives us:
 
 x86: i586   aes-nidelta
 ECB: 93.8123.3   +31.4%
 
 Why the improvement of ECB is so small? I can not understand it. It
 should be as big as CBC.

I don't know why the ECB variant is so slow compared to the other variants.
But it is so even for the current x86-64 version. See the above values for
x86-64 (old). I set up dm-crypt for this test like this:
# cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0

What were the numbers you measured in your tests while developing the
x86-64 version?

Best regards,
Mathias

 
 Best Regards,
 Huang Ying
 
 


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Hi, Mathias,
 
 On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
 All test were run five times in a row using a 256 bit key and doing i/o
 to the block device in chunks of 1MB. The numbers are MB/s.
 
 x86 (i586 variant):
   1. run  2. run  3. run  4. run  5. runmean
 ECB:  93.993.994.093.593.893.8
 CBC:  84.984.884.984.984.884.8
 XTS: 108.2   108.3   109.6   108.3   108.9   108.6
 LRW: 105.0   105.0   105.1   105.1   105.1   105.0
 
 x86 (AES-NI), v3 of the patch:
   1. run  2. run  3. run  4. run  5. runmean
 ECB: 124.8   120.8   124.5   120.6   124.5   123.0
 CBC: 112.6   109.6   112.6   110.7   109.4   110.9 
 XTS: 221.6   221.1   220.9   223.5   224.4   222.3
 LRW: 206.2   209.7   207.4   203.7   209.3   207.2
 
 x86 (AES-NI), v4 of the patch:
   1. run  2. run  3. run  4. run  5. runmean
 ECB: 122.5   121.2   121.6   125.7   125.5   123.3
 CBC: 259.5   259.2   261.2   264.0   267.6   262.3 
 XTS: 225.1   230.7   220.6   217.9   216.3   222.1
 LRW: 202.7   202.8   210.6   208.9   202.7   205.5
 
 Comparing the values for the CBC variant between v3 and v4 of the patch
 shows that porting the CBC variant to x86 more then doubled the
 performance so the little bit ugly #ifdefed code is worth the effort.
 
 x86-64 (old):
   1. run  2. run  3. run  4. run  5. runmean
 ECB: 121.4   120.9   121.1   121.2   120.9   121.1
 CBC: 282.5   286.3   281.5   282.0   294.5   285.3
 XTS: 263.6   260.3   263.0   267.0   264.6   263.7
 LRW: 249.6   249.8   250.5   253.4   252.2   251.1
 
 x86-64 (new):
   1. run  2. run  3. run  4. run  5. runmean
 ECB: 122.1   122.0   122.0   127.0   121.9   123.0
 CBC: 291.2   286.2   295.6   291.4   289.9   290.8
 XTS: 263.3   264.4   264.5   264.2   270.4   265.3
 LRW: 254.9   252.3   253.6   258.2   257.5   255.3
 
 Comparing the mean values gives us:
 
 x86: i586   aes-nidelta
 ECB: 93.8123.3   +31.4%
 
 Why the improvement of ECB is so small? I can not understand it. It
 should be as big as CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But it is so even for the current x86-64 version. See the above values for
 x86-64 (old). I setup dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What where the numbers you measured in your tests while developing the
 x86-64 version?
 
 Can't remember the number. Are you interested in digging into the issue?

Sure. Increasing performance is always a good thing to do. :)

Best regards,
Mathias


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-04 Thread Mathias Krause
On 03.11.2010, 23:27 Huang Ying wrote:
 On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain here's a short summary of the tcrypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:  i586   aes-ni   delta
 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
 
 Which method did you use for speed testing?
 
 modprobe tcrypt mode=200 sec=?

Yes. I used: modprobe tcrypt mode=200 sec=1

 That actually does not work very well for AES-NI. Because AES-NI
 blkcipher is tested in synchronous mode, and in that mode,
 kernel_fpu_begin/end() must be called for every block, and
 kernel_fpu_begin/end() is quite slow.

That's what I figured, too. Can this slowdown be avoided by saving and 
restoring the used FPU registers within the assembler implementation or 
would this be even slower?
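
For reference, the difference mostly boils down to where the
kernel_fpu_begin()/kernel_fpu_end() pair ends up. Below is a much
simplified sketch of the two code paths, written from memory and not the
actual aesni-intel_glue.c code: context alignment, the irq_fpu_usable()
fallback and most error handling are left out, and it assumes the
declarations for aesni_enc()/aesni_ecb_enc() and the blkcipher walk
helpers that the glue code already has.

/* Synchronous case: the generic ecb() template calls the plain "aes"
 * cipher once per 16-byte block, so the kernel_fpu_begin()/kernel_fpu_end()
 * overhead is paid for every single block. */
static void one_block_encrypt(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
{
        kernel_fpu_begin();
        aesni_enc(ctx, dst, src);       /* asm routine, one AES block */
        kernel_fpu_end();
}

/* Asynchronous ecb-aes-aesni case: the blkcipher walk allows hoisting the
 * save/restore out of the loop, so it's paid once per request/walk. */
static int many_blocks_encrypt(struct blkcipher_desc *desc,
                               struct scatterlist *dst, struct scatterlist *src,
                               unsigned int nbytes)
{
        struct crypto_aes_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
        struct blkcipher_walk walk;
        int err;

        blkcipher_walk_init(&walk, dst, src, nbytes);
        err = blkcipher_walk_virt(desc, &walk);
        desc->flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; /* must not sleep with the FPU held */

        kernel_fpu_begin();
        while ((nbytes = walk.nbytes)) {
                aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
                              nbytes & ~(AES_BLOCK_SIZE - 1));
                err = blkcipher_walk_done(desc, &walk,
                                          nbytes & (AES_BLOCK_SIZE - 1));
        }
        kernel_fpu_end();

        return err;
}

So the asynchronous driver pays the FPU state switch once per request
instead of once per 16-byte block, which is presumably where most of the
tcrypt difference comes from.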

 At the same time, some further
 optimization for AES-NI can not be tested (such as ecb-aes-aesni
 driver) in that mode, because they are only available in asynchronous
 mode.

After finding the bug in the second version of the patch I noticed this, 
too.

 When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
 testing, where AES-NI blkcipher will be tested in asynchronous mode, and
 kernel_fpu_begin/end() is called for every page. Can you use that to
 test?

But wouldn't this be even slower than the above measurement? I took the
results for 8kB blocks and a page would only be 4kB ... well, that
depends on what kind of pages you use. IIRC x86-64 not only supports 2MB
but also 1GB pages ;)

 Or you can add test_acipher_speed (similar to test_ahash_speed) to
 test cipher in asynchronous mode.

Maybe I'll try this approach, since it looks like just a minor 
modification of the tcrypt module.
Thanks for the hints!
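
In case I get to it, here is a rough sketch of what such a
test_acipher_speed() could look like, modeled on the jiffies-based hash
speed tests and driven through the ablkcipher API. It is meant to sit in
crypto/tcrypt.c next to the existing tests (so the needed headers are
already in place). The function and struct names and the fixed 16-byte IV
are made up for illustration, blen is assumed to be a multiple of the
cipher's block size, and the whole thing is untested:

struct acipher_result {
        struct completion completion;
        int err;
};

static void acipher_speed_done(struct crypto_async_request *req, int err)
{
        struct acipher_result *res = req->data;

        if (err == -EINPROGRESS)
                return;         /* only queued so far, wait for the real completion */
        res->err = err;
        complete(&res->completion);
}

static void test_acipher_speed_sketch(const char *algo, const u8 *key,
                                      unsigned int klen, unsigned int blen,
                                      unsigned int secs)
{
        struct acipher_result result;
        struct crypto_ablkcipher *tfm;
        struct ablkcipher_request *req;
        struct scatterlist sg;
        unsigned long start, bcount = 0;
        char iv[16] = { 0 };    /* good enough for a throughput test */
        void *buf = NULL;
        int ret;

        tfm = crypto_alloc_ablkcipher(algo, 0, 0);
        if (IS_ERR(tfm)) {
                printk(KERN_ERR "failed to allocate %s\n", algo);
                return;
        }

        req = ablkcipher_request_alloc(tfm, GFP_KERNEL);
        buf = kmalloc(blen, GFP_KERNEL);
        if (!req || !buf || crypto_ablkcipher_setkey(tfm, key, klen))
                goto out;

        init_completion(&result.completion);
        sg_init_one(&sg, buf, blen);
        ablkcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
                                        acipher_speed_done, &result);
        ablkcipher_request_set_crypt(req, &sg, &sg, blen, iv);

        for (start = jiffies; time_before(jiffies, start + secs * HZ);
             bcount++) {
                ret = crypto_ablkcipher_encrypt(req);
                if (ret == -EINPROGRESS || ret == -EBUSY) {
                        wait_for_completion(&result.completion);
                        ret = result.err;
                        INIT_COMPLETION(result.completion);
                }
                if (ret)
                        break;
        }

        printk(KERN_INFO "%s: %lu ops of %u bytes in %u seconds (%lu bytes/s)\n",
               algo, bcount, blen, secs, bcount * blen / secs);
out:
        kfree(buf);
        ablkcipher_request_free(req);
        crypto_free_ablkcipher(tfm);
}

Waiting for the completion on -EINPROGRESS/-EBUSY means bcount only counts
requests that actually finished, so asynchronous drivers (like
ecb-aes-aesni) get measured properly instead of just being flooded.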

Best regards,
Mathias

 
 Best Regards,
 Huang Ying


Re: [PATCH] x86, crypto: ported aes-ni implementation to x86

2010-11-03 Thread Mathias Krause
Hi,

I modified the patch so it doesn't introduce a copy of the existing
assembler implementation but modifies the existing one to be usable for
both 64-bit and 32-bit. Additionally, I added some alignment constraints
for internal functions, which resulted in a noticeable speed-up.

I reran the tests on another machine, a Core i7 M620 at 2.67GHz. I also
took the low-end numbers for the AES-NI variants because I didn't want
to wait for the big numbers to show up every now and then any more ;)
So here is the comparison of 5 consecutive tcrypt test runs for some
selected algorithms in MiB/s:

x86-64 (old):   1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  152.49  152.58  152.51  151.80  151.87  152.25
CBC, 256 bit, 8kB:  144.32  144.44  144.35  143.75  143.75  144.12
LRW, 320 bit, 8kB:  159.41  159.21  159.21  158.55  159.28  159.13
XTS, 512 bit, 8kB:  144.87  142.88  144.75  144.11  144.75  144.27

x86-64 (new):   1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  184.07  184.07  183.50  183.50  184.07  183.84
CBC, 256 bit, 8kB:  170.25  170.24  169.71  169.71  170.25  170.03
LRW, 320 bit, 8kB:  169.91  169.91  169.39  169.37  169.91  169.69
XTS, 512 bit, 8kB:  172.39  172.35  171.82  171.82  172.35  172.14

i586:           1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  125.98  126.03  126.03  125.64  126.03  125.94
CBC, 256 bit, 8kB:  118.18  118.19  117.84  117.84  118.19  118.04
LRW, 320 bit, 8kB:  128.37  128.35  127.97  127.98  128.35  128.20
XTS, 512 bit, 8kB:  118.52  118.50  118.14  118.14  118.49  118.35

x86 (AES-NI):   1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  187.33  187.34  187.33  186.75  186.74  187.09
CBC, 256 bit, 8kB:  171.84  171.84  171.84  171.28  171.28  171.61
LRW, 320 bit, 8kB:  168.54  168.54  168.53  168.00  168.02  168.32
XTS, 512 bit, 8kB:  166.61  166.60  166.60  166.08  166.60  166.49

Comparing the mean values gives us:

x86-64:                old     new   delta
ECB, 256 bit, 8kB:  152.25  183.84  +20.7%
CBC, 256 bit, 8kB:  144.12  170.03  +18.0%
LRW, 320 bit, 8kB:  159.13  169.69   +6.6%
XTS, 512 bit, 8kB:  144.27  172.14  +19.3%

x86:  i586  aes-ni   delta
ECB, 256 bit, 8kB:  125.94  187.09  +48.6%
CBC, 256 bit, 8kB:  118.04  171.61  +45.4%
LRW, 320 bit, 8kB:  128.20  168.32  +31.3%
XTS, 512 bit, 8kB:  118.35  166.49  +40.7%

The funny thing is that the 32-bit implementation is sometimes even
faster than the 64-bit one. Nevertheless, the minor optimization of
aligning function entries gave the 64-bit version quite a big
performance gain (up to 20%).

I'll post the new version of the patch in a follow-up email.

Regards,
Mathias

