Re: [PATCH 0/3] padata cpu awareness fixes
On 12 September 2017 at 11:07, Steffen Klassert <steffen.klass...@secunet.com> wrote:
> On Fri, Sep 08, 2017 at 08:57:08PM +0200, Mathias Krause wrote:
>> Hi Steffen, Herbert,
>>
>> this series solves multiple issues of padata I ran into while trying to
>> make use of pcrypt for IPsec.
>>
>> The first issue is that the reorder timer callback may run on a CPU that
>> isn't part of the parallel set, as it runs on the CPU where the timer
>> interrupt gets handled. As a result, padata_reorder() may run on a CPU
>> with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
>> padata_get_next() refers to an uninitialized cpu_index. However, as
>> per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
>> next CPU index we're waiting for is 0 too, the compare will succeed and
>> we'll return with -ENODATA, making padata_reorder() think there's
>> nothing to do, stop any pending timer and return immediately. This is
>> wrong as the pending timer might have been the one to trigger the needed
>> reordering, but, as it was canceled, that will never happen -- if no
>> other parallel requests arrive afterwards.
>>
>> That issue is addressed with the first two patches, ensuring we're not
>> using a bogus cpu_index value for the compare and thereby don't wrongly
>> cancel a pending timer. The second patch then ensures the reorder timer
>> callback runs on the correct CPU or, at least, on a CPU from the
>> parallel set to generate forward progress.
>>
>> The third patch addresses a similar issue for the serialization
>> callback. We implicitly require padata_do_serial() to be called on the
>> very same CPU padata_do_parallel() was called on to ensure the correct
>> ordering of requests -- and correct functioning of padata, even. If the
>> algorithm we're parallelizing is asynchronous itself, that invariant
>> might not be true, as, e.g., crypto hardware may execute the callback on
>> the CPU its interrupt gets handled on, which isn't necessarily the one
>> the request got issued on.
>>
>> To handle that issue we track the CPU we're supposed to handle the
>> request on and ensure we'll call padata_reorder() on that CPU when
>> padata_do_serial() gets called -- either by already running on the
>> correct CPU or by deferring the call to a worker.
>>
>> Please apply!
>>
>> Mathias Krause (3):
>>   padata: set cpu_index of unused CPUs to -1
>>   padata: ensure the reorder timer callback runs on the correct CPU
>>   padata: ensure padata_do_serial() runs on the correct CPU
>
> Looks good, thanks!
>
> Acked-by: Steffen Klassert <steffen.klass...@secunet.com>

Ping! Herbert, will these patches go through your tree or Steffen's?
[PATCH 3/3] padata: ensure padata_do_serial() runs on the correct CPU
If the algorithm we're parallelizing is asynchronous we might change
CPUs between padata_do_parallel() and padata_do_serial(). However, we
don't expect this to happen as we need to enqueue the padata object into
the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel
queue.

Ensure we're not switching CPUs for a given padata object by tracking
the CPU within the padata object. If the serial callback gets called on
the wrong CPU, defer invoking padata_reorder() via a kernel worker on
the CPU we're expected to run on.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 include/linux/padata.h |  2 ++
 kernel/padata.c        | 20 +++++++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5c0175bbc179..5d13d25da2c8 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -37,6 +37,7 @@
  * @list: List entry, to attach to the padata lists.
  * @pd: Pointer to the internal control structure.
  * @cb_cpu: Callback cpu for serializatioon.
+ * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
  * @info: Used to pass information from the parallel to the serial function.
  * @parallel: Parallel execution function.
@@ -46,6 +47,7 @@ struct padata_priv {
        struct list_head        list;
        struct parallel_data    *pd;
        int                     cb_cpu;
+       int                     cpu;
        int                     info;
        void                    (*parallel)(struct padata_priv *padata);
        void                    (*serial)(struct padata_priv *padata);
diff --git a/kernel/padata.c b/kernel/padata.c
index b4066147bce4..f262c9a4e70a 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -131,6 +131,7 @@ int padata_do_parallel(struct padata_instance *pinst,
        padata->cb_cpu = cb_cpu;

        target_cpu = padata_cpu_hash(pd);
+       padata->cpu = target_cpu;
        queue = per_cpu_ptr(pd->pqueue, target_cpu);

        spin_lock(&queue->parallel.lock);
@@ -363,10 +364,21 @@ void padata_do_serial(struct padata_priv *padata)
        int cpu;
        struct padata_parallel_queue *pqueue;
        struct parallel_data *pd;
+       int reorder_via_wq = 0;

        pd = padata->pd;

        cpu = get_cpu();
+
+       /* We need to run on the same CPU padata_do_parallel(.., padata, ..)
+        * was called on -- or, at least, enqueue the padata object into the
+        * correct per-cpu queue.
+        */
+       if (cpu != padata->cpu) {
+               reorder_via_wq = 1;
+               cpu = padata->cpu;
+       }
+
        pqueue = per_cpu_ptr(pd->pqueue, cpu);

        spin_lock(&pqueue->reorder.lock);
@@ -376,7 +388,13 @@ void padata_do_serial(struct padata_priv *padata)

        put_cpu();

-       padata_reorder(pd);
+       /* If we're running on the wrong CPU, call padata_reorder() via a
+        * kernel worker.
+        */
+       if (reorder_via_wq)
+               queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
+       else
+               padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
-- 
1.7.10.4
[PATCH 2/3] padata: ensure the reorder timer callback runs on the correct CPU
The reorder timer function runs on the CPU where the timer interrupt was
handled, which is not necessarily one of the CPUs of the 'pcpu' CPU mask
set.

Ensure the padata_reorder() callback runs on the correct CPU, which is
one in the 'pcpu' CPU mask set and, preferably, the next expected one.
Do so by comparing the current CPU with the expected target CPU. If they
match, call padata_reorder() right away. If they differ, schedule a work
item on the target CPU that does the padata_reorder() call for us.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 include/linux/padata.h |  2 ++
 kernel/padata.c        | 43 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 2f9c1f93b1ce..5c0175bbc179 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -85,6 +85,7 @@ struct padata_serial_queue {
 * @swork: work struct for serialization.
 * @pd: Backpointer to the internal control structure.
 * @work: work struct for parallelization.
+ * @reorder_work: work struct for reordering.
 * @num_obj: Number of objects that are processed by this cpu.
 * @cpu_index: Index of the cpu.
 */
@@ -93,6 +94,7 @@ struct padata_parallel_queue {
        struct padata_list      reorder;
        struct parallel_data    *pd;
        struct work_struct      work;
+       struct work_struct      reorder_work;
        atomic_t                num_obj;
        int                     cpu_index;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 1b9b4bac4a9b..b4066147bce4 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -275,11 +275,51 @@ static void padata_reorder(struct parallel_data *pd)
        return;
 }

+static void invoke_padata_reorder(struct work_struct *work)
+{
+       struct padata_parallel_queue *pqueue;
+       struct parallel_data *pd;
+
+       local_bh_disable();
+       pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
+       pd = pqueue->pd;
+       padata_reorder(pd);
+       local_bh_enable();
+}
+
 static void padata_reorder_timer(unsigned long arg)
 {
        struct parallel_data *pd = (struct parallel_data *)arg;
+       unsigned int weight;
+       int target_cpu, cpu;

-       padata_reorder(pd);
+       cpu = get_cpu();
+
+       /* We don't lock pd here to not interfere with parallel processing
+        * padata_reorder() calls on other CPUs. We just need any CPU out of
+        * the cpumask.pcpu set. It would be nice if it's the right one but
+        * it doesn't matter if we're off to the next one by using an outdated
+        * pd->processed value.
+        */
+       weight = cpumask_weight(pd->cpumask.pcpu);
+       target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
+
+       /* ensure to call the reorder callback on the correct CPU */
+       if (cpu != target_cpu) {
+               struct padata_parallel_queue *pqueue;
+               struct padata_instance *pinst;
+
+               /* The timer function is serialized wrt itself -- no locking
+                * needed.
+                */
+               pinst = pd->pinst;
+               pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
+               queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
+       } else {
+               padata_reorder(pd);
+       }
+
+       put_cpu();
 }

 static void padata_serial_worker(struct work_struct *serial_work)
@@ -399,6 +439,7 @@ static void padata_init_pqueues(struct parallel_data *pd)
                __padata_list_init(&pqueue->reorder);
                __padata_list_init(&pqueue->parallel);
                INIT_WORK(&pqueue->work, padata_parallel_worker);
+               INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
                atomic_set(&pqueue->num_obj, 0);
        }
 }
-- 
1.7.10.4
[PATCH 1/3] padata: set cpu_index of unused CPUs to -1
The parallel queue per-cpu data structure gets initialized only for CPUs
in the 'pcpu' CPU mask set. This is not sufficient as the reorder timer
may run on a different CPU and might wrongly decide it's the target CPU
for the next reorder item as per-cpu memory gets memset(0) and we might
be waiting for the first CPU in cpumask.pcpu, i.e. cpu_index 0.

Make the '__this_cpu_read(pd->pqueue->cpu_index) == next_queue->cpu_index'
compare in padata_get_next() fail in this case by initializing the
cpu_index member of all per-cpu parallel queues. Use -1 for unused ones.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 kernel/padata.c |  8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index 868f947166d7..1b9b4bac4a9b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -384,8 +384,14 @@ static void padata_init_pqueues(struct parallel_data *pd)
        struct padata_parallel_queue *pqueue;

        cpu_index = 0;
-       for_each_cpu(cpu, pd->cpumask.pcpu) {
+       for_each_possible_cpu(cpu) {
                pqueue = per_cpu_ptr(pd->pqueue, cpu);
+
+               if (!cpumask_test_cpu(cpu, pd->cpumask.pcpu)) {
+                       pqueue->cpu_index = -1;
+                       continue;
+               }
+
                pqueue->pd = pd;
                pqueue->cpu_index = cpu_index;
                cpu_index++;
-- 
1.7.10.4
[PATCH 0/3] padata cpu awareness fixes
Hi Steffen, Herbert,

this series solves multiple issues of padata I ran into while trying to
make use of pcrypt for IPsec.

The first issue is that the reorder timer callback may run on a CPU that
isn't part of the parallel set, as it runs on the CPU where the timer
interrupt gets handled. As a result, padata_reorder() may run on a CPU
with uninitialized per-cpu parallel queues, so the __this_cpu_read() in
padata_get_next() refers to an uninitialized cpu_index. However, as
per-cpu memory gets memset(0) at allocation time, it'll read 0. If the
next CPU index we're waiting for is 0 too, the compare will succeed and
we'll return with -ENODATA, making padata_reorder() think there's
nothing to do, stop any pending timer and return immediately. This is
wrong as the pending timer might have been the one to trigger the needed
reordering, but, as it was canceled, that will never happen -- if no
other parallel requests arrive afterwards.

That issue is addressed with the first two patches, ensuring we're not
using a bogus cpu_index value for the compare and thereby don't wrongly
cancel a pending timer. The second patch then ensures the reorder timer
callback runs on the correct CPU or, at least, on a CPU from the
parallel set to generate forward progress.

The third patch addresses a similar issue for the serialization
callback. We implicitly require padata_do_serial() to be called on the
very same CPU padata_do_parallel() was called on to ensure the correct
ordering of requests -- and correct functioning of padata, even. If the
algorithm we're parallelizing is asynchronous itself, that invariant
might not be true, as, e.g., crypto hardware may execute the callback on
the CPU its interrupt gets handled on, which isn't necessarily the one
the request got issued on.

To handle that issue we track the CPU we're supposed to handle the
request on and ensure we'll call padata_reorder() on that CPU when
padata_do_serial() gets called -- either by already running on the
correct CPU or by deferring the call to a worker.

Please apply!

Mathias Krause (3):
  padata: set cpu_index of unused CPUs to -1
  padata: ensure the reorder timer callback runs on the correct CPU
  padata: ensure padata_do_serial() runs on the correct CPU

 include/linux/padata.h |  4 +++
 kernel/padata.c        | 71 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 72 insertions(+), 3 deletions(-)

-- 
1.7.10.4
echainiv not working as supposed to be?
Hi Herbert,

I'm a little bit confused about how echainiv is supposed to work. I
think it's doing a few things wrongly, I just can't decide what's
actually wrong as the intended mode of operation is unclear to me. At
least, as-is, the code isn't quite operating as advertised in the
comment at the top of the file. So, in the hope you could enlighten me,
here's my understanding of the current _implementation_ of echainiv
(assuming an aligned req->iv, for simplicity):

For each request it XORs the salt into the provided IV (req->iv). For
ESP the provided IV is the sequence number or, at least, parts of it.
The thus modified IV gets written into the *destination* buffer
(req->dst) which might be a different buffer than the source buffer (not
in the ESP case, though, as it passes the same sg for src and dst).
Afterwards, the per-cpu IV gets copied over into req->iv, effectively
overwriting the generated XORed value. Then the inner crypto request
gets initiated which may finish synchronously or asynchronously. Either
way, the per-cpu IV gets updated with the new value from subreq->iv,
which should be equal to req->iv in the normal case.

Things that are unclear to me:

1/ Why is the XORed IV written to the destination buffer and not the
source buffer? Isn't the sub-request supposed to use the IV from either
req->src or req->iv -- and especially *not* from req->dst?

2/ Why does the XORed IV get overwritten with the per-cpu IV prior to
passing it on to the sub-request, making all possible IV locations in
req->src, req->dst and req->iv *all* be different?

Moreover, the behaviour of 2/ actually leads to the fact that the very
first IV on each CPU will be a null IV -- not a salted sequence number.
All following IVs will be the result of the (or better "some")
sub-request's IV update, stored into the per-cpu variable -- still no
salted sequence numbers, though. That behaviour is a bug, IMHO, as it
clearly differs from what is described in the comment. However, I'm
uncertain how to fix it, as the intended mode of operation is unclear to
me... Should only the first IV of each crypto transform be the salted
sequence number and all following ones the result of the sub-request's
IV update, therefore stored not in a single per-cpu variable but in some
echainiv context specific one?

3/ Why is echainiv using per-cpu IVs in the first place? Shouldn't those
be crypto context specific? Currently we're happily mixing IVs from
different transforms (with possibly different IV sizes!) with each other
via the per-cpu variables. That might be an "issue" if one echainiv user
can maliciously mess with the IVs of another user, e.g. via AF_ALG.

So, should echainiv use a per-context per-cpu array instead that -- for
simplicity -- gets initialised with random bytes and will be updated as
it is now, just not with a single global per-cpu array, but a
per-context one? That would allow us to still have a lock-less IV
generator but one that cannot be accidentally / maliciously tampered
with by other echainiv users.

Thanks, Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG
On 22 June 2016 at 21:03, Stephan Mueller <smuel...@chronox.de> wrote:
> On Wednesday, 22 June 2016, 20:29:37 Mathias Krause wrote:
>
> Hi Mathias,
>
>> Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
>> accidentally removed the minimum size check for CRYPTO_MSG_GETALG
>> netlink messages. This allows userland to send a truncated
>> CRYPTO_MSG_GETALG message as short as a netlink header only, making
>> crypto_report() operate on uninitialized memory by accessing data
>> beyond the end of the netlink message.
>>
>> Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
>> messages to the crypto_msg_min[] array.
>
> I was playing with the adding of the GETALG value as you did to track
> down the issue fixed with eed1e1afd8d542d9644534c1b712599b5d680007
> ("crypto: user - no parsing of CRYPTO_MSG_GETALG") in the
> cryptodev-2.6 tree.

Oh, I hadn't noticed this commit. :D Just looked at Linus' master and
crypto-2.6/master.

> It did not occur to me that it fixes another bug. Yet, with this
> addition, it would be possible to revert the patch
> eed1e1afd8d542d9644534c1b712599b5d680007 as your patch fixes the issue
> too. But my fix can also stay as it does not hurt either.

Well, it does. Commit eed1e1afd8d542d9644534c1b712599b5d680007 is really
just a workaround for the underlying issue. It does not fix the bug of
the missing minimal size check for CRYPTO_MSG_GETALG, it just disables
any further size checks for CRYPTO_MSG_GETALG. Putting my patch on top
of yours will still not fix the issue of the missing minimal size check
as your patch explicitly excludes CRYPTO_MSG_GETALG from any size
checks. So it needs to be reverted, sorry :/

> What is your take on that?

I'd say revert eed1e1afd8d542d9644534c1b712599b5d680007 and fix the
issue for real with my patch in crypto-2.6/master, i.e. the upcoming
v4.7. That adds back the size check so userland can't play tricks with
us. The patch already contains the Cc to stable, so the fix will
eventually end up in the LTS kernel releases used by distros, and this
regression should be fixed in a few weeks.

Regards,
Mathias
[PATCH] crypto: user - re-add size check for CRYPTO_MSG_GETALG
Commit 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
accidentally removed the minimum size check for CRYPTO_MSG_GETALG
netlink messages. This allows userland to send a truncated
CRYPTO_MSG_GETALG message as short as a netlink header only, making
crypto_report() operate on uninitialized memory by accessing data
beyond the end of the netlink message.

Fix this by re-adding the minimum required size of CRYPTO_MSG_GETALG
messages to the crypto_msg_min[] array.

Fixes: 9aa867e46565 ("crypto: user - Add CRYPTO_MSG_DELRNG")
Cc: sta...@vger.kernel.org # v4.2
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
---
This should go on top of crypto-2.6/master.

 crypto/crypto_user.c |  1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 43fe85f20d57..7097a3395b25 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -455,6 +455,7 @@ static const int crypto_msg_min[CRYPTO_NR_MSGTYPES] = {
        [CRYPTO_MSG_NEWALG      - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
        [CRYPTO_MSG_DELALG      - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
        [CRYPTO_MSG_UPDATEALG   - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
+       [CRYPTO_MSG_GETALG      - CRYPTO_MSG_BASE] = MSGSIZE(crypto_user_alg),
        [CRYPTO_MSG_DELRNG      - CRYPTO_MSG_BASE] = 0,
 };
-- 
1.7.10.4
[PATCH] crypto: user - lock crypto_alg_list on alg dump
We miss taking the crypto_alg_sem semaphore when traversing the
crypto_alg_list for CRYPTO_MSG_GETALG dumps. This allows a race with
crypto_unregister_alg() removing algorithms from the list while we're
still traversing it, thereby leading to a use-after-free as shown below:

[ 3482.071639] general protection fault: [#1] SMP
[ 3482.075639] Modules linked in: aes_x86_64 glue_helper lrw ablk_helper cryptd gf128mul ipv6 pcspkr serio_raw virtio_net microcode virtio_pci virtio_ring virtio sr_mod cdrom [last unloaded: aesni_intel]
[ 3482.075639] CPU: 1 PID: 11065 Comm: crconf Not tainted 4.3.4-grsec+ #126
[ 3482.075639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 3482.075639] task: 88001cd41a40 ti: 88001cd422c8 task.ti: 88001cd422c8
[ 3482.075639] RIP: 0010:[] [] strncpy+0x13/0x30
[ 3482.075639] RSP: 0018:88001f713b60 EFLAGS: 00010202
[ 3482.075639] RAX: 88001f6c4430 RBX: 88001f6c43a0 RCX: 88001f6c4430
[ 3482.075639] RDX: 0040 RSI: fefefefefefeff16 RDI: 88001f6c4430
[ 3482.075639] RBP: 88001f713b60 R08: 88001f6c4470 R09: 88001f6c4480
[ 3482.075639] R10: 0002 R11: 0246 R12: 88001ce2aa28
[ 3482.075639] R13: 88093700 R14: 88001f5e4bf8 R15: 3b20
[ 3482.075639] FS: 033826fa2700() GS:88001e90() knlGS:
[ 3482.075639] CS: 0010 DS: ES: CR0: 80050033
[ 3482.075639] CR2: ff600400 CR3: 139ec000 CR4: 001606f0
[ 3482.075639] Stack:
[ 3482.075639]  88001f713bd8 936ccd00 88001e5c4200 88093700
[ 3482.075639]  88001f713bd0 938ef4bf 3b20
[ 3482.075639]  88001f5e4bf8 88001f5e4848 3b20
[ 3482.075639] Call Trace:
[ 3482.075639]  [] crypto_report_alg+0xc0/0x3e0
[ 3482.075639]  [] ? __alloc_skb+0x16f/0x300
[ 3482.075639]  [] crypto_dump_report+0x6a/0x90
[ 3482.075639]  [] netlink_dump+0x147/0x2e0
[ 3482.075639]  [] __netlink_dump_start+0x159/0x190
[ 3482.075639]  [] crypto_user_rcv_msg+0xc3/0x130
[ 3482.075639]  [] ? crypto_report_alg+0x3e0/0x3e0
[ 3482.075639]  [] ? alg_test_crc32c+0x120/0x120
[ 3482.075639]  [] ? __netlink_lookup+0xd5/0x120
[ 3482.075639]  [] ? crypto_add_alg+0x1d0/0x1d0
[ 3482.075639]  [] netlink_rcv_skb+0xe1/0x130
[ 3482.075639]  [] crypto_netlink_rcv+0x28/0x40
[ 3482.075639]  [] netlink_unicast+0x108/0x180
[ 3482.075639]  [] netlink_sendmsg+0x541/0x770
[ 3482.075639]  [] sock_sendmsg+0x21/0x40
[ 3482.075639]  [] SyS_sendto+0xf3/0x130
[ 3482.075639]  [] ? bad_area_nosemaphore+0x13/0x20
[ 3482.075639]  [] ? __do_page_fault+0x80/0x3a0
[ 3482.075639]  [] entry_SYSCALL_64_fastpath+0x12/0x6e
[ 3482.075639] Code: 88 4a ff 75 ed 5d 48 0f ba 2c 24 3f c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 d2 48 89 f8 48 89 f9 4c 8d 04 17 48 89 e5 74 15 <0f> b6 16 80 fa 01 88 11 48 83 de ff 48 83 c1 01 4c 39 c1 75 eb
[ 3482.075639] RIP [] strncpy+0x13/0x30

To trigger the race run the following loops simultaneously for a while:

  $ while : ; do modprobe aesni-intel; rmmod aesni-intel; done
  $ while : ; do crconf show all > /dev/null; done

Fix the race by taking the crypto_alg_sem read lock, thereby preventing
crypto_unregister_alg() from modifying the algorithm list during the
dump.

This bug has been detected by the PaX memory sanitize feature.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
Cc: PaX Team <pagee...@freemail.hu>
---
 crypto/crypto_user.c |  6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 237f3795cfaa..43fe85f20d57 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -499,6 +499,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
                if (link->dump == NULL)
                        return -EINVAL;

+               down_read(&crypto_alg_sem);
                list_for_each_entry(alg, &crypto_alg_list, cra_list)
                        dump_alloc += CRYPTO_REPORT_MAXSIZE;

@@ -508,8 +509,11 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
                        .done = link->done,
                        .min_dump_alloc = dump_alloc,
                };
-               return netlink_dump_start(crypto_nlsk, skb, nlh, &c);
+               err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
        }
+       up_read(&crypto_alg_sem);
+
+       return err;
 }

        err = nlmsg_parse(nlh, crypto_msg_min[type], attrs, CRYPTOCFGA_MAX,
-- 
1.7.10.4
Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi
On 18 February 2015 at 07:34, George Spelvin <li...@horizon.com> wrote:
> It's trying to load modules named:
>
>   crypto-ccm(aes)
>   crypto-ccm(aes)-all
>   crypto-ctr(aes)
>   crypto-ctr(aes)-all
>
> depmod -n doesn't show any aliases with parens in their names,

That's okay. Also that it fails to load these, as it'll fall back to
loading modules for the templates in that case, as can be seen in your
log:

  Wed Feb 18 06:58:10 UTC 2015 /sbin/modprobe -q -- crypto-ctr

What's curious, however, is that it only tries to load the template for
ctr, not for ccm. :/ Are the logs complete?

Could you please simplify your /sbin/x/modprobe wrapper to just output
the modprobe call, as Kees suggested? Also, could you please provide
the output of `depmod -n | grep crypto-`? There should be lines for
crypto-ccm and crypto-ctr if you built them as modules.

Mathias
Re: [BISECTED] 4943ba16 (include crypto- module prefix) breaks wifi
On 17 February 2015 at 04:09, George Spelvin <li...@horizon.com> wrote:
> I discovered when (belatedly) testing 3.19-rc7 the other week that my
> laptop wifi was broken and would no longer associate. Apparently this
> is causing some necessary crypto algorithms to fail to load, breaking
> my wifi.
>
> Perhaps I'm displaying my ignorance of what's supposed to happen, but
> shouldn't "make install" have installed some files with names like
> /lib/modules/`uname -r`/kernel/crypto/crypto-*.ko?

No, the module names do not change. They just got another module alias
with the "crypto-" prefix.

> Or is it something only I'm hitting because I have a lot of common
> crypto algorithms statically compiled into my kernel?
>
> CONFIG_CRYPTO_CBC=y
> CONFIG_CRYPTO_HMAC=y
> CONFIG_CRYPTO_MD5=y
> CONFIG_CRYPTO_SHA1=y
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_AES_586=y
> CONFIG_CRYPTO_ARC4=y
>
> Trying this on kernel 4943ba16 produces instead an endless loop of:
>
> wlan1: SME: Trying to authenticate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
> wlan1: Trying to associate with aa:bb:cc:dd:ee:ff (SSID='FOO' freq=24xx MHz)
> wlan1: Associated with aa:bb:cc:dd:ee:ff
> wlan1: WPA: Failed to set PTK to the driver (alg=3 keylen=16 bssid=aa:bb:cc:dd:ee:ff)
> wlan1: CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:ff reason=1
>
> The kernel logs are not particularly informative. They just go through
> the usual successful series, but end with
>
> wlan1: RxAssocResp from aa:bb:cc:dd:ee:ff (capab=0x431 status=0 aid=1)
> wlan1: associated
> wlan1: deauthenticating from 11:bb:cc:dd:ee:ff by local choice (Reason: 1=UNSPECIFIED)
>
> while a successful connection ends before that last line.

Commit 4943ba16bbc2 was incomplete and could have caused regressions
like the above. Those should have been fixed with commits 4943ba16bbc2 +
3e14dcf7cb80. However, those should be in v3.19-rc7 already, so I'm not
much of a help here :(

Mathias
[PATCH 6/6] crypto: x86/des3_ede - drop bogus module aliases
This module implements variations of des3_ede only. Drop the bogus
module aliases for des.

Cc: Jussi Kivilinna <jussi.kivili...@iki.fi>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 arch/x86/crypto/des3_ede_glue.c |  2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/crypto/des3_ede_glue.c b/arch/x86/crypto/des3_ede_glue.c
index 38a14f818ef1..d6fc59df 100644
--- a/arch/x86/crypto/des3_ede_glue.c
+++ b/arch/x86/crypto/des3_ede_glue.c
@@ -504,6 +504,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Triple DES EDE Cipher Algorithm, asm optimized");
 MODULE_ALIAS_CRYPTO("des3_ede");
 MODULE_ALIAS_CRYPTO("des3_ede-asm");
-MODULE_ALIAS_CRYPTO("des");
-MODULE_ALIAS_CRYPTO("des-asm");
 MODULE_AUTHOR("Jussi Kivilinna <jussi.kivili...@iki.fi>");
-- 
1.7.10.4
[PATCH 5/6] crypto: sparc64/md5 - fix module description
MD5 is not SHA1.

Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 arch/sparc/crypto/md5_glue.c |  2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/md5_glue.c b/arch/sparc/crypto/md5_glue.c
index 64c7ff5f72a9..b688731d7ede 100644
--- a/arch/sparc/crypto/md5_glue.c
+++ b/arch/sparc/crypto/md5_glue.c
@@ -183,7 +183,7 @@ module_init(md5_sparc64_mod_init);
 module_exit(md5_sparc64_mod_fini);

 MODULE_LICENSE("GPL");
-MODULE_DESCRIPTION("MD5 Secure Hash Algorithm, sparc64 md5 opcode accelerated");
+MODULE_DESCRIPTION("MD5 Message Digest Algorithm, sparc64 md5 opcode accelerated");

 MODULE_ALIAS_CRYPTO("md5");
-- 
1.7.10.4
[PATCH 3/6] crypto: sparc64/camellia - fix module alias
The module alias should be "camellia", not "aes".

Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 arch/sparc/crypto/camellia_glue.c |  2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/crypto/camellia_glue.c b/arch/sparc/crypto/camellia_glue.c
index 641f55cb61c3..6bf2479a12fb 100644
--- a/arch/sparc/crypto/camellia_glue.c
+++ b/arch/sparc/crypto/camellia_glue.c
@@ -322,6 +322,6 @@ module_exit(camellia_sparc64_mod_fini);

 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("Camellia Cipher Algorithm, sparc64 camellia opcode accelerated");
-MODULE_ALIAS_CRYPTO("aes");
+MODULE_ALIAS_CRYPTO("camellia");

 #include "crop_devid.c"
-- 
1.7.10.4
[PATCH 1/6] crypto: add missing crypto module aliases
Commit 5d26a105b5a7 (crypto: prefix module autoloading with crypto-) changed the automatic module loading when requesting crypto algorithms to prefix all module requests with crypto-. This requires all crypto modules to have a crypto specific module alias even if their file name would otherwise match the requested crypto algorithm. Even though commit 5d26a105b5a7 added those aliases for a vast amount of modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO annotations to those files to make them get loaded automatically, again. This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work with kernels v3.18 and below. Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former won't work for crypto modules any more. Fixes: 5d26a105b5a7 (crypto: prefix module autoloading with crypto-) Cc: Kees Cook keesc...@chromium.org Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/powerpc/crypto/sha1.c |1 + arch/x86/crypto/sha-mb/sha1_mb.c |2 +- crypto/aes_generic.c |1 + crypto/ansi_cprng.c |1 + crypto/blowfish_generic.c|1 + crypto/camellia_generic.c|1 + crypto/cast5_generic.c |1 + crypto/cast6_generic.c |1 + crypto/crc32c_generic.c |1 + crypto/crct10dif_generic.c |1 + crypto/des_generic.c |7 --- crypto/ghash-generic.c |1 + crypto/krng.c|1 + crypto/salsa20_generic.c |1 + crypto/serpent_generic.c |1 + crypto/sha1_generic.c|1 + crypto/sha256_generic.c |2 ++ crypto/sha512_generic.c |2 ++ crypto/tea.c |1 + crypto/tgr192.c |1 + crypto/twofish_generic.c |1 + crypto/wp512.c |1 + 22 files changed, 27 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/crypto/sha1.c b/arch/powerpc/crypto/sha1.c index d3feba5a275f..c154cebc1041 100644 --- a/arch/powerpc/crypto/sha1.c +++ b/arch/powerpc/crypto/sha1.c @@ -154,4 +154,5 @@ module_exit(sha1_powerpc_mod_fini); MODULE_LICENSE(GPL); MODULE_DESCRIPTION(SHA1 Secure Hash Algorithm); +MODULE_ALIAS_CRYPTO(sha1); MODULE_ALIAS_CRYPTO(sha1-powerpc); diff --git a/arch/x86/crypto/sha-mb/sha1_mb.c 
b/arch/x86/crypto/sha-mb/sha1_mb.c index a225a5ca1037..fd9f6b035b16 100644 --- a/arch/x86/crypto/sha-mb/sha1_mb.c +++ b/arch/x86/crypto/sha-mb/sha1_mb.c @@ -931,4 +931,4 @@ module_exit(sha1_mb_mod_fini); MODULE_LICENSE(GPL); MODULE_DESCRIPTION(SHA1 Secure Hash Algorithm, multi buffer accelerated); -MODULE_ALIAS(sha1); +MODULE_ALIAS_CRYPTO(sha1); diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c index 9b3c54c1cbe8..3dd101144a58 100644 --- a/crypto/aes_generic.c +++ b/crypto/aes_generic.c @@ -1475,3 +1475,4 @@ module_exit(aes_fini); MODULE_DESCRIPTION(Rijndael (AES) Cipher Algorithm); MODULE_LICENSE(Dual BSD/GPL); MODULE_ALIAS_CRYPTO(aes); +MODULE_ALIAS_CRYPTO(aes-generic); diff --git a/crypto/ansi_cprng.c b/crypto/ansi_cprng.c index b4485a108389..6f5bebc9bf01 100644 --- a/crypto/ansi_cprng.c +++ b/crypto/ansi_cprng.c @@ -477,3 +477,4 @@ MODULE_PARM_DESC(dbg, Boolean to enable debugging (0/1 == off/on)); module_init(prng_mod_init); module_exit(prng_mod_fini); MODULE_ALIAS_CRYPTO(stdrng); +MODULE_ALIAS_CRYPTO(ansi_cprng); diff --git a/crypto/blowfish_generic.c b/crypto/blowfish_generic.c index 7bd71f02d0dd..87b392a77a93 100644 --- a/crypto/blowfish_generic.c +++ b/crypto/blowfish_generic.c @@ -139,3 +139,4 @@ module_exit(blowfish_mod_fini); MODULE_LICENSE(GPL); MODULE_DESCRIPTION(Blowfish Cipher Algorithm); MODULE_ALIAS_CRYPTO(blowfish); +MODULE_ALIAS_CRYPTO(blowfish-generic); diff --git a/crypto/camellia_generic.c b/crypto/camellia_generic.c index 1b74c5a3e891..a02286bf319e 100644 --- a/crypto/camellia_generic.c +++ b/crypto/camellia_generic.c @@ -1099,3 +1099,4 @@ module_exit(camellia_fini); MODULE_DESCRIPTION(Camellia Cipher Algorithm); MODULE_LICENSE(GPL); MODULE_ALIAS_CRYPTO(camellia); +MODULE_ALIAS_CRYPTO(camellia-generic); diff --git a/crypto/cast5_generic.c b/crypto/cast5_generic.c index 84c86db67ec7..df5c72629383 100644 --- a/crypto/cast5_generic.c +++ b/crypto/cast5_generic.c @@ -550,3 +550,4 @@ module_exit(cast5_mod_fini); MODULE_LICENSE(GPL); 
MODULE_DESCRIPTION(Cast5 Cipher Algorithm); MODULE_ALIAS_CRYPTO(cast5); +MODULE_ALIAS_CRYPTO(cast5-generic); diff --git a/crypto/cast6_generic.c b/crypto/cast6_generic.c index f408f0bd8de2..058c8d755d03 100644 --- a/crypto/cast6_generic.c +++ b/crypto/cast6_generic.c @@ -292,3 +292,4 @@ module_exit(cast6_mod_fini); MODULE_LICENSE(GPL); MODULE_DESCRIPTION(Cast6 Cipher Algorithm); MODULE_ALIAS_CRYPTO(cast6); +MODULE_ALIAS_CRYPTO(cast6-generic); diff --git a/crypto/crc32c_generic.c b/crypto/crc32c_generic.c index 2a062025749d..06f1b60f02b2 100644 --- a/crypto/crc32c_generic.c +++ b/crypto/crc32c_generic.c @@ -171,4 +171,5
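The module-request side of the prefixing scheme the patch relies on can be sketched in plain userland C (hypothetical helper name, not the kernel code — the kernel builds the prefixed name when calling request_module()):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userland sketch of prefixed crypto module autoloading: every module
 * request derived from an algorithm or driver name gets a "crypto-"
 * prefix.  A module therefore only loads automatically if it carries a
 * matching MODULE_ALIAS_CRYPTO() -- which is why, e.g.,
 * blowfish_generic.ko needs an alias for "blowfish-generic" in addition
 * to the one for "blowfish".
 */
static int crypto_request_name(const char *alg, char *buf, size_t len)
{
    int n = snprintf(buf, len, "crypto-%s", alg);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A request for the driver name 'blowfish-generic' thus only resolves if the module advertises the prefixed alias — the file name matching the bare algorithm name is no longer enough.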
[PATCH] crypto: aesni - fix by8 variant for 128 bit keys
The by8 counter mode optimization is broken for 128 bit keys with input data longer than 128 bytes. It uses the wrong key material for en- and decryption. The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved in case we're handling more than 128 bytes of input data -- they won't get reloaded after the initial load. They must therefore be (a) loaded on the first iteration and (b) preserved for the later ones. The implementation for 128 bit keys complies with neither (a) nor (b). Fix this by bringing the implementation back to its original source, correctly loading the key registers and preserving their values by *not* re-using the registers for other purposes. Kudos to James for reporting the issue and providing a test case showing the discrepancies. Reported-by: James Yonan ja...@openvpn.net Cc: Chandramouli Narayanan mo...@linux.intel.com Cc: sta...@vger.kernel.org # v3.18 Signed-off-by: Mathias Krause mini...@googlemail.com --- James, this should fix the issue for you, too -- it did for me, using your test case. Feel free to add your 'Tested-by' if it does. This patch should go on top of crypto-2.6.git#master as it's a fix that needs to go to stable as well.
arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 46 +++ 1 file changed, 35 insertions(+), 11 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index 2df2a0298f5a..a916c4a61165 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -208,7 +208,7 @@ ddq_add_8: .if (klen == KEY_128) .if (load_keys) - vmovdqa 3*16(p_keys), xkeyA + vmovdqa 3*16(p_keys), xkey4 .endif .else vmovdqa 3*16(p_keys), xkeyA @@ -224,7 +224,7 @@ ddq_add_8: add $(16*by), p_in .if (klen == KEY_128) - vmovdqa 4*16(p_keys), xkey4 + vmovdqa 4*16(p_keys), xkeyB .else .if (load_keys) vmovdqa 4*16(p_keys), xkey4 @@ -234,7 +234,12 @@ ddq_add_8: .set i, 0 .rept by club XDATA, i - vaesenc xkeyA, var_xdata, var_xdata /* key 3 */ + /* key 3 */ + .if (klen == KEY_128) + vaesenc xkey4, var_xdata, var_xdata + .else + vaesenc xkeyA, var_xdata, var_xdata + .endif .set i, (i +1) .endr @@ -243,13 +248,18 @@ ddq_add_8: .set i, 0 .rept by club XDATA, i - vaesenc xkey4, var_xdata, var_xdata /* key 4 */ + /* key 4 */ + .if (klen == KEY_128) + vaesenc xkeyB, var_xdata, var_xdata + .else + vaesenc xkey4, var_xdata, var_xdata + .endif .set i, (i +1) .endr .if (klen == KEY_128) .if (load_keys) - vmovdqa 6*16(p_keys), xkeyB + vmovdqa 6*16(p_keys), xkey8 .endif .else vmovdqa 6*16(p_keys), xkeyB @@ -267,12 +277,17 @@ ddq_add_8: .set i, 0 .rept by club XDATA, i - vaesenc xkeyB, var_xdata, var_xdata /* key 6 */ + /* key 6 */ + .if (klen == KEY_128) + vaesenc xkey8, var_xdata, var_xdata + .else + vaesenc xkeyB, var_xdata, var_xdata + .endif .set i, (i +1) .endr .if (klen == KEY_128) - vmovdqa 8*16(p_keys), xkey8 + vmovdqa 8*16(p_keys), xkeyB .else .if (load_keys) vmovdqa 8*16(p_keys), xkey8 @@ -288,7 +303,7 @@ ddq_add_8: .if (klen == KEY_128) .if (load_keys) - vmovdqa 9*16(p_keys), xkeyA + vmovdqa 9*16(p_keys), xkey12 .endif .else vmovdqa 9*16(p_keys), xkeyA @@ -297,7 +312,12 @@ ddq_add_8: .set i, 0 .rept by club XDATA, i - vaesenc 
xkey8, var_xdata, var_xdata /* key 8 */ + /* key 8 */ + .if (klen == KEY_128) + vaesenc xkeyB, var_xdata, var_xdata + .else + vaesenc xkey8, var_xdata, var_xdata + .endif .set i, (i +1) .endr @@ -306,7 +326,12 @@ ddq_add_8: .set i, 0 .rept by club XDATA, i - vaesenc xkeyA, var_xdata, var_xdata /* key 9 */ + /* key 9 */ + .if (klen == KEY_128) + vaesenc xkey12, var_xdata, var_xdata + .else
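The invariants (a) and (b) from the commit message can be modeled with a toy round function in C — plain XOR instead of AES rounds, purely illustrative, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (XOR, not AES) of the register discipline the patch above
 * restores: the preserved key "registers" are loaded only when load_keys
 * is set -- i.e. on the first 8-block chunk -- and must yield the same
 * result for every later chunk that skips the reload.
 */
struct key_regs {
    uint32_t xkey4, xkey8, xkey12;   /* preserved across chunks */
    uint32_t scratch;                /* xkeyA/xkeyB-style temporary */
};

static uint32_t process_chunk(struct key_regs *r, const uint32_t *keys,
                              int load_keys, uint32_t data)
{
    if (load_keys) {                 /* (a) load on the first iteration */
        r->xkey4  = keys[4];
        r->xkey8  = keys[8];
        r->xkey12 = keys[12];
    }
    r->scratch = keys[3];            /* temporaries may be reloaded */
    /* (b) preserved registers are only read, never overwritten */
    return data ^ r->scratch ^ r->xkey4 ^ r->xkey8 ^ r->xkey12;
}
```

The broken 128 bit key path violated (b): it re-used the preserved registers as temporaries, so every chunk after the first was processed with clobbered key material.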
Re: [PATCH] crypto: aesni - disable by8 AVX CTR optimization
Hi James, On 11 December 2014 at 09:52, James Yonan ja...@openvpn.net wrote: I'm seeing some anomalous results with the by8 AVX CTR optimization in 3.18. the patch you're replying to actually *disabled* the by8 variant for v3.17 as it had another bug related to wrong counter handling in GCM. The fix for that particular issue only made it to v3.18, so the code got re-enabled only for v3.18. But it looks like there's yet another bug :/ In particular, crypto_aead_encrypt appears to produce different ciphertext from the same plaintext depending on whether or not the optimization is enabled. See the attached patch to tcrypt that demonstrates the discrepancy. I can reproduce your observations and can confirm the difference between the by8 variant and the other AES implementations. When applying this very patch (commit 7da4b29d496b (crypto: aesni - disable by8 AVX CTR optimization)) -- the patch that disables the by8 variant -- on top of v3.18 the discrepancies are gone. So the behavior is bound to the by8 optimization only. As it was Chandramouli who contributed the code, maybe he has a clue what's wrong here. Chandramouli? Mathias James -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: include crypto- module prefix in template
On 24 November 2014 at 20:17, Kees Cook keesc...@chromium.org wrote: This adds the module loading prefix crypto- to the template lookup as well. For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly includes the crypto- prefix at every level, correctly rejecting vfat: net-pf-38 algif-hash crypto-vfat(blowfish) crypto-vfat(blowfish)-all crypto-vfat Reported-by: Mathias Krause mini...@googlemail.com Signed-off-by: Kees Cook keesc...@chromium.org --- crypto/algapi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) That commit will break the user API again as ciphers like 'cbc(aes)' won't work any more -- as the cbc module won't be loaded. You're missing the MODULE_ALIAS_CRYPTO() annotations for all the crypto templates -- cbc, ctr, xts, hmac, ... Regards, Mathias
Re: [PATCH v2] crypto: prefix module autoloading with crypto-
On 18 November 2014 01:45, Kees Cook keesc...@chromium.org wrote: On Mon, Nov 17, 2014 at 3:20 PM, Mathias Krause mini...@googlemail.com wrote: On 17 November 2014 21:02, Kees Cook keesc...@chromium.org wrote: This prefixes all crypto module loading with crypto- so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook keesc...@chromium.org --- v2: - added missing #include, thanks to minipli - built with allmodconfig [...snip...] diff --git a/include/linux/crypto.h b/include/linux/crypto.h index d45e949699ea..d14230f6e977 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -26,6 +26,13 @@ #include linux/uaccess.h /* + * Autoloaded crypto modules should only use a prefixed name to avoid allowing + * arbitrary modules to be loaded. + */ +#define MODULE_ALIAS_CRYPTO(name) \ + MODULE_ALIAS(crypto- name) This would break userland relying on the old aliases, e.g. 'modprobe aes' no longer works. Why not have both aliases, one with the crypto- prefix for on-demand loading within the crypto API and one without for manual loading from userland? E.g., something like this: #define MODULE_ALIAS_CRYPTO(name) \ MODULE_ALIAS(name); \ MODULE_ALIAS(crypto- name) That would prevent the userland breakage and still achieves the goal of restricting the request_module() call offered by the means of the AF_ALG API. That was my intention originally, and I should go back to it. The trouble is with the use of __UNIQUE_ID in the MODULE_ALIAS macro. It uses __LINE__ to produce the id, so the suggested macro expansion (which is what I started with) won't work on non-gcc compilers. I haven't found any solutions for C89 version of gcc's __COUNTER__, and I haven't found any C89 ways to force a macro to be expanded as being multi-line. Well, clang should support it as well, according to [1]. But still, a compiler independent solution would be nice. 
Anyway, the __COUNTER__ support is gcc >= 4.3 only. So, according to Documentation/Changes, stating gcc 3.2 is the minimum supported version for compiling the kernel, this would be a no-go, too. [1] http://clang.llvm.org/docs/LanguageExtensions.html#builtin-macros I'd like to avoid having to open-code both MODULE_ALIAS and MODULE_ALIAS_CRYPTO in each module's source. Anyone see some sneaky way to accomplish this? Unfortunately, I do not ... besides this, maybe: #define MODULE_ALIAS_CRYPTO(name) \ __MODULE_INFO(alias, alias_userland, name); \ __MODULE_INFO(alias, alias_crypto, "crypto-" name) Looks ugly, but works. ;) Regards, Mathias
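The dual-alias macro sketched above can be exercised in plain userland C by emulating __MODULE_INFO with static strings (emulation only — the real macro emits the strings into the module's .modinfo section):

```c
#include <assert.h>
#include <string.h>

/*
 * Userland emulation of the proposed two-alias macro: the distinct
 * second arguments (alias_userland/alias_crypto) make the identifiers
 * unique without any __LINE__ or __COUNTER__ trickery.  Note this only
 * works for one alias per file -- a second use would redefine the
 * symbols, which is exactly where the __UNIQUE_ID discussion above
 * comes in.
 */
#define __MODULE_INFO_EMU(tag, name, value) \
    static const char __mod_##name[] = value

#define MODULE_ALIAS_CRYPTO_EMU(name) \
    __MODULE_INFO_EMU(alias, alias_userland, name); \
    __MODULE_INFO_EMU(alias, alias_crypto, "crypto-" name)

MODULE_ALIAS_CRYPTO_EMU("aes");
```

One macro invocation yields both the bare alias for userland 'modprobe aes' and the prefixed alias for the crypto API's restricted request_module() path.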
Re: [PATCH] crypto: prefix module autoloading with crypto-
On 17 November 2014 16:09, Herbert Xu herb...@gondor.apana.org.au wrote: On Fri, Nov 14, 2014 at 01:34:59PM -0800, Kees Cook wrote: This prefixes all crypto module loading with crypto- so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook keesc...@chromium.org Sorry but this doesn't build for me: CC [M] drivers/crypto/qat/qat_common/adf_ctl_drv.o drivers/crypto/qat/qat_common/adf_ctl_drv.c:490:21: error: expected declaration specifiers or ‘...’ before string constant make[3]: *** [drivers/crypto/qat/qat_common/adf_ctl_drv.o] Error 1 make[2]: *** [drivers/crypto/qat/qat_common] Error 2 make[1]: *** [drivers/crypto/qat] Error 2 make[1]: *** Waiting for unfinished jobs Looks like drivers/crypto/qat/qat_common/adf_ctl_drv.c is missing '#include <linux/crypto.h>'. Mathias
Re: [PATCH v2] crypto: prefix module autoloading with crypto-
On 17 November 2014 21:02, Kees Cook keesc...@chromium.org wrote: This prefixes all crypto module loading with crypto- so we never run the risk of exposing module auto-loading to userspace via a crypto API, as demonstrated by Mathias Krause: https://lkml.org/lkml/2013/3/4/70 Signed-off-by: Kees Cook keesc...@chromium.org --- v2: - added missing #include, thanks to minipli - built with allmodconfig --- arch/arm/crypto/aes_glue.c | 4 ++-- arch/arm/crypto/sha1_glue.c | 2 +- arch/arm/crypto/sha1_neon_glue.c| 2 +- arch/arm/crypto/sha512_neon_glue.c | 4 ++-- arch/arm64/crypto/aes-ce-ccm-glue.c | 2 +- arch/arm64/crypto/aes-glue.c| 8 arch/powerpc/crypto/sha1.c | 2 +- arch/s390/crypto/aes_s390.c | 2 +- arch/s390/crypto/des_s390.c | 4 ++-- arch/s390/crypto/ghash_s390.c | 2 +- arch/s390/crypto/sha1_s390.c| 2 +- arch/s390/crypto/sha256_s390.c | 4 ++-- arch/s390/crypto/sha512_s390.c | 4 ++-- arch/sparc/crypto/aes_glue.c| 2 +- arch/sparc/crypto/camellia_glue.c | 2 +- arch/sparc/crypto/crc32c_glue.c | 2 +- arch/sparc/crypto/des_glue.c| 2 +- arch/sparc/crypto/md5_glue.c| 2 +- arch/sparc/crypto/sha1_glue.c | 2 +- arch/sparc/crypto/sha256_glue.c | 4 ++-- arch/sparc/crypto/sha512_glue.c | 4 ++-- arch/x86/crypto/aes_glue.c | 4 ++-- arch/x86/crypto/aesni-intel_glue.c | 2 +- arch/x86/crypto/blowfish_glue.c | 4 ++-- arch/x86/crypto/camellia_aesni_avx2_glue.c | 4 ++-- arch/x86/crypto/camellia_aesni_avx_glue.c | 4 ++-- arch/x86/crypto/camellia_glue.c | 4 ++-- arch/x86/crypto/cast5_avx_glue.c| 2 +- arch/x86/crypto/cast6_avx_glue.c| 2 +- arch/x86/crypto/crc32-pclmul_glue.c | 4 ++-- arch/x86/crypto/crc32c-intel_glue.c | 4 ++-- arch/x86/crypto/crct10dif-pclmul_glue.c | 4 ++-- arch/x86/crypto/des3_ede_glue.c | 8 arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +- arch/x86/crypto/salsa20_glue.c | 4 ++-- arch/x86/crypto/serpent_avx2_glue.c | 4 ++-- arch/x86/crypto/serpent_avx_glue.c | 2 +- arch/x86/crypto/serpent_sse2_glue.c | 2 +- arch/x86/crypto/sha1_ssse3_glue.c | 2 +- 
arch/x86/crypto/sha256_ssse3_glue.c | 4 ++-- arch/x86/crypto/sha512_ssse3_glue.c | 4 ++-- arch/x86/crypto/twofish_avx_glue.c | 2 +- arch/x86/crypto/twofish_glue.c | 4 ++-- arch/x86/crypto/twofish_glue_3way.c | 4 ++-- crypto/842.c| 1 + crypto/aes_generic.c| 2 +- crypto/ansi_cprng.c | 2 +- crypto/anubis.c | 1 + crypto/api.c| 4 ++-- crypto/arc4.c | 1 + crypto/blowfish_generic.c | 2 +- crypto/camellia_generic.c | 2 +- crypto/cast5_generic.c | 2 +- crypto/cast6_generic.c | 2 +- crypto/ccm.c| 4 ++-- crypto/crc32.c | 1 + crypto/crc32c_generic.c | 2 +- crypto/crct10dif_generic.c | 2 +- crypto/crypto_null.c| 6 +++--- crypto/ctr.c| 2 +- crypto/deflate.c| 2 +- crypto/des_generic.c| 2 +- crypto/fcrypt.c | 1 + crypto/gcm.c| 6 +++--- crypto/ghash-generic.c | 2 +- crypto/khazad.c | 1 + crypto/krng.c | 2 +- crypto/lz4.c| 1 + crypto/lz4hc.c | 1 + crypto/lzo.c| 1 + crypto/md4.c| 2 +- crypto/md5.c| 1 + crypto/michael_mic.c| 1 + crypto/rmd128.c | 1 + crypto/rmd160.c | 1 + crypto/rmd256.c | 1 + crypto/rmd320.c | 1 + crypto/salsa20_generic.c| 2 +- crypto/seed.c | 1 + crypto/serpent_generic.c| 4 ++-- crypto/sha1_generic.c | 2 +- crypto/sha256_generic.c | 4 ++-- crypto/sha512_generic.c | 4 ++-- crypto/tea.c| 4 ++-- crypto/tgr192.c | 4 ++-- crypto
[PATCH 0/3] crypto: aesni - fix and re-enable by8 CTR variant
This series fixes the counter overflow handling of the by8 CTR variant which led to failing cryptomgr tests and, in turn, disabling this optimization with commit 7da4b29d496b. Patch 1 fixes the bug, patch 2 removes some unused defines (leftovers from the unification of the initial source files) and patch 3 re-enables the code. The fix was tested by me, doing tcrypt and dm-crypt tests. It was also tested by Romain who initially reported the issue. The patches should go on top of crypto-2.6.git. In case this doesn't get merged for v3.17, patches 1 and 3 may be cc'ed to stable to propagate the fix. Please apply! Thanks, Mathias Mathias Krause (3): crypto: aesni - fix counter overflow handling in by8 variant crypto: aesni - remove unused defines in by8 variant Revert crypto: aesni - disable by8 AVX CTR optimization arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 20 +++- arch/x86/crypto/aesni-intel_glue.c |4 ++-- 2 files changed, 17 insertions(+), 7 deletions(-) -- 1.7.10.4
[PATCH 3/3] Revert crypto: aesni - disable by8 AVX CTR optimization
This reverts commit 7da4b29d496b1389d3a29b55d3668efecaa08ebd. Now that the issue is fixed, we can re-enable the code. Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Chandramouli Narayanan mo...@linux.intel.com --- arch/x86/crypto/aesni-intel_glue.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index a7ccd57f19e4..888950f29fd9 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx, crypto_inc(ctrblk, AES_BLOCK_SIZE); } -#if 0 /* temporary disabled due to failing crypto tests */ +#ifdef CONFIG_AS_AVX static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv) { @@ -1522,7 +1522,7 @@ static int __init aesni_init(void) aesni_gcm_dec_tfm = aesni_gcm_dec; } aesni_ctr_enc_tfm = aesni_ctr_enc; -#if 0 /* temporary disabled due to failing crypto tests */ +#ifdef CONFIG_AS_AVX if (cpu_has_avx) { /* optimize performance of ctr mode encryption transform */ aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; -- 1.7.10.4
[PATCH 1/3] crypto: aesni - fix counter overflow handling in by8 variant
The by8 CTR AVX implementation fails to properly handle counter overflows. That was the reason it got disabled in commit 7da4b29d496b (crypto: aesni - disable by8 AVX CTR optimization). Fix the overflow handling by incrementing the counter block as a double quad word, i.e. as a 128 bit value, and testing for overflows afterwards. We need to use VPTEST to do so as VPADD* does not set the flags itself and silently drops the carry bit. As this change adds branches to the hot path, minor performance regressions might be a side effect. But, OTOH, we now have a conforming implementation -- the preferable goal. A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical numbers for the old and this version with differences within the noise range. A dm-crypt test with the fixed version gave even slightly better results for this version. So the performance impact might not be as big as expected. Tested-by: Romain Francoise rom...@orebokech.com Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Chandramouli Narayanan mo...@linux.intel.com --- arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index f091f122ed24..a029bc744244 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -108,6 +108,10 @@ byteswap_const: .octa 0x000102030405060708090A0B0C0D0E0F +ddq_low_msk: + .octa 0x +ddq_high_add_1: + .octa 0x0001 ddq_add_1: .octa 0x0001 ddq_add_2: @@ -169,7 +173,12 @@ ddq_add_8: .rept (by - 1) club DDQ_DATA, i club XDATA, i - vpaddd var_ddq_add(%rip), xcounter, var_xdata + vpaddq var_ddq_add(%rip), xcounter, var_xdata + vptest ddq_low_msk(%rip), var_xdata + jnz 1f + vpaddq ddq_high_add_1(%rip), var_xdata, var_xdata + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: vpshufb xbyteswap, var_xdata, var_xdata .set i, (i +1) .endr @@ -178,7 +187,11 @@ ddq_add_8: vpxor xkey0, xdata0,
xdata0 club DDQ_DATA, by - vpaddd var_ddq_add(%rip), xcounter, xcounter + vpaddq var_ddq_add(%rip), xcounter, xcounter + vptest ddq_low_msk(%rip), xcounter + jnz 1f + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: .set i, 1 .rept (by - 1) -- 1.7.10.4
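The arithmetic of the fix — a 128 bit addition with an explicit carry from the low into the high quad word — can be expressed in C like this (a sketch with hypothetical names; the assembly detects the wrap with VPTEST against a low-quad-word mask, while this sketch uses the generic "result wrapped below the old value" check):

```c
#include <assert.h>
#include <stdint.h>

/*
 * C model of the fix: treat the 128 bit counter block as two 64 bit
 * halves and propagate the carry from the low into the high quad word.
 * VPADDQ alone would silently drop that carry -- which is exactly the
 * bug the patch addresses.
 */
struct ctr128 {
    uint64_t hi, lo;
};

static void ctr128_add(struct ctr128 *c, uint64_t inc)
{
    uint64_t old_lo = c->lo;

    c->lo += inc;
    if (c->lo < old_lo)     /* low quad word overflowed ... */
        c->hi++;            /* ... so carry into the high one */
}
```

With the carry propagation in place, incrementing past 2^64 - 1 no longer wraps the low half back into the already-used counter range.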
[PATCH 2/3] crypto: aesni - remove unused defines in by8 variant
The defines for xkey3, xkey6 and xkey9 are not used in the code. They're probably leftovers from merging the three source files for 128, 192 and 256 bit AES. They can safely be removed. Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Chandramouli Narayanan mo...@linux.intel.com --- arch/x86/crypto/aes_ctrby8_avx-x86_64.S |3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index a029bc744244..2df2a0298f5a 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -79,9 +79,6 @@ #define xcounter %xmm8 #define xbyteswap %xmm9 #define xkey0 %xmm10 -#define xkey3 %xmm11 -#define xkey6 %xmm12 -#define xkey9 %xmm13 #define xkey4 %xmm11 #define xkey8 %xmm12 #define xkey12 %xmm13 -- 1.7.10.4
Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni
On 25 September 2014 00:23, chandramouli narayanan mo...@linux.intel.com wrote: On Mon, 2014-09-22 at 00:28 +0200, Mathias Krause wrote: What might be worth noting, the failing test uses an IV value of \xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD, or, in other words, a counter value that'll wrap after processing three blocks. The Crypto++ implementation explicitly states, it can handle the wrap around (see [1]). Dumping the IV before and after the cryptomgr tests shows, the by8 implementation only handles the lower 32 bit as a counter. Looking at RFC 3686, it defines the counter block as a 128 bit combination of nonce, IV and a 32 bit counter value. It also defines the initial value of the counter part (1) and how it should be incremented (increment the whole counter block, i.e. the 128 bit value). However, it also states that the maximum number blocks per packet are (2^32)-1. So, according to the RFC, the wrap around cannot happen -- not even for the 32 bit part of the counter block. However the other aesni-backed implementation does handle the wrap around just fine. It does so by doing the increment on a integer register so it can use the carry flag to detect the wrap around. Changing the by8 variant would be straight forward, but I fear performance implications... :/ It will take some time for me to debug this issue. Is there a list of test vectors to debug with? The failing test is aes_ctr_enc_tv_template[3] from crypto/testmgr.h. It fails because of a wrong handling of counter overflows. I'm already working on a patch that fixes the counter overflow issue. It passes the cryptomgr test but I like to do some more thorough tests. Especially some performance measurements as we now have branches in the hot path. I don't know if we should still rush fix this for v3.17 or delay this for the next merge window. The offending code was already disabled in Linus' tree and the fixes would be small enough to be backported if there is interest. 
Regards, Mathias thanks -mouli
[PATCH] crypto: aesni - disable by8 AVX CTR optimization
The by8 implementation introduced in commit 22cddcc7df8f (crypto: aes - AES CTR x86_64 by8 AVX optimization) is failing crypto tests as it handles counter block overflows differently. It only accounts the right most 32 bit as a counter -- not the whole block as all other implementations do. This makes it fail the cryptomgr test #4 that specifically tests this corner case. As we're quite late in the release cycle, just disable the by8 variant for now. Reported-by: Romain Francoise rom...@orebokech.com Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Chandramouli Narayanan mo...@linux.intel.com --- I'll try to create a real fix next week but I can't promise much. If Linus releases v3.17 early, as he has mentioned in his -rc6 announcement, we should hot fix this by just disabling the by8 variant. The real fix would add the necessary counter block handling to the asm code and revert this commit afterwards. Reverting the whole code is not necessary, imho. Would that be okay for you, Herbert? 
--- arch/x86/crypto/aesni-intel_glue.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 888950f29fd9..a7ccd57f19e4 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -481,7 +481,7 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx, crypto_inc(ctrblk, AES_BLOCK_SIZE); } -#ifdef CONFIG_AS_AVX +#if 0 /* temporary disabled due to failing crypto tests */ static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv) { @@ -1522,7 +1522,7 @@ static int __init aesni_init(void) aesni_gcm_dec_tfm = aesni_gcm_dec; } aesni_ctr_enc_tfm = aesni_ctr_enc; -#ifdef CONFIG_AS_AVX +#if 0 /* temporary disabled due to failing crypto tests */ if (cpu_has_avx) { /* optimize performance of ctr mode encryption transform */ aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; -- 1.7.10.4
Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni
On 17 September 2014 13:29, Herbert Xu herb...@gondor.apana.org.au wrote: On Tue, Sep 16, 2014 at 10:01:07PM +0200, Romain Francoise wrote: On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote: I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test fails, which makes my IPsec tunnels unhappy (see trace below). Before I start bisecting (2cddcc7df8fd3 is probably my first guess), is this already known? Sep 15 08:07:56 silenus kernel: [ 35.137145] alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni [...] Update: reverting 2cddcc7df8 (crypto: aes - AES CTR x86_64 by8 AVX optimization) fixes this. This machine is Sandy Bridge (Core i7-2600), and the problem doesn't seem to occur with the exact same kernel image on Ivy Bridge (Xeon E3-1240v2). Thanks for bisecting. If we can't fix this quickly then we should just revert it for now. [Adding James and Sean as they're stated as contact information] I compared the implementation against the original code from Intel referenced in the source file and found a few differences. But even after removing those, the code still generates the same error. So if Chandramouli does not come up with something, we should revert it, as the reference implementation from Intel is a) either wrong or b) used wrongly in the Linux kernel. What might be worth noting, the failing test uses an IV value of \xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFD, or, in other words, a counter value that'll wrap after processing three blocks. The Crypto++ implementation explicitly states, it can handle the wrap around (see [1]). Dumping the IV before and after the cryptomgr tests shows, the by8 implementation only handles the lower 32 bit as a counter. Looking at RFC 3686, it defines the counter block as a 128 bit combination of nonce, IV and a 32 bit counter value. It also defines the initial value of the counter part (1) and how it should be incremented (increment the whole counter block, i.e. 
the 128 bit value). However, it also states that the maximum number of blocks per packet is (2^32)-1. So, according to the RFC, the wrap around cannot happen -- not even for the 32 bit part of the counter block. However, the other aesni-backed implementation does handle the wrap around just fine. It does so by doing the increment on an integer register so it can use the carry flag to detect the wrap around. Changing the by8 variant would be straightforward, but I fear performance implications... :/ Looking back at the test vector, even if it might be inappropriate for IPsec, it is still valid for AES-CTR in the general case. So IMHO the by8 implementation is wrong and should be fixed -- or reverted, for that matter. James, Sean: Have you observed any interoperability problems with the Intel reference implementation for the AVX by8 variant of AES-CTR? Especially, have you tested the code for wrapping counter values? Regards, Mathias [1] http://www.cryptopp.com/wiki/CTR_Mode
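The discrepancy described above is easy to reproduce in C: starting from the failing test's IV of ff..fffd, a full 128 bit big-endian increment and a 32-bit-only increment agree until the counter wraps, then part ways (illustrative helpers, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Correct behavior: the whole big-endian counter block is incremented. */
static void ctr_inc128(uint8_t ctr[16])
{
    int i;

    for (i = 15; i >= 0; i--)
        if (++ctr[i])           /* stop once a byte didn't wrap to 0 */
            break;
}

/* Faulty by8 behavior: the carry never leaves the right-most 32 bits. */
static void ctr_inc32(uint8_t ctr[16])
{
    int i;

    for (i = 15; i >= 12; i--)
        if (++ctr[i])
            break;
}
```

After two increments both counters read ff..ffff; on the third, the 128 bit version rolls over to all zeros while the 32-bit version clears only its low four bytes — from that block on, the two implementations encrypt with different counter values.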
Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni
On 16 September 2014 22:01, Romain Francoise rom...@orebokech.com wrote: On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote: I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test fails, which makes my IPsec tunnels unhappy (see trace below). Before I start bisecting (2cddcc7df8fd3 is probably my first guess), is this already known? Sep 15 08:07:56 silenus kernel: [ 35.137145] alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni [...] I do not get the above message on a SandyBridge i7-2620M, even though the module makes use of the by8 variant on my system, too: [0.340626] AVX version of gcm_enc/dec engaged. [0.340627] AES CTR mode by8 optimization enabled [0.341273] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni) Update: reverting 2cddcc7df8 (crypto: aes - AES CTR x86_64 by8 AVX optimization) fixes this. This machine is Sandy Bridge (Core i7-2600), and the problem doesn't seem to occur with the exact same kernel image on Ivy Bridge (Xeon E3-1240v2). Can you please provide the full kernel log and /proc/cpuinfo of those machines? It would be interesting to know which variant was used on those machines -- the new by8 or the old one. Thanks, Mathias
Re: v3.17-rc5: alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni
On 17 September 2014 22:10, Mathias Krause mini...@googlemail.com wrote: On 16 September 2014 22:01, Romain Francoise rom...@orebokech.com wrote: On Mon, Sep 15, 2014 at 10:36:46AM +0200, Romain Francoise wrote: I upgraded from v3.16 to v3.17-rc5 and the ctr-aes-aesni encryption test fails, which makes my IPsec tunnels unhappy (see trace below). Before I start bisecting (2cddcc7df8fd3 is probably my first guess), is this already known? Sep 15 08:07:56 silenus kernel: [ 35.137145] alg: skcipher: Test 4 failed on encryption for ctr-aes-aesni [...] I do not get the above message on a SandyBridge i7-2620M, even though the module makes use of the by8 variant on my system, too: Never mind! Playing a little with crconf to instantiate 'ctr(aes)', I can see the failing test now too. Thanks, Mathias
Re: [PATCH] x86, crypto: Check if gas supports CRC32
On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote: Building current kernel with some old toolchain (gcc 4.1.2 and gas 2.17) chokes with: arch/x86/crypto/crc32c-pcl-intel-asm_64.S: Assembler messages: arch/x86/crypto/crc32c-pcl-intel-asm_64.S:128: Error: no such instruction: `crc32b %bl,%r8d' arch/x86/crypto/crc32c-pcl-intel-asm_64.S:204: Error: no such instruction: `crc32q -i*8(%rcx),%r8' ... due to the fact that gas doesn't know the CRC32 instruction. Check that before building. Cc: linux-crypto@vger.kernel.org Signed-off-by: Borislav Petkov b...@suse.de --- So this is one possibility to address this. Code already has the ifdeffery around crc_pcl() which is implemented in crc32c-pcl-intel-asm_64.S so we can piggyback on that and not build that file if gas doesn't know CRC32. If no CRC32 support, it still builds fine silently, however it would be better to probably say that due to old toolchain, kernel doesn't include fast CRC32 stuff in crc32c-intel.ko. Hmmm. arch/x86/crypto/Makefile| 8 +++- arch/x86/crypto/crc32c-intel_glue.c | 7 --- scripts/gas-can-do-crc32.sh | 12 3 files changed, 23 insertions(+), 4 deletions(-) create mode 100755 scripts/gas-can-do-crc32.sh diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 61d6e281898b..707bf7ecb903 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -83,7 +83,13 @@ ifeq ($(avx2_supported),yes) sha1-ssse3-y += sha1_avx2_x86_64_asm.o endif crc32c-intel-y := crc32c-intel_glue.o -crc32c-intel-$(CONFIG_64BIT) += crc32c-pcl-intel-asm_64.o + +gas_can_crc32 := $(srctree)/scripts/gas-can-do-crc32.sh +CONFIG_GAS_SUPPORTS_CRC32=$(shell $(CONFIG_SHELL) $(gas_can_crc32) $(AS)) + This looks too complicated. We do have as-instr for exactly those kinds of tests. And, in fact, looking at arch/x86/Makefile we already have one for crc32: asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1) So you can just use CONFIG_AS_CRC32 for your tests and drop the shell script.
+ifdef CONFIG_64BIT +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o +endif s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further uses. crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o sha256_ssse3_glue.o sha512-ssse3-y := sha512-ssse3-asm.o sha512-avx-asm.o sha512-avx2-asm.o sha512_ssse3_glue.o diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index 6812ad98355c..4ce8e2db2984 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -46,7 +46,7 @@ #define REX_PRE #endif -#ifdef CONFIG_X86_64 +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32) /* * use carryless multiply version of crc32c when buffer * size is >= 512 (when eager fpu is enabled) or @@ -181,7 +181,7 @@ static int crc32c_intel_cra_init(struct crypto_tfm *tfm) return 0; } -#ifdef CONFIG_X86_64 +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32) static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data, unsigned int len) { @@ -257,7 +257,8 @@ static int __init crc32c_intel_mod_init(void) { if (!x86_match_cpu(crc32c_cpu_id)) return -ENODEV; -#ifdef CONFIG_X86_64 + +#if defined(CONFIG_X86_64) && defined(CONFIG_GAS_SUPPORTS_CRC32) if (cpu_has_pclmulqdq) { alg.update = crc32c_pcl_intel_update; alg.finup = crc32c_pcl_intel_finup; diff --git a/scripts/gas-can-do-crc32.sh b/scripts/gas-can-do-crc32.sh new file mode 100755 index ..88ff476bd3cc --- /dev/null +++ b/scripts/gas-can-do-crc32.sh @@ -0,0 +1,12 @@ +#!/bin/sh + +TMP=$(mktemp) + +echo crc32 %rax,%rbx | $* --warn - -o $TMP 2>/dev/null +if [ $? -eq 0 ]; then + echo y +else + echo n +fi + +rm -f $TMP -- 2.0.0 Regards, Mathias
Re: [PATCH] x86, crypto: Check if gas supports CRC32
On Wed, Jul 30, 2014 at 11:19:37PM +0200, Mathias Krause wrote: On Wed, Jul 30, 2014 at 06:47:01PM +0200, Borislav Petkov wrote: [...] This looks too complicated. We do have as-instr for exactly those kinds of tests. And, in fact, looking at arch/x86/Makefile we already have one for crc32: asinstr += $(call as-instr,crc32l %eax$(comma)%eax,-DCONFIG_AS_CRC32=1) So you can just use CONFIG_AS_CRC32 for your tests and drop the shell script. +ifdef CONFIG_64BIT +crc32c-intel-$(CONFIG_GAS_SUPPORTS_CRC32) += crc32c-pcl-intel-asm_64.o +endif s/CONFIG_GAS_SUPPORTS_CRC32/CONFIG_AS_CRC32/ here and for all further uses. Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only so cannot be used in makefiles. So crc32c-pcl-intel-asm_64.S needs a #ifdef CONFIG_AS_CRC32 guard and still be compiled for CONFIG_64BIT, as it is now. It'll be an empty object for older binutils versions not supporting the crc32 instruction. Sorry for the confusion. Regards, Mathias
Re: [PATCH] x86, crypto: Check if gas supports CRC32
On 31 July 2014 00:11, Borislav Petkov b...@alien8.de wrote: On Wed, Jul 30, 2014 at 11:28:14PM +0200, Mathias Krause wrote: Gah, CONFIG_AS_CRC32 gets defined as a preprocessor symbol only so cannot be used in makefiles. So crc32c-pcl-intel-asm_64.S needs a #ifdef CONFIG_AS_CRC32 guard and still be compiled for CONFIG_64BIT, as it is now. It'll be an empty object for older binutils versions not supporting the crc32 instruction. Yeah, that makes it all even simpler, thanks! We're still b0rked though: arch/x86/crypto/crct10dif-pcl-asm_64.S: Assembler messages: arch/x86/crypto/crct10dif-pcl-asm_64.S:147: Error: no such instruction: `pclmulqdq $0x0,%xmm10,%xmm0' arch/x86/crypto/crct10dif-pcl-asm_64.S:148: Error: no such instruction: `pclmulqdq $0x11,%xmm10,%xmm8' arch/x86/crypto/crct10dif-pcl-asm_64.S:149: Error: no such instruction: `pclmulqdq $0x0,%xmm10,%xmm1' ... and need checking for more instructions. I'll play with this more tomorrow. You probably can reuse the AVX test for this -- either the CONFIG_AS_AVX preprocessor one or the $(avx_supported) make one, local to arch/x86/crypto/Makefile. Even though the CLMUL feature has not much to do with AVX (it has a dedicated CPUID feature bit), support for it in binutils was added together with AVX support, see https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c0f3af977b0f28a0dc5a620110b8dcf9d8286f84 Regards, Mathias Good night :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.
Re: [PATCH v4 1/1] crypto: AES CTR x86_64 by8 AVX optimization
On 9 June 2014 19:41, chandramouli narayanan mo...@linux.intel.com wrote: [...] @@ -1493,6 +1521,14 @@ static int __init aesni_init(void) aesni_gcm_enc_tfm = aesni_gcm_enc; aesni_gcm_dec_tfm = aesni_gcm_dec; } + aesni_ctr_enc_tfm = aesni_ctr_enc; +#ifdef CONFIG_AS_AVX + if (cpu_has_avx) { + /* optimize performance of ctr mode encryption trasform */ The trasform typo is also still there. :/ + aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; + pr_info("AES CTR mode by8 optimization enabled\n"); + } +#endif #endif err = crypto_fpu_init(); Mathias
Re: [PATCH v5 1/1] crypto: AES CTR x86_64 by8 AVX optimization
) + +/* + * routine to do AES256 CTR enc/decrypt by8 + * XMM registers are clobbered. + * Saving/restoring must be done at a higher level + * aes_ctr_enc_256_avx_by8(void *in, void *iv, void *keys, void *out, + * unsigned int num_bytes) + */ +ENTRY(aes_ctr_enc_256_avx_by8) + /* call the aes main loop */ + do_aes_ctrmain KEY_256 + +ENDPROC(aes_ctr_enc_256_avx_by8) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 948ad0e..888950f 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -105,6 +105,9 @@ void crypto_fpu_exit(void); #define AVX_GEN4_OPTSIZE 4096 #ifdef CONFIG_X86_64 + +static void (*aesni_ctr_enc_tfm)(struct crypto_aes_ctx *ctx, u8 *out, + const u8 *in, unsigned int len, u8 *iv); asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out, #ifdef CONFIG_AS_AVX +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); /* * asmlinkage void aesni_gcm_precomp_avx_gen2() * gcm_data *my_ctx_data, context data @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx, crypto_inc(ctrblk, AES_BLOCK_SIZE); } +#ifdef CONFIG_AS_AVX +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, + const u8 *in, unsigned int len, u8 *iv) +{ + /* +* based on key length, override with the by8 version +* of ctr mode encryption/decryption for improved performance +* aes_set_key_common() ensures that key length is one of +* {128,192,256} +*/ + if (ctx->key_length == AES_KEYSIZE_128) + aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len); + else if (ctx->key_length == AES_KEYSIZE_192) + 
aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len); + else + aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len); +} +#endif + static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, struct scatterlist *src, unsigned int nbytes) @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc, kernel_fpu_begin(); while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) { - aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, - nbytes & AES_BLOCK_MASK, walk.iv); + aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr, + nbytes & AES_BLOCK_MASK, walk.iv); nbytes &= AES_BLOCK_SIZE - 1; err = blkcipher_walk_done(desc, walk, nbytes); } @@ -1493,6 +1521,14 @@ static int __init aesni_init(void) aesni_gcm_enc_tfm = aesni_gcm_enc; aesni_gcm_dec_tfm = aesni_gcm_dec; } + aesni_ctr_enc_tfm = aesni_ctr_enc; +#ifdef CONFIG_AS_AVX + if (cpu_has_avx) { + /* optimize performance of ctr mode encryption transform */ + aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; + pr_info("AES CTR mode by8 optimization enabled\n"); + } +#endif #endif err = crypto_fpu_init(); -- 1.8.2.1 Patch is Reviewed-by: Mathias Krause mini...@googlemail.com Thanks, Chandramouli! Regards, Mathias
Re: [PATCH v3 1/1] crypto: AES CTR x86_64 by8 AVX optimization
, + const u8 *in, unsigned int len, u8 *iv); asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); @@ -155,6 +158,12 @@ asmlinkage void aesni_gcm_dec(void *ctx, u8 *out, #ifdef CONFIG_AS_AVX +asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); +asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); +asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv, + void *keys, u8 *out, unsigned int num_bytes); /* * asmlinkage void aesni_gcm_precomp_avx_gen2() * gcm_data *my_ctx_data, context data @@ -472,6 +481,25 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx, crypto_inc(ctrblk, AES_BLOCK_SIZE); } +#ifdef CONFIG_AS_AVX +static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, + const u8 *in, unsigned int len, u8 *iv) +{ + /* + * based on key length, override with the by8 version + * of ctr mode encryption/decryption for improved performance + * aes_set_key_common() ensures that key length is one of + * {128,192,256} + */ + if (ctx->key_length == AES_KEYSIZE_128) + aes_ctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len); + else if (ctx->key_length == AES_KEYSIZE_192) + aes_ctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len); + else if (ctx->key_length == AES_KEYSIZE_256) You can make that last one an unconditional else, because it *has* to be that key size. Otherwise someone managed to tamper with the crypto context and what is worse than using the wrong key size for the encryption algorithm is doing no encryption at all. 
+ aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len); +} +#endif + static int ctr_crypt(struct blkcipher_desc *desc, struct scatterlist *dst, struct scatterlist *src, unsigned int nbytes) @@ -486,8 +514,8 @@ static int ctr_crypt(struct blkcipher_desc *desc, kernel_fpu_begin(); while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) { - aesni_ctr_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, - nbytes & AES_BLOCK_MASK, walk.iv); + aesni_ctr_enc_tfm(ctx, walk.dst.virt.addr, walk.src.virt.addr, + nbytes & AES_BLOCK_MASK, walk.iv); nbytes &= AES_BLOCK_SIZE - 1; err = blkcipher_walk_done(desc, walk, nbytes); } @@ -1493,6 +1521,14 @@ static int __init aesni_init(void) aesni_gcm_enc_tfm = aesni_gcm_enc; aesni_gcm_dec_tfm = aesni_gcm_dec; } + aesni_ctr_enc_tfm = aesni_ctr_enc; +#ifdef CONFIG_AS_AVX + if (cpu_has_aes && cpu_has_avx) { The cpu_has_aes test is still redundant. + /* optimize performance of ctr mode encryption trasform */ + aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm; + pr_info("AES CTR mode by8 optimization enabled\n"); + } +#endif #endif err = crypto_fpu_init(); -- 1.8.2.1 With the above nitpicks handled: Reviewed-by: Mathias Krause mini...@googlemail.com Thanks, Mathias
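The review comment above can be illustrated with a small userspace sketch (hypothetical code, not from the patch; pick_variant() merely stands in for the three aes_ctr_enc_*_avx_by8() calls): because aes_set_key_common() only ever accepts the three AES key sizes, the final branch needs no condition of its own.

```c
#include <assert.h>

/* The only key sizes aes_set_key_common() will accept, in bytes. */
enum { AES_KEYSIZE_128 = 16, AES_KEYSIZE_192 = 24, AES_KEYSIZE_256 = 32 };

/* Dispatch on key length; the last case is an unconditional else,
 * since key_length is guaranteed to be one of the three values. */
static int pick_variant(unsigned int key_length)
{
	if (key_length == AES_KEYSIZE_128)
		return 128;	/* would call aes_ctr_enc_128_avx_by8() */
	else if (key_length == AES_KEYSIZE_192)
		return 192;	/* would call aes_ctr_enc_192_avx_by8() */
	else			/* must be AES_KEYSIZE_256 */
		return 256;	/* would call aes_ctr_enc_256_avx_by8() */
}
```

With a trailing `else if`, a corrupted context would silently skip encryption entirely, which is exactly the failure mode the review warns about.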
Re: [PATCH v2 1/1] crypto: AES CTR x86_64 by8 AVX optimization
On Tue, Jun 03, 2014 at 05:41:14PM -0700, chandramouli narayanan wrote: This patch introduces by8 AES CTR mode AVX optimization inspired by Intel Optimized IPSEC Cryptographic library. For additional information, please see: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=22972 The functions aes_ctr_enc_128_avx_by8(), aes_ctr_enc_192_avx_by8() and aes_ctr_enc_256_avx_by8() are adapted from Intel Optimized IPSEC Cryptographic library. When both AES and AVX features are enabled in a platform, the glue code in AESNI module overrides the existing by4 CTR mode en/decryption with the by8 AES CTR mode en/decryption. On a Haswell desktop, with turbo disabled and all cpus running at maximum frequency, the by8 CTR mode optimization shows better performance results across data & key sizes as measured by tcrypt. The average performance improvement of the by8 version over the by4 version is as follows: For 128 bit key and data sizes >= 256 bytes, there is a 10-16% improvement. For 192 bit key and data sizes >= 256 bytes, there is a 20-22% improvement. For 256 bit key and data sizes >= 256 bytes, there is a 20-25% improvement. Nice improvement :) How does it perform on older processors that do have a penalty for unaligned loads (vmovdqu), e.g. SandyBridge? If those perform worse it might be wise to extend the CPU feature test in the glue code by a model test to enable the by8 variant only for Haswell and newer processors that don't have such a penalty. 
A typical run of tcrypt with AES CTR mode encryption of the by4 and by8 optimization shows the following results: tcrypt with by4 AES CTR mode encryption optimization on a Haswell Desktop: --- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 343 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 336 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 491 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1130 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7309 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 346 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 361 cycles (64 bytes) test 7 (192 bit key, 256 byte blocks): 1 operation in 543 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1321 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9649 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 369 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 366 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 595 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1531 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 10522 cycles (8192 bytes) testing speed of __ctr-aes-aesni decryption test 0 (128 bit key, 16 byte blocks): 1 operation in 336 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 350 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 487 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1129 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 7287 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 350 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 359 cycles (64 bytes) test 7 (192 
bit key, 256 byte blocks): 1 operation in 635 cycles (256 bytes) test 8 (192 bit key, 1024 byte blocks): 1 operation in 1324 cycles (1024 bytes) test 9 (192 bit key, 8192 byte blocks): 1 operation in 9595 cycles (8192 bytes) test 10 (256 bit key, 16 byte blocks): 1 operation in 364 cycles (16 bytes) test 11 (256 bit key, 64 byte blocks): 1 operation in 377 cycles (64 bytes) test 12 (256 bit key, 256 byte blocks): 1 operation in 604 cycles (256 bytes) test 13 (256 bit key, 1024 byte blocks): 1 operation in 1527 cycles (1024 bytes) test 14 (256 bit key, 8192 byte blocks): 1 operation in 10549 cycles (8192 bytes) tcrypt with by8 AES CTR mode encryption optimization on a Haswell Desktop: --- testing speed of __ctr-aes-aesni encryption test 0 (128 bit key, 16 byte blocks): 1 operation in 340 cycles (16 bytes) test 1 (128 bit key, 64 byte blocks): 1 operation in 330 cycles (64 bytes) test 2 (128 bit key, 256 byte blocks): 1 operation in 450 cycles (256 bytes) test 3 (128 bit key, 1024 byte blocks): 1 operation in 1043 cycles (1024 bytes) test 4 (128 bit key, 8192 byte blocks): 1 operation in 6597 cycles (8192 bytes) test 5 (192 bit key, 16 byte blocks): 1 operation in 339 cycles (16 bytes) test 6 (192 bit key, 64 byte blocks): 1 operation in 352 cycles
Re: [PATCH 0/3] crypto: x86/sha1 - regression and other fixes
On 24 March 2014 18:29, chandramouli narayanan mo...@linux.intel.com wrote: On Mon, 2014-03-24 at 17:10 +0100, Mathias Krause wrote: The recent addition of the AVX2 variant of the SHA1 hash function wrongly disabled the AVX variant by introducing a flaw in the feature test. Fixed in patch 1. The alignment calculations of the AVX2 assembler implementation are questionable, too. Especially the page alignment of the stack pointer is broken in multiple ways. Fixed in patch 2. In patch 3 another issue for code alignment is fixed. Please apply! Mathias Krause (3): crypto: x86/sha1 - re-enable the AVX variant crypto: x86/sha1 - fix stack alignment of AVX2 variant crypto: x86/sha1 - reduce size of the AVX2 asm implementation arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++-- arch/x86/crypto/sha1_ssse3_glue.c | 26 -- 2 files changed, 18 insertions(+), 16 deletions(-) Your fixes are right on the mark. I went through your patches and tested them and found them to be correct. Thanks for double-checking! Sorry for causing regression and missing alignment issues in the patches I submitted. No problem with that. But as I'm not subscribed to the linux-crypto mailing list I haven't seen your earlier submissions. Otherwise I would have objected earlier. ;) Thanks, Mathias
[PATCH 1/3] crypto: x86/sha1 - re-enable the AVX variant
Commit 7c1da8d0d0 ("crypto: sha - SHA1 transform x86_64 AVX2") accidentally disabled the AVX variant by making the avx_usable() test not only fail in case the CPU doesn't support AVX or OSXSAVE but also if it doesn't support AVX2. Fix that regression by splitting up the AVX/AVX2 test into two functions. Also test for the BMI1 extension in the avx2_usable() test as the AVX2 implementation not only makes use of BMI2 but also BMI1 instructions. Cc: Chandramouli Narayanan mo...@linux.intel.com Cc: H. Peter Anvin h...@zytor.com Cc: Marek Vasut ma...@denx.de Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/x86/crypto/sha1_ssse3_glue.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c index 139a55c04d..74d16ef707 100644 --- a/arch/x86/crypto/sha1_ssse3_glue.c +++ b/arch/x86/crypto/sha1_ssse3_glue.c @@ -208,11 +208,7 @@ static bool __init avx_usable(void) { u64 xcr0; -#if defined(CONFIG_AS_AVX2) - if (!cpu_has_avx || !cpu_has_avx2 || !cpu_has_osxsave) -#else if (!cpu_has_avx || !cpu_has_osxsave) -#endif return false; xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK); @@ -224,11 +220,23 @@ static bool __init avx_usable(void) return true; } + +#ifdef CONFIG_AS_AVX2 +static bool __init avx2_usable(void) +{ + if (avx_usable() && cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI1) && + boot_cpu_has(X86_FEATURE_BMI2)) + return true; + + return false; +} +#endif #endif static int __init sha1_ssse3_mod_init(void) { char *algo_name; + /* test for SSSE3 first */ if (cpu_has_ssse3) { sha1_transform_asm = sha1_transform_ssse3; @@ -238,13 +246,11 @@ static int __init sha1_ssse3_mod_init(void) #ifdef CONFIG_AS_AVX /* allow AVX to override SSSE3, it's a little faster */ if (avx_usable()) { - if (cpu_has_avx) { - sha1_transform_asm = sha1_transform_avx; - algo_name = "AVX"; - } + sha1_transform_asm = sha1_transform_avx; + algo_name = "AVX"; #ifdef CONFIG_AS_AVX2 - if (cpu_has_avx2 && boot_cpu_has(X86_FEATURE_BMI2)) { - 
/* allow AVX2 to override AVX, it's a little faster */ + /* allow AVX2 to override AVX, it's a little faster */ + if (avx2_usable()) { sha1_transform_asm = sha1_apply_transform_avx2; algo_name = "AVX2"; } -- 1.7.10.4
[PATCH 2/3] crypto: x86/sha1 - fix stack alignment of AVX2 variant
The AVX2 implementation might waste up to a page of stack memory because of a wrong alignment calculation. This will, in the worst case, increase the stack usage of sha1_transform_avx2() alone to 5.4 kB -- way too big for a kernel function. Even worse, it might also allocate *fewer* bytes than needed if the stack pointer is already aligned because in that case the 'sub %rbx, %rsp' is effectively moving the stack pointer upwards, not downwards. Fix those issues by changing and simplifying the alignment calculation to use a 32 byte alignment, the alignment really needed. Cc: Chandramouli Narayanan mo...@linux.intel.com Cc: H. Peter Anvin h...@zytor.com Cc: Marek Vasut ma...@denx.de Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/x86/crypto/sha1_avx2_x86_64_asm.S |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S index 4f348544d1..bacac22b20 100644 --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S @@ -636,9 +636,7 @@ _loop3: /* Align stack */ mov %rsp, %rbx - and $(0x1000-1), %rbx - sub $(8+32), %rbx - sub %rbx, %rsp + and $~(0x20-1), %rsp push %rbx sub $RESERVE_STACK, %rsp @@ -665,8 +663,7 @@ _loop3: avx2_zeroupper add $RESERVE_STACK, %rsp - pop %rbx - add %rbx, %rsp + pop %rsp pop %r15 pop %r14 -- 1.7.10.4
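The fixed calculation can be illustrated with a small userspace sketch (hypothetical code, not part of the patch): masking with ~(0x20-1), as the new 'and $~(0x20-1), %rsp' does, always rounds an address down to the next 32-byte boundary and leaves an already aligned address untouched -- exactly the two properties the old page-mask arithmetic got wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address DOWN to a 32-byte boundary, mirroring the
 * 'and $~(0x20-1), %rsp' instruction from the patch above. */
static uintptr_t align_down_32(uintptr_t sp)
{
	return sp & ~(uintptr_t)(0x20 - 1);
}
```

An unaligned address moves down by at most 31 bytes, and an aligned one stays put, so the function can never move the "stack pointer" upwards the way the removed 'sub %rbx, %rsp' sequence could.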
[PATCH 3/3] crypto: x86/sha1 - reduce size of the AVX2 asm implementation
There is really no need to page align sha1_transform_avx2. The default alignment is just fine. This is not the hot code but only the entry point, after all. Cc: Chandramouli Narayanan mo...@linux.intel.com Cc: H. Peter Anvin h...@zytor.com Cc: Marek Vasut ma...@denx.de Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/x86/crypto/sha1_avx2_x86_64_asm.S |1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S index bacac22b20..1cd792db15 100644 --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S @@ -623,7 +623,6 @@ _loop3: */ .macro SHA1_VECTOR_ASM name ENTRY(\name) - .align 4096 push %rbx push %rbp -- 1.7.10.4
[PATCH 0/3] crypto: x86/sha1 - regression and other fixes
The recent addition of the AVX2 variant of the SHA1 hash function wrongly disabled the AVX variant by introducing a flaw in the feature test. Fixed in patch 1. The alignment calculations of the AVX2 assembler implementation are questionable, too. Especially the page alignment of the stack pointer is broken in multiple ways. Fixed in patch 2. In patch 3 another issue for code alignment is fixed. Please apply! Mathias Krause (3): crypto: x86/sha1 - re-enable the AVX variant crypto: x86/sha1 - fix stack alignment of AVX2 variant crypto: x86/sha1 - reduce size of the AVX2 asm implementation arch/x86/crypto/sha1_avx2_x86_64_asm.S |8 ++-- arch/x86/crypto/sha1_ssse3_glue.c | 26 -- 2 files changed, 18 insertions(+), 16 deletions(-) -- 1.7.10.4
[PATCH 2/2] padata: Fix wrong usage of rcu_dereference()
A kernel with enabled lockdep complains about the wrong usage of rcu_dereference() under a rcu_read_lock_bh() protected region. === [ INFO: suspicious RCU usage. ] 3.13.0-rc1+ #126 Not tainted --- linux/kernel/padata.c:115 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by cryptomgr_test/153: #0: (rcu_read_lock_bh){.+}, at: [8115c235] padata_do_parallel+0x5/0x270 Fix that by using rcu_dereference_bh() instead. Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Steffen Klassert steffen.klass...@secunet.com --- kernel/padata.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/padata.c b/kernel/padata.c index 2abd25d79c..161402f0b5 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -112,7 +112,7 @@ int padata_do_parallel(struct padata_instance *pinst, rcu_read_lock_bh(); - pd = rcu_dereference(pinst->pd); + pd = rcu_dereference_bh(pinst->pd); err = -EINVAL; if (!(pinst->flags & PADATA_INIT) || pinst->flags & PADATA_INVALID) -- 1.7.10.4
[PATCH 1/2] crypto: pcrypt - Fix wrong usage of rcu_dereference()
A kernel with enabled lockdep complains about the wrong usage of rcu_dereference() under a rcu_read_lock_bh() protected region. === [ INFO: suspicious RCU usage. ] 3.13.0-rc1+ #126 Not tainted --- linux/crypto/pcrypt.c:81 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by cryptomgr_test/153: #0: (rcu_read_lock_bh){.+}, at: [812c8075] pcrypt_do_parallel.isra.2+0x5/0x200 Fix that by using rcu_dereference_bh() instead. Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net --- crypto/pcrypt.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c index f8c920cafe..309d345ead 100644 --- a/crypto/pcrypt.c +++ b/crypto/pcrypt.c @@ -78,7 +78,7 @@ static int pcrypt_do_parallel(struct padata_priv *padata, unsigned int *cb_cpu, cpu = *cb_cpu; rcu_read_lock_bh(); - cpumask = rcu_dereference(pcrypt->cb_cpumask); + cpumask = rcu_dereference_bh(pcrypt->cb_cpumask); if (cpumask_test_cpu(cpu, cpumask->mask)) goto out; -- 1.7.10.4
[PATCH 0/2] pcrypt/padata rcu fixes
Two small RCU-related fixes that lockdep complained about. Please apply! Mathias Krause (2): crypto: pcrypt - Fix wrong usage of rcu_dereference() padata: Fix wrong usage of rcu_dereference() crypto/pcrypt.c |2 +- kernel/padata.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- 1.7.10.4
Re: [PATCH] padata: make the sequence counter an atomic_t
On 08.10.2013 14:08, Steffen Klassert wrote: On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote: Using a spinlock to atomically increase a counter sounds wrong -- we've atomic_t for this! Also move 'seq_nr' to a different cache line than 'lock' to reduce cache line trashing. This has the nice side effect of decreasing the size of struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g. occupying only two instead of three cache lines. Those changes result in a 5% performance increase on an IPsec test run using pcrypt. Btw. the seq_lock spinlock was never explicitly initialized -- one more reason to get rid of it. Signed-off-by: Mathias Krause mathias.kra...@secunet.com Acked-by: Steffen Klassert steffen.klass...@secunet.com Herbert, can you take this one? Ping, Herbert? Anything wrong with the patch?
Re: [PATCH] padata: make the sequence counter an atomic_t
On 25.10.2013 11:26, Herbert Xu wrote: On Fri, Oct 25, 2013 at 10:20:48AM +0200, Mathias Krause wrote: On 08.10.2013 14:08, Steffen Klassert wrote: On Wed, Oct 02, 2013 at 03:40:45PM +0200, Mathias Krause wrote: Using a spinlock to atomically increase a counter sounds wrong -- we've atomic_t for this! Also move 'seq_nr' to a different cache line than 'lock' to reduce cache line trashing. This has the nice side effect of decreasing the size of struct parallel_data from 192 to 128 bytes for a x86-64 build, e.g. occupying only two instead of three cache lines. Those changes result in a 5% performance increase on an IPsec test run using pcrypt. Btw. the seq_lock spinlock was never explicitly initialized -- one more reason to get rid of it. Signed-off-by: Mathias Krause mathias.kra...@secunet.com Acked-by: Steffen Klassert steffen.klass...@secunet.com Herbert, can you take this one? Ping, Herbert? Anything wrong with the patch? Sorry I don't seem to have this patch in my mail box. Can you resend it please? I sent it to linux-crypto and Steffen only. Will resend it directed to you, now. Thanks!
[PATCH RESEND] padata: make the sequence counter an atomic_t
Using a spinlock to atomically increase a counter sounds wrong -- we have
atomic_t for this! Also move 'seq_nr' to a different cache line than 'lock'
to reduce cache line thrashing. This has the nice side effect of decreasing
the size of struct parallel_data from 192 to 128 bytes for an x86-64 build,
i.e. occupying only two instead of three cache lines.

Those changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
Acked-by: Steffen Klassert <steffen.klass...@secunet.com>
---
 include/linux/padata.h |    3 +--
 kernel/padata.c        |    9 ++++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
 	struct padata_serial_queue	__percpu *squeue;
 	atomic_t			reorder_objects;
 	atomic_t			refcnt;
+	atomic_t			seq_nr;
 	struct padata_cpumask		cpumask;
 	spinlock_t			lock ____cacheline_aligned;
-	spinlock_t			seq_lock;
-	unsigned int			seq_nr;
 	unsigned int			processed;
 	struct timer_list		timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+	unsigned int seq_nr;
 	int cpu_index;
 
 	/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 	 * seq_nr mod. number of cpus in use.
 	 */
 
-	spin_lock(&pd->seq_lock);
-	cpu_index = pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-	pd->seq_nr++;
-	spin_unlock(&pd->seq_lock);
+	seq_nr = atomic_inc_return(&pd->seq_nr);
+	cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
 	return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 	padata_init_pqueues(pd);
 	padata_init_squeues(pd);
 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-	pd->seq_nr = 0;
+	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
--
1.7.2.5
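The core of the patch -- replacing a lock-protected counter with an atomic one for round-robin selection -- can be modeled in userspace with C11 atomics. This is only an illustrative sketch: `cpu_hash()` and `ncpus` are stand-ins for the kernel's `padata_cpu_hash()` and `cpumask_weight()`, and `atomic_fetch_add()+1` is the C11 spelling of `atomic_inc_return()`.

```c
#include <stdatomic.h>

/* Starts at -1 so the first increment-and-return yields 0, matching the
 * patch's atomic_set(&pd->seq_nr, -1). */
static atomic_uint seq_nr = -1u;

/* Pick the next CPU index in round-robin fashion over ncpus, with no
 * spinlock: the atomic increment alone provides the needed atomicity. */
unsigned int cpu_hash(unsigned int ncpus)
{
    unsigned int nr = atomic_fetch_add(&seq_nr, 1) + 1;
    return nr % ncpus;
}
```

Concurrent callers may observe the counter in any interleaving, but each caller still gets a unique sequence number -- which is all the round-robin distribution needs.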
[PATCH 1/5] crypto: authenc - Export key parsing helper function
AEAD key parsing is duplicated to multiple places in the kernel. Add a
common helper function to consolidate that functionality.

Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
 crypto/authenc.c         |   48 ++++++++++++++++++++++++++--------------
 include/crypto/authenc.h |   12 +++++++++-
 2 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..0fda5ba 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -52,40 +52,52 @@ static void authenc_request_complete(struct aead_request *req, int err)
 		aead_request_complete(req, err);
 }
 
-static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
-				 unsigned int keylen)
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+			       unsigned int keylen)
 {
-	unsigned int authkeylen;
-	unsigned int enckeylen;
-	struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
-	struct crypto_ahash *auth = ctx->auth;
-	struct crypto_ablkcipher *enc = ctx->enc;
-	struct rtattr *rta = (void *)key;
+	struct rtattr *rta = (struct rtattr *)key;
 	struct crypto_authenc_key_param *param;
-	int err = -EINVAL;
 
 	if (!RTA_OK(rta, keylen))
-		goto badkey;
+		return -EINVAL;
 	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-		goto badkey;
+		return -EINVAL;
 	if (RTA_PAYLOAD(rta) < sizeof(*param))
-		goto badkey;
+		return -EINVAL;
 
 	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
+	keys->enckeylen = be32_to_cpu(param->enckeylen);
 
 	key += RTA_ALIGN(rta->rta_len);
 	keylen -= RTA_ALIGN(rta->rta_len);
 
-	if (keylen < enckeylen)
-		goto badkey;
+	if (keylen < keys->enckeylen)
+		return -EINVAL;
 
-	authkeylen = keylen - enckeylen;
+	keys->authkeylen = keylen - keys->enckeylen;
+	keys->authkey = key;
+	keys->enckey = key + keys->authkeylen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_authenc_extractkeys);
+
+static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
+				 unsigned int keylen)
+{
+	struct crypto_authenc_ctx *ctx = crypto_aead_ctx(authenc);
+	struct crypto_ahash *auth = ctx->auth;
+	struct crypto_ablkcipher *enc = ctx->enc;
+	struct crypto_authenc_keys keys;
+	int err = -EINVAL;
+
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
+		goto badkey;
 
 	crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
 	crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc) &
 				     CRYPTO_TFM_REQ_MASK);
-	err = crypto_ahash_setkey(auth, key, authkeylen);
+	err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
 	crypto_aead_set_flags(authenc, crypto_ahash_get_flags(auth) &
 				       CRYPTO_TFM_RES_MASK);
 
@@ -95,7 +107,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
 	crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
 	crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc) &
 					 CRYPTO_TFM_REQ_MASK);
-	err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+	err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
 	crypto_aead_set_flags(authenc, crypto_ablkcipher_get_flags(enc) &
 				       CRYPTO_TFM_RES_MASK);
 
diff --git a/include/crypto/authenc.h b/include/crypto/authenc.h
index e47b044..6775059 100644
--- a/include/crypto/authenc.h
+++ b/include/crypto/authenc.h
@@ -23,5 +23,15 @@ struct crypto_authenc_key_param {
 	__be32 enckeylen;
 };
 
-#endif	/* _CRYPTO_AUTHENC_H */
+struct crypto_authenc_keys {
+	const u8 *authkey;
+	const u8 *enckey;
+
+	unsigned int authkeylen;
+	unsigned int enckeylen;
+};
 
+int crypto_authenc_extractkeys(struct crypto_authenc_keys *keys, const u8 *key,
+			       unsigned int keylen);
+
+#endif	/* _CRYPTO_AUTHENC_H */
--
1.7.2.5
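The blob that crypto_authenc_extractkeys() parses has the layout `[ rtattr header | be32 enckeylen | auth key | enc key ]`. The parsing logic can be sketched as a minimal userspace re-implementation; `kb_rtattr`, `KB_ALIGN` and `KEYA_PARAM` merely mirror the kernel's `rtattr`, `RTA_ALIGN()` and `CRYPTO_AUTHENC_KEYA_PARAM` and are redefined here, so treat this as an illustration, not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

struct kb_rtattr {
    uint16_t rta_len;    /* header + payload length */
    uint16_t rta_type;
};

struct authenc_keys {
    const uint8_t *authkey, *enckey;
    size_t authkeylen, enckeylen;
};

#define KB_ALIGN(len)  (((len) + 3u) & ~3u)  /* rtattrs are 4-byte aligned */
#define KEYA_PARAM     1

/* Returns 0 on success, -1 on a malformed blob (the kernel uses -EINVAL). */
int extract_keys(struct authenc_keys *keys, const uint8_t *key, size_t keylen)
{
    const struct kb_rtattr *rta = (const struct kb_rtattr *)key;
    uint32_t enckeylen;

    if (keylen < sizeof(*rta) || rta->rta_len < sizeof(*rta) ||
        KB_ALIGN(rta->rta_len) > keylen)                 /* roughly RTA_OK() */
        return -1;
    if (rta->rta_type != KEYA_PARAM)
        return -1;
    if (rta->rta_len - sizeof(*rta) < sizeof(uint32_t))  /* payload too short */
        return -1;

    memcpy(&enckeylen, rta + 1, sizeof(enckeylen));
    enckeylen = ntohl(enckeylen);                        /* be32_to_cpu() */

    key    += KB_ALIGN(rta->rta_len);
    keylen -= KB_ALIGN(rta->rta_len);
    if (keylen < enckeylen)
        return -1;

    keys->enckeylen  = enckeylen;
    keys->authkeylen = keylen - enckeylen;
    keys->authkey    = key;
    keys->enckey     = key + keys->authkeylen;
    return 0;
}

/* Build a sample blob (20-byte auth key, 16-byte enc key), parse it, and
 * return the auth key length the parser derived, or -1 on failure. */
int demo_authkeylen(void)
{
    uint8_t blob[8 + 20 + 16] = { 0 };
    struct kb_rtattr *rta = (struct kb_rtattr *)blob;
    uint32_t be_len = htonl(16);
    struct authenc_keys keys;

    rta->rta_len  = 8;            /* 4-byte header + 4-byte param */
    rta->rta_type = KEYA_PARAM;
    memcpy(blob + sizeof(*rta), &be_len, sizeof(be_len));

    if (extract_keys(&keys, blob, sizeof(blob)) != 0)
        return -1;
    return (int)keys.authkeylen;
}
```

Note how the auth key length is never stated explicitly in the blob -- it is whatever remains after the enc key is accounted for, which is exactly why the `keylen < enckeylen` check matters.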
[PATCH 4/5] crypto: picoxcell - Simplify and harden key parsing
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the auth key won't overflow the hash_ctx buffer.

Cc: Jamie Iles <ja...@jamieiles.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/picoxcell_crypto.c |   32 ++++++--------------------------
 1 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/crypto/picoxcell_crypto.c b/drivers/crypto/picoxcell_crypto.c
index 888f7f4..a6175ba 100644
--- a/drivers/crypto/picoxcell_crypto.c
+++ b/drivers/crypto/picoxcell_crypto.c
@@ -495,45 +495,29 @@ static int spacc_aead_setkey(struct crypto_aead *tfm, const u8 *key,
 {
 	struct spacc_aead_ctx *ctx = crypto_aead_ctx(tfm);
 	struct spacc_alg *alg = to_spacc_alg(tfm->base.__crt_alg);
-	struct rtattr *rta = (void *)key;
-	struct crypto_authenc_key_param *param;
-	unsigned int authkeylen, enckeylen;
+	struct crypto_authenc_keys keys;
 	int err = -EINVAL;
 
-	if (!RTA_OK(rta, keylen))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
 
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+	if (keys.enckeylen > AES_MAX_KEY_SIZE)
 		goto badkey;
 
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
-		goto badkey;
-
-	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
-
-	if (keylen < enckeylen)
-		goto badkey;
-
-	authkeylen = keylen - enckeylen;
-
-	if (enckeylen > AES_MAX_KEY_SIZE)
+	if (keys.authkeylen > sizeof(ctx->hash_ctx))
 		goto badkey;
 
 	if ((alg->ctrl_default & SPACC_CRYPTO_ALG_MASK) ==
 	    SPA_CTRL_CIPH_ALG_AES)
-		err = spacc_aead_aes_setkey(tfm, key + authkeylen, enckeylen);
+		err = spacc_aead_aes_setkey(tfm, keys.enckey, keys.enckeylen);
 	else
-		err = spacc_aead_des_setkey(tfm, key + authkeylen, enckeylen);
+		err = spacc_aead_des_setkey(tfm, keys.enckey, keys.enckeylen);
 
 	if (err)
 		goto badkey;
 
-	memcpy(ctx->hash_ctx, key, authkeylen);
-	ctx->hash_key_len = authkeylen;
+	memcpy(ctx->hash_ctx, keys.authkey, keys.authkeylen);
+	ctx->hash_key_len = keys.authkeylen;
 
 	return 0;
--
1.7.2.5
[PATCH 3/5] crypto: ixp4xx - Simplify and harden key parsing
Use the common helper function crypto_authenc_extractkeys() for key
parsing. Also ensure the keys do fit into the corresponding buffers.
Otherwise memory corruption might occur.

Cc: Christian Hohnstaedt <chohnsta...@innominate.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/ixp4xx_crypto.c |   26 +++++++++-----------------
 1 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/ixp4xx_crypto.c b/drivers/crypto/ixp4xx_crypto.c
index 21180d6..153f73c 100644
--- a/drivers/crypto/ixp4xx_crypto.c
+++ b/drivers/crypto/ixp4xx_crypto.c
@@ -1159,32 +1159,24 @@ static int aead_setkey(struct crypto_aead *tfm, const u8 *key,
 			unsigned int keylen)
 {
 	struct ixp_ctx *ctx = crypto_aead_ctx(tfm);
-	struct rtattr *rta = (struct rtattr *)key;
-	struct crypto_authenc_key_param *param;
+	struct crypto_authenc_keys keys;
 
-	if (!RTA_OK(rta, keylen))
-		goto badkey;
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-		goto badkey;
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
 
-	param = RTA_DATA(rta);
-	ctx->enckey_len = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
+	if (keys.authkeylen > sizeof(ctx->authkey))
+		goto badkey;
 
-	if (keylen < ctx->enckey_len)
+	if (keys.enckeylen > sizeof(ctx->enckey))
 		goto badkey;
 
-	ctx->authkey_len = keylen - ctx->enckey_len;
-	memcpy(ctx->enckey, key + ctx->authkey_len, ctx->enckey_len);
-	memcpy(ctx->authkey, key, ctx->authkey_len);
+	memcpy(ctx->authkey, keys.authkey, keys.authkeylen);
+	memcpy(ctx->enckey, keys.enckey, keys.enckeylen);
+	ctx->authkey_len = keys.authkeylen;
+	ctx->enckey_len = keys.enckeylen;
 
 	return aead_setup(tfm, crypto_aead_authsize(tfm));
 badkey:
-	ctx->enckey_len = 0;
 	crypto_aead_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
 	return -EINVAL;
 }
--
1.7.2.5
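The hardening in this patch (and the picoxcell one) boils down to a single rule: never memcpy a caller-controlled length into a fixed-size context buffer without checking it first. A hedged sketch -- `fake_ctx` and its buffer sizes are invented for illustration; the real buffers are `ctx->authkey` and `ctx->enckey` in the ixp4xx driver:

```c
#include <stdint.h>
#include <string.h>

struct fake_ctx {
    uint8_t authkey[64];   /* hypothetical, stands in for ctx->authkey */
    uint8_t enckey[32];    /* hypothetical, stands in for ctx->enckey */
};

/* Returns 0 on success, -1 if either key would overflow its buffer --
 * the checks the unpatched driver was missing. */
int ctx_set_keys(struct fake_ctx *ctx,
                 const uint8_t *authkey, size_t authkeylen,
                 const uint8_t *enckey, size_t enckeylen)
{
    if (authkeylen > sizeof(ctx->authkey))
        return -1;
    if (enckeylen > sizeof(ctx->enckey))
        return -1;

    memcpy(ctx->authkey, authkey, authkeylen);
    memcpy(ctx->enckey, enckey, enckeylen);
    return 0;
}
```

Without these checks, a userland-supplied authenc key blob with an oversized auth key would have written past the context structure -- the memory corruption the commit message refers to.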
[PATCH 2/5] crypto: authencesn - Simplify key parsing
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
 crypto/authencesn.c |   26 ++++----------------------
 1 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index ab53762..8ed5b47 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -59,37 +59,19 @@ static void authenc_esn_request_complete(struct aead_request *req, int err)
 static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *key,
 				     unsigned int keylen)
 {
-	unsigned int authkeylen;
-	unsigned int enckeylen;
 	struct crypto_authenc_esn_ctx *ctx = crypto_aead_ctx(authenc_esn);
 	struct crypto_ahash *auth = ctx->auth;
 	struct crypto_ablkcipher *enc = ctx->enc;
-	struct rtattr *rta = (void *)key;
-	struct crypto_authenc_key_param *param;
+	struct crypto_authenc_keys keys;
 	int err = -EINVAL;
 
-	if (!RTA_OK(rta, keylen))
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
-		goto badkey;
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
-		goto badkey;
-
-	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
-
-	if (keylen < enckeylen)
-		goto badkey;
-
-	authkeylen = keylen - enckeylen;
 
 	crypto_ahash_clear_flags(auth, CRYPTO_TFM_REQ_MASK);
 	crypto_ahash_set_flags(auth, crypto_aead_get_flags(authenc_esn) &
 				     CRYPTO_TFM_REQ_MASK);
-	err = crypto_ahash_setkey(auth, key, authkeylen);
+	err = crypto_ahash_setkey(auth, keys.authkey, keys.authkeylen);
 	crypto_aead_set_flags(authenc_esn, crypto_ahash_get_flags(auth) &
 					   CRYPTO_TFM_RES_MASK);
 
@@ -99,7 +81,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 *
 	crypto_ablkcipher_clear_flags(enc, CRYPTO_TFM_REQ_MASK);
 	crypto_ablkcipher_set_flags(enc, crypto_aead_get_flags(authenc_esn) &
 					 CRYPTO_TFM_REQ_MASK);
-	err = crypto_ablkcipher_setkey(enc, key + authkeylen, enckeylen);
+	err = crypto_ablkcipher_setkey(enc, keys.enckey, keys.enckeylen);
 	crypto_aead_set_flags(authenc_esn, crypto_ablkcipher_get_flags(enc) &
 					   CRYPTO_TFM_RES_MASK);
 
--
1.7.2.5
[PATCH 0/5] Authenc key parsing consolidation
This series removes the code duplication of authenc key parsing by
introducing a common helper function crypto_authenc_extractkeys() in
patch 1. Patches 2 to 5 change all remaining places to use the new
helper. Patches 3 and 4 also fix potential memory corruptions by
ensuring the supplied keys won't overflow their respective buffers.

I was unable to test patches 3 to 5 as I don't have the needed hardware
for these devices -- not even a cross compiler for those architectures.

In case patches 3 and 4 are enqueued for stable, patch 1 needs to be as
well, as it's a prerequisite for those.

Please apply!

Mathias Krause (5):
  crypto: authenc - Export key parsing helper function
  crypto: authencesn - Simplify key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: talitos - Simplify key parsing

 crypto/authenc.c                  |   48 +++++++++++++++++++------------
 crypto/authencesn.c               |   26 +++-------------
 drivers/crypto/ixp4xx_crypto.c    |   26 +++++++---------
 drivers/crypto/picoxcell_crypto.c |   32 +++-----------------
 drivers/crypto/talitos.c          |   35 +++-------------------
 include/crypto/authenc.h          |   12 +++++++-
 6 files changed, 70 insertions(+), 109 deletions(-)
--
1.7.2.5
[PATCH 5/5] crypto: talitos - Simplify key parsing
Use the common helper function crypto_authenc_extractkeys() for key
parsing.

Cc: Kim Phillips <kim.phill...@freescale.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
Not tested as I've no such hardware, nor the needed cross compiler!

 drivers/crypto/talitos.c |   35 ++++++-----------------------------
 1 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 661dc3e..f6f7c68 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -671,39 +671,20 @@ static int aead_setkey(struct crypto_aead *authenc,
 		       const u8 *key, unsigned int keylen)
 {
 	struct talitos_ctx *ctx = crypto_aead_ctx(authenc);
-	struct rtattr *rta = (void *)key;
-	struct crypto_authenc_key_param *param;
-	unsigned int authkeylen;
-	unsigned int enckeylen;
-
-	if (!RTA_OK(rta, keylen))
-		goto badkey;
+	struct crypto_authenc_keys keys;
 
-	if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM)
+	if (crypto_authenc_extractkeys(&keys, key, keylen) != 0)
 		goto badkey;
 
-	if (RTA_PAYLOAD(rta) < sizeof(*param))
+	if (keys.authkeylen + keys.enckeylen > TALITOS_MAX_KEY_SIZE)
 		goto badkey;
 
-	param = RTA_DATA(rta);
-	enckeylen = be32_to_cpu(param->enckeylen);
-
-	key += RTA_ALIGN(rta->rta_len);
-	keylen -= RTA_ALIGN(rta->rta_len);
-
-	if (keylen < enckeylen)
-		goto badkey;
+	memcpy(ctx->key, keys.authkey, keys.authkeylen);
+	memcpy(&ctx->key[keys.authkeylen], keys.enckey, keys.enckeylen);
 
-	authkeylen = keylen - enckeylen;
-
-	if (keylen > TALITOS_MAX_KEY_SIZE)
-		goto badkey;
-
-	memcpy(ctx->key, key, keylen);
-
-	ctx->keylen = keylen;
-	ctx->enckeylen = enckeylen;
-	ctx->authkeylen = authkeylen;
+	ctx->keylen = keys.authkeylen + keys.enckeylen;
+	ctx->enckeylen = keys.enckeylen;
+	ctx->authkeylen = keys.authkeylen;
 
 	return 0;
--
1.7.2.5
[PATCH] padata: make the sequence counter an atomic_t
Using a spinlock to atomically increase a counter sounds wrong -- we have
atomic_t for this! Also move 'seq_nr' to a different cache line than 'lock'
to reduce cache line thrashing. This has the nice side effect of decreasing
the size of struct parallel_data from 192 to 128 bytes for an x86-64 build,
i.e. occupying only two instead of three cache lines.

Those changes result in a 5% performance increase on an IPsec test run
using pcrypt.

Btw. the seq_lock spinlock was never explicitly initialized -- one more
reason to get rid of it.

Signed-off-by: Mathias Krause <mathias.kra...@secunet.com>
---
 include/linux/padata.h |    3 +--
 kernel/padata.c        |    9 ++++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 86292be..4386946 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -129,10 +129,9 @@ struct parallel_data {
 	struct padata_serial_queue	__percpu *squeue;
 	atomic_t			reorder_objects;
 	atomic_t			refcnt;
+	atomic_t			seq_nr;
 	struct padata_cpumask		cpumask;
 	spinlock_t			lock ____cacheline_aligned;
-	spinlock_t			seq_lock;
-	unsigned int			seq_nr;
 	unsigned int			processed;
 	struct timer_list		timer;
 };
diff --git a/kernel/padata.c b/kernel/padata.c
index 07af2c9..2abd25d 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -46,6 +46,7 @@ static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
 
 static int padata_cpu_hash(struct parallel_data *pd)
 {
+	unsigned int seq_nr;
 	int cpu_index;
 
 	/*
@@ -53,10 +54,8 @@ static int padata_cpu_hash(struct parallel_data *pd)
 	 * seq_nr mod. number of cpus in use.
 	 */
 
-	spin_lock(&pd->seq_lock);
-	cpu_index = pd->seq_nr % cpumask_weight(pd->cpumask.pcpu);
-	pd->seq_nr++;
-	spin_unlock(&pd->seq_lock);
+	seq_nr = atomic_inc_return(&pd->seq_nr);
+	cpu_index = seq_nr % cpumask_weight(pd->cpumask.pcpu);
 
 	return padata_index_to_cpu(pd, cpu_index);
 }
@@ -429,7 +428,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 	padata_init_pqueues(pd);
 	padata_init_squeues(pd);
 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-	pd->seq_nr = 0;
+	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
--
1.7.2.5
Re: [PATCH] crypto: algif - suppress sending source address information in recvmsg
On Wed, Apr 10, 2013 at 8:26 AM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Wed, Apr 10, 2013 at 08:21:51AM +0200, Mathias Krause wrote:
>> On Wed, Apr 10, 2013 at 5:31 AM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
>>> On Sun, Apr 07, 2013 at 02:05:39PM +0200, Mathias Krause wrote:
>>>> The current code does not set the msg_namelen member to 0 and therefore
>>>> makes net/socket.c leak the local sockaddr_storage variable to userland
>>>> -- 128 bytes of kernel stack memory. Fix that.
>>>>
>>>> Signed-off-by: Mathias Krause <mini...@googlemail.com>
>>>
>>> Patch applied. Thanks!
>>
>> Thanks, but that patch shouldn't have been applied to cryptodev but to
>> crypto instead, and probably queued up for stable as well. I missed the
>> 'Cc: stable # v2.6.38'. My bad.
>
> OK, I'll move it across and add the stable Cc.

Any specific reason you're not pushing it to Linus for inclusion in v3.9?

Mathias
[PATCH] crypto: user - constify netlink dispatch table
There is no need to modify the netlink dispatch table at runtime and
making it const even makes the resulting object file slightly smaller.

Cc: Steffen Klassert <steffen.klass...@secunet.com>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 crypto/crypto_user.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 35d700a..77f6c4e 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -426,7 +426,7 @@ static const struct nla_policy crypto_policy[CRYPTOCFGA_MAX+1] = {
 
 #undef MSGSIZE
 
-static struct crypto_link {
+static const struct crypto_link {
 	int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
 	int (*dump)(struct sk_buff *, struct netlink_callback *);
 	int (*done)(struct netlink_callback *);
@@ -442,7 +442,7 @@ static struct crypto_link {
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct nlattr *attrs[CRYPTOCFGA_MAX+1];
-	struct crypto_link *link;
+	const struct crypto_link *link;
 	int type, err;
 
 	type = nlh->nlmsg_type;
--
1.7.10.4
Re: [PATCH 2/2] crypto: user - fix empty string test in report API
On Mon, Feb 4, 2013 at 2:15 PM, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Sun, Feb 03, 2013 at 12:09:01PM +0100, Mathias Krause wrote:
>> The current test for empty strings fails because it is testing the
>> address of a field, not a pointer. So the test will always be true.
>> Test for the string length instead.
>>
>> Signed-off-by: Mathias Krause <mini...@googlemail.com>
>> Cc: Steffen Klassert <steffen.klass...@secunet.com>
>
> Good catch. However, what if cru_driver_name isn't NUL-terminated?

Your objection is totally valid, sure. And my initial idea wouldn't have
that problem as it would just test for the first character to be '\0',
i.e. do something like that:

-	if (!p->cru_driver_name)
+	if (!p->cru_driver_name[0])

But then I looked at how the other code in the crypto user API refers to
string lengths related to cru_driver_name and switched to strlen(). So the
other code is (potentially) vulnerable to non-NUL-terminated strings, too.

So, I think we need another patch that adds sanity checks for
non-NUL-terminated strings. I can do this, maybe this evening, and send
out a new version of the patch series if you like me to.

Regards,
Mathias
[PATCH 2/2] crypto: user - fix empty string test in report API
The current test for empty strings fails because it is testing the
address of a field, not a pointer. So the test will always be true.
Test for the string length instead.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
---
 crypto/crypto_user.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index f6d9baf..1b9a9a1 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -196,7 +196,7 @@ static int crypto_report(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 	struct crypto_dump_info info;
 	int err;
 
-	if (!p->cru_driver_name)
+	if (!strlen(p->cru_driver_name))
 		return -EINVAL;
 
 	alg = crypto_alg_match(p, 1);
--
1.7.10.4
[PATCH 1/2] crypto: user - fix info leaks in report API
Three errors resulting in kernel memory disclosure:

1/ The structures used for the netlink based crypto algorithm report API
are located on the stack. As snprintf() does not fill the remainder of
the buffer with null bytes, those stack bytes will be disclosed to users
of the API. Switch to strncpy() to fix this.

2/ crypto_report_one() does not initialize all fields of struct
crypto_user_alg. Fix this to fix the heap info leak.

3/ For the module name we should copy only as many bytes as
module_name() returns -- not as much as the destination buffer could
hold. But the current code does not and therefore copies random data
from behind the end of the module name, as the module name is always
shorter than CRYPTO_MAX_ALG_NAME.

Also switch to use strncpy() to copy the algorithm's name and
driver_name. They are strings, after all.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Steffen Klassert <steffen.klass...@secunet.com>
---
 crypto/ablkcipher.c  |   12 ++++++------
 crypto/aead.c        |    9 ++++-----
 crypto/ahash.c       |    2 +-
 crypto/blkcipher.c   |    6 +++---
 crypto/crypto_user.c |   22 +++++++++++-----------
 crypto/pcompress.c   |    3 +--
 crypto/rng.c         |    2 +-
 crypto/shash.c       |    3 ++-
 8 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
index 533de95..7d4a8d2 100644
--- a/crypto/ablkcipher.c
+++ b/crypto/ablkcipher.c
@@ -388,9 +388,9 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
 	struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "ablkcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_ablkcipher.geniv ?: "<default>");
+	strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
+	strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<default>",
+		sizeof(rblkcipher.geniv));
 
 	rblkcipher.blocksize = alg->cra_blocksize;
 	rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
@@ -469,9 +469,9 @@ static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
 	struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "givcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_ablkcipher.geniv ?: "<built-in>");
+	strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
+	strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<built-in>",
+		sizeof(rblkcipher.geniv));
 
 	rblkcipher.blocksize = alg->cra_blocksize;
 	rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
diff --git a/crypto/aead.c b/crypto/aead.c
index 0b8121e..27bc487 100644
--- a/crypto/aead.c
+++ b/crypto/aead.c
@@ -117,9 +117,8 @@ static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)
 	struct crypto_report_aead raead;
 	struct aead_alg *aead = &alg->cra_aead;
 
-	snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "aead");
-	snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 aead->geniv ?: "<built-in>");
+	strncpy(raead.type, "aead", sizeof(raead.type));
+	strncpy(raead.geniv, aead->geniv ?: "<built-in>", sizeof(raead.geniv));
 
 	raead.blocksize = alg->cra_blocksize;
 	raead.maxauthsize = aead->maxauthsize;
@@ -203,8 +202,8 @@ static int crypto_nivaead_report(struct sk_buff *skb, struct crypto_alg *alg)
 	struct crypto_report_aead raead;
 	struct aead_alg *aead = &alg->cra_aead;
 
-	snprintf(raead.type, CRYPTO_MAX_ALG_NAME, "%s", "nivaead");
-	snprintf(raead.geniv, CRYPTO_MAX_ALG_NAME, "%s", aead->geniv);
+	strncpy(raead.type, "nivaead", sizeof(raead.type));
+	strncpy(raead.geniv, aead->geniv, sizeof(raead.geniv));
 
 	raead.blocksize = alg->cra_blocksize;
 	raead.maxauthsize = aead->maxauthsize;
diff --git a/crypto/ahash.c b/crypto/ahash.c
index 3887856..793a27f 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -404,7 +404,7 @@ static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
 	struct crypto_report_hash rhash;
 
-	snprintf(rhash.type, CRYPTO_MAX_ALG_NAME, "%s", "ahash");
+	strncpy(rhash.type, "ahash", sizeof(rhash.type));
 
 	rhash.blocksize = alg->cra_blocksize;
 	rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index a8d85a1..c44e014 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -499,9 +499,9 @@ static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
 	struct crypto_report_blkcipher rblkcipher;
 
-	snprintf(rblkcipher.type, CRYPTO_MAX_ALG_NAME, "%s", "blkcipher");
-	snprintf(rblkcipher.geniv, CRYPTO_MAX_ALG_NAME, "%s",
-		 alg->cra_blkcipher.geniv ?: "<default>");
+	strncpy
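The leak described in point 1/ is easy to demonstrate in userspace: `snprintf()` writes the string and its terminating NUL but leaves the rest of the destination untouched, while `strncpy()` pads the remainder with NUL bytes. Sketch (the 0xAA fill stands in for stale stack data; buffer size and the "ahash" string are illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Count the bytes after the string's NUL terminator that are nonzero --
 * i.e. the bytes that would leak if the buffer were copied to userland. */
size_t leaked_bytes(const char buf[64])
{
    size_t n = 0;
    for (size_t i = strlen(buf) + 1; i < 64; i++)
        if (buf[i] != 0)
            n++;
    return n;
}

size_t demo_snprintf_leak(void)
{
    char buf[64];
    memset(buf, 0xAA, sizeof(buf));        /* stand-in for stale stack data */
    snprintf(buf, sizeof(buf), "%s", "ahash");
    return leaked_bytes(buf);              /* bytes 6..63 still hold 0xAA */
}

size_t demo_strncpy_pad(void)
{
    char buf[64];
    memset(buf, 0xAA, sizeof(buf));
    strncpy(buf, "ahash", sizeof(buf));    /* pads buf[5..63] with '\0' */
    return leaked_bytes(buf);              /* nothing left to leak */
}
```

With the snprintf() variant, 58 of the 64 bytes keep their previous contents; the strncpy() variant hands out a fully zero-padded buffer, which is why the patch prefers it for fixed-size report fields.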
Re: Linux 3.6-rc5
On Sun, Sep 09, 2012 at 12:19:58PM -0700, Herbert Xu wrote:
> On Sun, Sep 09, 2012 at 11:13:02AM +0200, Romain Francoise wrote:
>> Still seeing this BUG with -rc5, that I originally reported here:
>> http://marc.info/?l=linux-crypto-vger&m=134653220530264&w=2
>>
>> [   26.362567] [ cut here ]
>> [   26.362583] kernel BUG at crypto/scatterwalk.c:37!
>> [   26.362606] invalid opcode: [#1] SMP
>
> Can you try blacklisting/not loading sha1_ssse3 and aesni_intel to see
> which one of them is causing this crash? Of course, if you can still
> reproduce this without loading either of them, that would also be
> interesting to know.

It happens with the C variants of SHA1 and AES, too. You can easily
trigger the bug with Steffen's crconf [1]:

  $ crconf add alg "authenc(hmac(sha1-generic),cbc(aes-generic))" type 3

So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.

Regards,
Mathias

[1] http://sourceforge.net/projects/crconf/
Re: Linux 3.6-rc5
On Sun, Sep 09, 2012 at 02:00:00PM -0700, Herbert Xu wrote:
> On Sun, Sep 09, 2012 at 10:09:10PM +0200, Mathias Krause wrote:
>> It happens with the C variants of SHA1 and AES, too. You can easily
>> trigger the bug with Steffen's crconf [1]:
>>
>>   $ crconf add alg "authenc(hmac(sha1-generic),cbc(aes-generic))" type 3
>>
>> So the problem is likely not related to sha1-ssse3.ko or aesni-intel.ko.
>
> Thanks! I think this patch should fix the problem. Can someone please
> confirm this?
>
> crypto: authenc - Fix crash with zero-length assoc data
>
> The authenc code doesn't deal with zero-length associated data
> correctly and ends up constructing a zero-length sg entry which causes
> a crash when it's fed into the crypto system.
>
> This patch fixes this by avoiding the code-path that triggers the SG
> construction if we have no associated data.
>
> This isn't the most optimal fix as it means that we'll end up using the
> fallback code-path even when we could still execute the digest function.
> However, this isn't a big deal as nobody but the test path would supply
> zero-length associated data.
>
> Reported-by: Romain Francoise <rom...@orebokech.com>
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>

Looks good to me.

> diff --git a/crypto/authenc.c b/crypto/authenc.c
> index 5ef7ba6..d0583a4 100644
> --- a/crypto/authenc.c
> +++ b/crypto/authenc.c
> @@ -336,7 +336,7 @@ static int crypto_authenc_genicv(struct aead_request *req, u8 *iv,
>  		cryptlen += ivsize;
>  	}
>  
> -	if (sg_is_last(assoc)) {
> +	if (req->assoclen && sg_is_last(assoc)) {
>  		authenc_ahash_fn = crypto_authenc_ahash;
>  		sg_init_table(asg, 2);
>  		sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);
> @@ -490,7 +490,7 @@ static int crypto_authenc_iverify(struct aead_request *req, u8 *iv,
>  		cryptlen += ivsize;
>  	}
>  
> -	if (sg_is_last(assoc)) {
> +	if (req->assoclen && sg_is_last(assoc)) {
>  		authenc_ahash_fn = crypto_authenc_ahash;
>  		sg_init_table(asg, 2);
>  		sg_set_page(asg, sg_page(assoc), assoc->length, assoc->offset);

Tested-by: Mathias Krause <mini...@googlemail.com>
Re: [PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32
On Thu, May 31, 2012 at 7:27 AM, Herbert Xu herb...@gondor.apana.org.au wrote: On Wed, May 30, 2012 at 01:43:08AM +0200, Mathias Krause wrote: The 32 bit variant of cbc(aes) decrypt is using instructions requiring 128 bit aligned memory locations but fails to ensure this constraint in the code. Fix this by loading the data into intermediate registers with load unaligned instructions. This fixes reported general protection faults related to aesni. References: https://bugzilla.kernel.org/show_bug.cgi?id=43223 Reported-by: Daniel gark...@mailueberfall.de Cc: sta...@kernel.org [v2.6.39+] Signed-off-by: Mathias Krause mini...@googlemail.com Have you measured this against increasing the alignmask to 15? No, but the latter will likely be much slower, as it would need to memmove the data if it's not aligned, right? My patch essentially just breaks the combined "XOR a memory operand with a register" operation into two -- load memory into a register, then XOR the registers. It shouldn't be much slower compared to the current version. But it fixes a bug the current version exposes when working on unaligned data. That said, I did some micro benchmarking of pxor (%edx), %xmm0 vs. movups (%edx), %xmm1; pxor %xmm1, %xmm0 and observed that the latter might even be slightly faster! But changing the code to perform better is out of scope for this patch, as it should just fix the bug in the code. We can increase performance in a follow up patch. Mathias
[PATCH] crypto: aesni-intel - fix unaligned cbc decrypt for x86-32
The 32 bit variant of cbc(aes) decrypt uses instructions requiring 128 bit aligned memory locations but fails to ensure this constraint in the code. Fix this by loading the data into intermediate registers with unaligned load instructions. This fixes reported general protection faults related to aesni.

References: https://bugzilla.kernel.org/show_bug.cgi?id=43223
Reported-by: Daniel gark...@mailueberfall.de
Cc: sta...@kernel.org [v2.6.39+]
Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index be6d9e3..3470624 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -2460,10 +2460,12 @@ ENTRY(aesni_cbc_dec)
 	pxor IN3, STATE4
 	movaps IN4, IV
 #else
-	pxor (INP), STATE2
-	pxor 0x10(INP), STATE3
 	pxor IN1, STATE4
 	movaps IN2, IV
+	movups (INP), IN1
+	pxor IN1, STATE2
+	movups 0x10(INP), IN2
+	pxor IN2, STATE3
 #endif
 	movups STATE1, (OUTP)
 	movups STATE2, 0x10(OUTP)
--
1.7.10
[PATCH] crypto: sha1 - use Kbuild supplied flags for AVX test
Commit ea4d26ae (raid5: add AVX optimized RAID5 checksumming) introduced x86 arch wide defines for AFLAGS and CFLAGS indicating AVX support in binutils based on the same test we have in x86/crypto/ right now. To minimize duplication drop our implementation in favour of the one in x86/. Signed-off-by: Mathias Krause mini...@googlemail.com --- This should be applied to cryptodev-2.6.git after it contains the above mentioned commit, e.g. after cryptodev-2.6.git has been rebased to or merged with v3.5-rc1. arch/x86/crypto/Makefile |7 --- arch/x86/crypto/sha1_ssse3_asm.S |2 +- arch/x86/crypto/sha1_ssse3_glue.c |6 +++--- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index e191ac0..479f95a 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -34,12 +34,5 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o - ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o - -# enable AVX support only when $(AS) can actually assemble the instructions -ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes) -AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT -CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT -endif sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S index b2c2f57..49d6987 100644 --- a/arch/x86/crypto/sha1_ssse3_asm.S +++ b/arch/x86/crypto/sha1_ssse3_asm.S @@ -468,7 +468,7 @@ W_PRECALC_SSSE3 */ SHA1_VECTOR_ASM sha1_transform_ssse3 -#ifdef SHA1_ENABLE_AVX_SUPPORT +#ifdef CONFIG_AS_AVX .macro W_PRECALC_AVX diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c index f916499..4a11a9d 100644 --- a/arch/x86/crypto/sha1_ssse3_glue.c +++ b/arch/x86/crypto/sha1_ssse3_glue.c @@ -35,7 +35,7 @@ asmlinkage void 
sha1_transform_ssse3(u32 *digest, const char *data, unsigned int rounds); -#ifdef SHA1_ENABLE_AVX_SUPPORT +#ifdef CONFIG_AS_AVX asmlinkage void sha1_transform_avx(u32 *digest, const char *data, unsigned int rounds); #endif @@ -184,7 +184,7 @@ static struct shash_alg alg = { } }; -#ifdef SHA1_ENABLE_AVX_SUPPORT +#ifdef CONFIG_AS_AVX static bool __init avx_usable(void) { u64 xcr0; @@ -209,7 +209,7 @@ static int __init sha1_ssse3_mod_init(void) if (cpu_has_ssse3) sha1_transform_asm = sha1_transform_ssse3; -#ifdef SHA1_ENABLE_AVX_SUPPORT +#ifdef CONFIG_AS_AVX /* allow AVX to override SSSE3, it's a little faster */ if (avx_usable()) sha1_transform_asm = sha1_transform_avx; -- 1.5.6.5
Re: [PATCH v3 0/2] crypto, x86: assembler implementation of SHA1
On Fri, Aug 5, 2011 at 9:49 AM, Herbert Xu herb...@gondor.apana.org.au wrote: On Thu, Aug 04, 2011 at 08:19:23PM +0200, Mathias Krause wrote: This patch series adds an assembler implementation for the SHA1 hash algorithm for the x86-64 architecture. Its raw hash performance can be more than 2 times faster than the generic C implementation. This gives a real world benefit for IPsec with a throughput increase of up to +35%. For concrete numbers have a look at the second patch. I'll apply this as soon as rc1 is out. Thanks! PS if you have time please look into the asynchronous version. I've converted the implementation to ahash. It was pretty easy in the end as I had a reference to look at -- the ghash implementation. But I'm pretty unhappy about how many layers of indirection are in there now, even for the irq_fpu_usable() case -- not to mention that the !irq_fpu_usable() path is much more heavyweight. But, at least, it can now always make use of the SSSE3 variant. (See the attached patch, but please don't apply it. It's compile tested only! Also, I'm not comfortable with the ahash version right now.) I haven't made any benchmarks yet because I've no access to the test machine right now. But I suspect it'll be much slower for small chunks and *maybe* faster for the IPsec case I'm interested in, because it can now always use the SSSE3 variant. What I'm thinking about is having both a shash and an ahash variant, so the user can actually choose which interface fits its needs best -- e.g., dm-crypt might want to use the shash interface and xfrm the ahash one. Do you think something like that is possible? And if so, how would dm-crypt automatically choose the shash and xfrm the ahash variant? Regards, Mathias 0001-crypto-sha1-convert-to-ahash-interface.patch Description: Binary data
Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
Hi Max, 2011/8/8 Locktyukhin, Maxim maxim.locktyuk...@intel.com: I'd like to note that at Intel we very much appreciate Mathias' effort to port/integrate this implementation into the Linux kernel! $0.02 re tcrypt perf numbers below: I believe something must be terribly broken with the tcrypt measurements: 20 (and more) cycles per byte shown below are not reasonable numbers for SHA-1. ~6 c/B (as can be seen in some of the results for Core2) is the expected result ... so, while the relative improvement seen is sort of consistent, the absolute performance numbers are very much off (and yes, Sandy Bridge on AVX code is expected to be faster than Core2/SSSE3 -- ~5.2 c/b vs. ~5.8 c/b on the level of the sha1_update() call, to be more precise). This does not affect the proposed patch in any way; it looks like tcrypt's timing problem to me. I'd even venture a guess that it may be due to the use of RDTSC (which gets affected significantly by Turbo/EIST: the TSC is isotropic in time but not within the core clock domain, i.e. RDTSC cannot be used to measure core cycles without at least disabling EIST and Turbo, or doing runtime adjustment of the actual bus/core clock ratio vs. the standard ratio always used by the TSC -- I could elaborate more if someone is interested). I found the Sandy Bridge numbers odd too, but suspected it might be because of the laptop platform. The SSSE3 numbers on this platform were slightly lower than the AVX numbers and yet still way off the ones for the Core2 system. But your explanation fits well, too. It might be EIST or Turbo mode that tampered with the numbers. Another, maybe more likely, point might be the overhead Andy mentioned. thanks again, -Max Mathias
Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
On Thu, Aug 11, 2011 at 4:50 PM, Andy Lutomirski l...@mit.edu wrote: I have vague plans to clean up extended state handling and make kernel_fpu_begin work efficiently from any context. (i.e. the first kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy Bridge, but further calls to kernel_fpu_begin would be a single branch.) The current code that handles context switches when user code is using extended state is terrible and will almost certainly become faster in the near future. Sounds good! This would not only improve the performance of sha1_ssse3 but of aesni as well. Hopefully I'll have patches for 3.2 or 3.3. IOW, please don't introduce another thing like the fpu crypto module quite yet unless there's a good reason. I'm looking forward to deleting the fpu module entirely. I've no intention to. So please go ahead and do so. Mathias
Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu herb...@gondor.apana.org.au wrote: On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Were you testing this on the transmit side or the receive side? I was running an iperf test on two directly connected systems. Both sides showed me those numbers (iperf server and client). As the IPsec receive code path usually runs in a softirq context, does this code have any effect there at all? It does. Just have a look at how fpu_available() is implemented:

,-[ arch/x86/include/asm/i387.h ]
| static inline bool irq_fpu_usable(void)
| {
| 	struct pt_regs *regs;
|
| 	return !in_interrupt() || !(regs = get_irq_regs()) || \
| 	       user_mode(regs) || (read_cr0() & X86_CR0_TS);
| }
`

So it'll fail in softirq context when the softirq interrupted a kernel thread or TS in CR0 is set. When it interrupted a userland thread that doesn't have the TS flag set in CR0, i.e. the CPU won't generate an exception when we use the FPU, it'll work in softirq context, too. With a busy userland making extensive use of the FPU it'll almost always have to fall back to the generic implementation, right. However, using this module on an IPsec gateway with no real userland at all, you get a nice performance gain. This is pretty similar to the situation with the Intel AES code. Over there they solved it by using the asynchronous interface and deferring the processing to a work queue. This also avoids the situation where you have an FPU/SSE using process that also tries to transmit over IPsec thrashing the FPU state. Interesting. I'll look into this. Now I'm still happy to take this because hashing is very different from ciphers in that some users tend to hash small amounts of data all the time. Those users will typically use the shash interface that you provide here. 
So I'm interested to know how much of an improvement this is for those users (< 64 bytes). Anything below 64 byte will (and has to) be padded to a full block, i.e. 64 bytes. If you run the tcrypt speed tests that should provide some useful info. I've summarized the mean values of five consecutive tcrypt runs from two different systems. The first system is an Intel Core i7 2620M based notebook running at 2.70 GHz. It's a Sandy Bridge processor so it could make use of the AVX variant. The second system was an Intel Core 2 Quad Xeon system running at 2.40 GHz -- no AVX, but SSSE3. Since the output of tcrypt is a little awkward to read, I've condensed it slightly to make it (hopefully) more readable. Please interpret the table as follows: the triple in the first column is (byte blocks | bytes per update | updates), c/B is cycles per byte. Here are the numbers for the first system:

                      sha1-generic              sha1-ssse3 (AVX)
(  16 |   16 |   1):   9.65 MiB/s, 266.2 c/B    12.93 MiB/s, 200.0 c/B
(  64 |   16 |   4):  19.05 MiB/s, 140.2 c/B    25.27 MiB/s, 105.6 c/B
(  64 |   64 |   1):  21.35 MiB/s, 119.2 c/B    29.29 MiB/s,  87.0 c/B
( 256 |   16 |  16):  28.81 MiB/s,  88.8 c/B    37.70 MiB/s,  68.4 c/B
( 256 |   64 |   4):  34.58 MiB/s,  74.0 c/B    47.16 MiB/s,  54.8 c/B
( 256 |  256 |   1):  37.44 MiB/s,  68.0 c/B    69.01 MiB/s,  36.8 c/B
(1024 |   16 |  64):  33.55 MiB/s,  76.2 c/B    43.77 MiB/s,  59.0 c/B
(1024 |  256 |   4):  45.12 MiB/s,  58.0 c/B    88.90 MiB/s,  28.8 c/B
(1024 | 1024 |   1):  46.69 MiB/s,  54.0 c/B   104.39 MiB/s,  25.6 c/B
(2048 |   16 | 128):  34.66 MiB/s,  74.0 c/B    44.93 MiB/s,  57.2 c/B
(2048 |  256 |   8):  46.81 MiB/s,  54.0 c/B    93.83 MiB/s,  27.0 c/B
(2048 | 1024 |   2):  48.28 MiB/s,  52.4 c/B   110.98 MiB/s,  23.0 c/B
(2048 | 2048 |   1):  48.69 MiB/s,  52.0 c/B   114.26 MiB/s,  22.0 c/B
(4096 |   16 | 256):  35.15 MiB/s,  72.6 c/B    45.53 MiB/s,  56.0 c/B
(4096 |  256 |  16):  47.69 MiB/s,  53.0 c/B    96.46 MiB/s,  26.0 c/B
(4096 | 1024 |   4):  49.24 MiB/s,  51.0 c/B   114.36 MiB/s,  22.0 c/B
(4096 | 4096 |   1):  49.77 MiB/s,  51.0 c/B   119.80 MiB/s,  21.0 c/B
(8192 |   16 | 512):  35.46 MiB/s,  72.2 c/B    45.84 MiB/s,  55.8 c/B
(8192 |  256 |  32):  48.15 MiB/s,  53.0 c/B    97.83 MiB/s,  26.0 c/B
(8192 | 1024 |   8):  49.73 MiB/s,  51.0 c/B   116.35 MiB/s,  22.0 c/B
(8192 | 4096 |   2):  50.10 MiB/s,  50.8 c/B   121.66 MiB/s,  21.0 c/B
(8192 | 8192 |   1):  50.25 MiB/s,  50.8 c/B   121.87 MiB/s,  21.0 c/B

For the second system I got the following numbers:

                      sha1-generic              sha1-ssse3 (SSSE3)
(  16 |   16 |   1):  27.23 MiB/s, 106.6 c/B    32.86 MiB/s,  73.8 c/B
(  64 |   16 |   4):  51.67 MiB/s,  54.0 c/B    61.90 MiB/s,  37.8 c/B
(  64 |   64 |   1):  62.44 MiB/s,  44.2 c/B    74.16 MiB/s,  31.6 c/B
Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
On Thu, Aug 4, 2011 at 7:05 PM, Mathias Krause mini...@googlemail.com wrote: It does. Just have a look at how fpu_available() is implemented: read: irq_fpu_usable()
[PATCH v3 0/2] crypto, x86: assembler implementation of SHA1
This patch series adds an assembler implementation for the SHA1 hash algorithm for the x86-64 architecture. Its raw hash performance can be more than 2 times faster than the generic C implementation. This gives a real world benefit for IPsec with a throughput increase of up to +35%. For concrete numbers have a look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with some effort in a follow up patch. (I had no time to do this yet.)

Note: The SSSE3 is no typo, it's Supplemental SSE3.

v3 changes:
- removed #ifdef in first patch

v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards, Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile          |    8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |    3 +
 crypto/Kconfig                    |   10 +
 crypto/sha1_generic.c             |    9 +-
 include/crypto/sha.h              |    3 +
 7 files changed, 827 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c
[PATCH v3 1/2] crypto, sha1: export sha1_update for reuse
Export the update function as crypto_sha1_update() to not have the need to reimplement the same algorithm for each SHA-1 implementation. This way the generic SHA-1 implementation can be used as a fallback for other implementations that fail to run under certain circumstances, like the need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |    9 +++++----
 include/crypto/sha.h  |    3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
 	return 0;
 }

-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,

 	return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);

 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* Pad out to 56 mod 64 */
 	index = sctx->count & 0x3f;
 	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	sha1_update(desc, padding, padlen);
+	crypto_sha1_update(desc, padding, padlen);

 	/* Append length */
-	sha1_update(desc, (const u8 *)bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)bits, sizeof(bits));

 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
 	.digestsize	= SHA1_DIGEST_SIZE,
 	.init		= sha1_init,
-	.update		= sha1_update,
+	.update		= crypto_sha1_update,
 	.final		= sha1_final,
 	.export		= sha1_export,
 	.import		= sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..83e6be5 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,7 @@ struct sha512_state {
 	u8 buf[SHA512_BLOCK_SIZE];
 };

+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len);
+
 #endif
--
1.5.6.5
[PATCH v3 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25% faster. Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant if using the SSE/YMM registers is not possible. With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, measuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s). Using this algorithm on an IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with an EXFO Packet Blazer FTB-8510:

 frame size   sha1-generic   sha1-ssse3     delta
    64 byte    37.5 MBit/s    37.5 MBit/s    0.0%
   128 byte    56.3 MBit/s    62.5 MBit/s  +11.0%
   256 byte    87.5 MBit/s   100.0 MBit/s  +14.3%
   512 byte   131.3 MBit/s   150.0 MBit/s  +14.2%
  1024 byte   162.5 MBit/s   193.8 MBit/s  +19.3%
  1280 byte   175.0 MBit/s   212.5 MBit/s  +21.4%
  1420 byte   175.0 MBit/s   218.7 MBit/s  +25.0%
  1518 byte   150.0 MBit/s   181.2 MBit/s  +20.8%

The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make their way through the IPsec tunnel. 
Signed-off-by: Mathias Krause mini...@googlemail.com Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com --- arch/x86/crypto/Makefile |8 + arch/x86/crypto/sha1_ssse3_asm.S | 558 + arch/x86/crypto/sha1_ssse3_glue.c | 240 arch/x86/include/asm/cpufeature.h |3 + crypto/Kconfig| 10 + 5 files changed, 819 insertions(+), 0 deletions(-) create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index c04f1b7..57c7f7b 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o +obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o aes-i586-y := aes-i586-asm_32.o aes_glue.o twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o @@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o + +# enable AVX support only when $(AS) can actually assemble the instructions +ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes) +AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT +CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT +endif +sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S new file mode 100644 index 000..b2c2f57 --- /dev/null +++ b/arch/x86/crypto/sha1_ssse3_asm.S @@ -0,0 +1,558 @@ +/* + * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental + * SSE3 instruction set extensions introduced in Intel Core Microarchitecture + * processors. CPUs supporting Intel(R) AVX extensions will get an additional + * boost. + * + * This work was inspired by the vectorized implementation of Dean Gaudet. 
+ * Additional information on it can be found at: + *http://www.arctic.org/~dean/crypto/sha1.html + * + * It was improved upon with more efficient vectorization of the message + * scheduling. This implementation has also been optimized for all current and + * several future generations of Intel CPUs. + * + * See this article for more information about the implementation details: + * http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/ + * + * Copyright (C) 2010, Intel Corp. + * Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com + *Ronen Zohar ronen.zo...@intel.com + * + * Converted to ATT syntax and adapted for inclusion in the Linux kernel: + * Author: Mathias Krause mini...@googlemail.com + * + * This program is free
Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse
On Sat, Jul 30, 2011 at 3:46 PM, Herbert Xu herb...@gondor.apana.org.au wrote: On Thu, Jul 28, 2011 at 05:29:35PM +0200, Mathias Krause wrote: On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu herb...@gondor.apana.org.au wrote: On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote: diff --git a/include/crypto/sha.h b/include/crypto/sha.h index 069e85b..7c46d0c 100644 --- a/include/crypto/sha.h +++ b/include/crypto/sha.h @@ -82,4 +82,9 @@ struct sha512_state { u8 buf[SHA512_BLOCK_SIZE]; }; +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE) +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data, + unsigned int len); +#endif Please remove the unnecessary #if. The function will only be available when crypto/sha1_generic.o is either built into the kernel or built as a module. Without the #if a potential wrong user of this function might not be detected as early as at compilation time but as late as at link time, or even worse, as late as module load time -- which is pretty bad. IMHO it's better to detect errors early, as in reading error: implicit declaration of function ‘crypto_sha1_update’ when trying to compile the code in question instead of Unknown symbol crypto_sha1_update in dmesg when trying to load the module. Therefore I would like to keep the #if. In order to be consistent please remove the ifdef. In most similar cases in the crypto subsystem we don't do this. As adding such ifdefs all over the place would gain very little, I'd much rather you left it out. Noting that this function wasn't exported before and the only user (sha1-ssse3) ensures its availability by other means, it should be okay to remove the #if. I'll update the patch accordingly. Any objections to the second patch? Thanks, Mathias
Re: [PATCH v2 1/2] crypto, sha1: export sha1_update for reuse
On Thu, Jul 28, 2011 at 4:58 PM, Herbert Xu herb...@gondor.apana.org.au wrote: On Sun, Jul 24, 2011 at 07:53:13PM +0200, Mathias Krause wrote: diff --git a/include/crypto/sha.h b/include/crypto/sha.h index 069e85b..7c46d0c 100644 --- a/include/crypto/sha.h +++ b/include/crypto/sha.h @@ -82,4 +82,9 @@ struct sha512_state { u8 buf[SHA512_BLOCK_SIZE]; }; +#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE) +extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data, + unsigned int len); +#endif Please remove the unnecessary #if. The function will only be available when crypto/sha1_generic.o is either built into the kernel or built as a module. Without the #if a potential wrong user of this function might not be detected as early as at compilation time but as late as at link time, or even worse, as late as module load time -- which is pretty bad. IMHO it's better to detect errors early, as in reading error: implicit declaration of function ‘crypto_sha1_update’ when trying to compile the code in question instead of Unknown symbol crypto_sha1_update in dmesg when trying to load the module. Therefore I would like to keep the #if. Thanks for the review! Mathias
Re: [PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
On Sat, Jul 16, 2011 at 2:44 PM, Mathias Krause mini...@googlemail.com wrote: diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index c04f1b7..a80be92 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o +obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o aes-i586-y := aes-i586-asm_32.o aes_glue.o twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o @@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o + +# enable AVX support only when $(AS) can actually assemble the instructions +ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes) +AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT This should have been AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT instead. Sorry, a missing adjustment for a last minute file rename. I'll post a new version of the series with a wider target audience since there has been no reply for a week. +CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT +endif +sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S new file mode 100644 index 000..8fb0ba6 Thanks, Mathias
[PATCH v2 0/2] crypto, x86: assembler implementation of SHA1
This patch series adds an assembler implementation for the SHA1 hash algorithm for the x86-64 architecture. Its raw hash performance can be more than 2 times faster than the generic C implementation. This gives a real world benefit for IPsec with a throughput increase of up to +35%. For concrete numbers have a look at the second patch.

This implementation is currently x86-64 only but might be ported to 32 bit with some effort in a follow up patch. (I had no time to do this yet.)

Note: The SSSE3 is no typo, it's Supplemental SSE3.

v2 changes:
- fixed typo in Makefile making AVX version unusable
- whitespace fixes for the .S file

Regards, Mathias

Mathias Krause (2):
  crypto, sha1: export sha1_update for reuse
  crypto, x86: SSSE3 based SHA1 implementation for x86-64

 arch/x86/crypto/Makefile          |    8 +
 arch/x86/crypto/sha1_ssse3_asm.S  |  558 +
 arch/x86/crypto/sha1_ssse3_glue.c |  240 
 arch/x86/include/asm/cpufeature.h |    3 +
 crypto/Kconfig                    |   10 +
 crypto/sha1_generic.c             |    9 +-
 include/crypto/sha.h              |    5 +
 7 files changed, 829 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c
[PATCH v2 1/2] crypto, sha1: export sha1_update for reuse
Export the update function as crypto_sha1_update() to not have the need to reimplement the same algorithm for each SHA-1 implementation. This way the generic SHA-1 implementation can be used as a fallback for other implementations that fail to run under certain circumstances, like the need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |    9 +++++----
 include/crypto/sha.h  |    5 +++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
 	return 0;
 }

-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,

 	return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);

 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* Pad out to 56 mod 64 */
 	index = sctx->count & 0x3f;
 	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	sha1_update(desc, padding, padlen);
+	crypto_sha1_update(desc, padding, padlen);

 	/* Append length */
-	sha1_update(desc, (const u8 *)bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)bits, sizeof(bits));

 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
 	.digestsize	= SHA1_DIGEST_SIZE,
 	.init		= sha1_init,
-	.update		= sha1_update,
+	.update		= crypto_sha1_update,
 	.final		= sha1_final,
 	.export		= sha1_export,
 	.import		= sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
 	u8 buf[SHA512_BLOCK_SIZE];
 };

+#if defined(CONFIG_CRYPTO_SHA1) || defined (CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len);
+#endif
+
 #endif
--
1.5.6.5
[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25% faster.

Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant if using the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, measuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with an EXFO Packet Blazer FTB-8510:

 frame size   sha1-generic    sha1-ssse3    delta
    64 byte    37.5 MBit/s   37.5 MBit/s    0.0%
   128 byte    56.3 MBit/s   62.5 MBit/s  +11.0%
   256 byte    87.5 MBit/s  100.0 MBit/s  +14.3%
   512 byte   131.3 MBit/s  150.0 MBit/s  +14.2%
  1024 byte   162.5 MBit/s  193.8 MBit/s  +19.3%
  1280 byte   175.0 MBit/s  212.5 MBit/s  +21.4%
  1420 byte   175.0 MBit/s  218.7 MBit/s  +25.0%
  1518 byte   150.0 MBit/s  181.2 MBit/s  +20.8%

The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make their way through the IPsec tunnel.
Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com
---
 arch/x86/crypto/Makefile          |   8 +
 arch/x86/crypto/sha1_ssse3_asm.S  | 558 +
 arch/x86/crypto/sha1_ssse3_glue.c | 240 +
 arch/x86/include/asm/cpufeature.h |   3 +
 crypto/Kconfig                    |  10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..57c7f7b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3_asm.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..b2c2f57
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *    http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com
+ *            Ronen Zohar ronen.zo...@intel.com
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause mini...@googlemail.com
+ *
+ * This program is free
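The as-instr test in the Makefile above boils down to feeding the assembler one probe instruction and checking whether it assembles. It can be reproduced by hand like this (a sketch assuming GNU as is on the PATH; the kbuild macro does the same thing with $(AS)):

```shell
#!/bin/sh
# Probe whether the assembler accepts an AVX instruction, mirroring
# the Makefile's as-instr check for vpxor. Prints "yes" or "no",
# matching the macro's two result branches.
if echo 'vpxor %xmm0,%xmm1,%xmm2' | as -o /dev/null - 2>/dev/null; then
	echo yes
else
	echo no
fi
```

On a toolchain that predates AVX support this prints "no", and the build then compiles the SSSE3-only variant without -DSHA1_ENABLE_AVX_SUPPORT.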
[PATCH 1/2] crypto, sha1: export sha1_update for reuse
Export the update function as crypto_sha1_update() so that each SHA-1 implementation need not reimplement the same algorithm. This way the generic SHA-1 implementation can be used as a fallback for other implementations that fail to run under certain circumstances, like the need for an FPU context while executing in IRQ context.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 crypto/sha1_generic.c |    9 +-
 include/crypto/sha.h  |    5 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 0416091..0b6d907 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -36,7 +36,7 @@ static int sha1_init(struct shash_desc *desc)
 	return 0;
 }
 
-static int sha1_update(struct shash_desc *desc, const u8 *data,
+int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			unsigned int len)
 {
 	struct sha1_state *sctx = shash_desc_ctx(desc);
@@ -70,6 +70,7 @@ static int sha1_update(struct shash_desc *desc, const u8 *data,
 
 	return 0;
 }
+EXPORT_SYMBOL(crypto_sha1_update);
 
 /* Add padding and return the message digest. */
@@ -86,10 +87,10 @@ static int sha1_final(struct shash_desc *desc, u8 *out)
 	/* Pad out to 56 mod 64 */
 	index = sctx->count & 0x3f;
 	padlen = (index < 56) ? (56 - index) : ((64+56) - index);
-	sha1_update(desc, padding, padlen);
+	crypto_sha1_update(desc, padding, padlen);
 
 	/* Append length */
-	sha1_update(desc, (const u8 *)bits, sizeof(bits));
+	crypto_sha1_update(desc, (const u8 *)bits, sizeof(bits));
 
 	/* Store state in digest */
 	for (i = 0; i < 5; i++)
@@ -120,7 +121,7 @@ static int sha1_import(struct shash_desc *desc, const void *in)
 static struct shash_alg alg = {
 	.digestsize	= SHA1_DIGEST_SIZE,
 	.init		= sha1_init,
-	.update		= sha1_update,
+	.update		= crypto_sha1_update,
 	.final		= sha1_final,
 	.export		= sha1_export,
 	.import		= sha1_import,
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 069e85b..7c46d0c 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -82,4 +82,9 @@ struct sha512_state {
 	u8 buf[SHA512_BLOCK_SIZE];
 };
 
+#if defined(CONFIG_CRYPTO_SHA1) || defined(CONFIG_CRYPTO_SHA1_MODULE)
+extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len);
+#endif
+
 #endif
--
1.5.6.5
[PATCH 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64
This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX).

Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smallest data set (16 byte) it is still 25% faster.

Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant if using the SSE/YMM registers is not possible.

With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%.

Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, measuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s).

Using this algorithm on an IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with an EXFO Packet Blazer FTB-8510:

 frame size   sha1-generic    sha1-ssse3    delta
    64 byte    37.5 MBit/s   37.5 MBit/s    0.0%
   128 byte    56.3 MBit/s   62.5 MBit/s  +11.0%
   256 byte    87.5 MBit/s  100.0 MBit/s  +14.3%
   512 byte   131.3 MBit/s  150.0 MBit/s  +14.2%
  1024 byte   162.5 MBit/s  193.8 MBit/s  +19.3%
  1280 byte   175.0 MBit/s  212.5 MBit/s  +21.4%
  1420 byte   175.0 MBit/s  218.7 MBit/s  +25.0%
  1518 byte   150.0 MBit/s  181.2 MBit/s  +20.8%

The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make their way through the IPsec tunnel.
Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: Maxim Locktyukhin maxim.locktyuk...@intel.com
---
 arch/x86/crypto/Makefile          |   8 +
 arch/x86/crypto/sha1_ssse3_asm.S  | 558 +
 arch/x86/crypto/sha1_ssse3_glue.c | 240 +
 arch/x86/include/asm/cpufeature.h |   3 +
 crypto/Kconfig                    |  10 +
 5 files changed, 819 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/crypto/sha1_ssse3_asm.S
 create mode 100644 arch/x86/crypto/sha1_ssse3_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c04f1b7..a80be92 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
 obj-$(CONFIG_CRYPTO_CRC32C_INTEL) += crc32c-intel.o
+obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 
 aes-i586-y := aes-i586-asm_32.o aes_glue.o
 twofish-i586-y := twofish-i586-asm_32.o twofish_glue.o
@@ -25,3 +26,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
+
+# enable AVX support only when $(AS) can actually assemble the instructions
+ifeq ($(call as-instr,vpxor %xmm0$(comma)%xmm1$(comma)%xmm2,yes,no),yes)
+AFLAGS_sha1_ssse3.o += -DSHA1_ENABLE_AVX_SUPPORT
+CFLAGS_sha1_ssse3_glue.o += -DSHA1_ENABLE_AVX_SUPPORT
+endif
+sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
diff --git a/arch/x86/crypto/sha1_ssse3_asm.S b/arch/x86/crypto/sha1_ssse3_asm.S
new file mode 100644
index 000..8fb0ba6
--- /dev/null
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -0,0 +1,558 @@
+/*
+ * This is a SIMD SHA-1 implementation. It requires the Intel(R) Supplemental
+ * SSE3 instruction set extensions introduced in Intel Core Microarchitecture
+ * processors. CPUs supporting Intel(R) AVX extensions will get an additional
+ * boost.
+ *
+ * This work was inspired by the vectorized implementation of Dean Gaudet.
+ * Additional information on it can be found at:
+ *    http://www.arctic.org/~dean/crypto/sha1.html
+ *
+ * It was improved upon with more efficient vectorization of the message
+ * scheduling. This implementation has also been optimized for all current and
+ * several future generations of Intel CPUs.
+ *
+ * See this article for more information about the implementation details:
+ *   http://software.intel.com/en-us/articles/improving-the-performance-of-the-secure-hash-algorithm-1/
+ *
+ * Copyright (C) 2010, Intel Corp.
+ *   Authors: Maxim Locktyukhin maxim.locktyuk...@intel.com
+ *            Ronen Zohar ronen.zo...@intel.com
+ *
+ * Converted to AT&T syntax and adapted for inclusion in the Linux kernel:
+ *   Author: Mathias Krause mini...@googlemail.com
+ *
+ * This program is free
[PATCH] crypto, gf128: fix call to memset()
In gf128mul_lle() and gf128mul_bbe() r isn't completely initialized with zero because the size argument passed to memset() is the size of the pointer, not the structure it points to. Luckily there are no in-kernel users of those functions so the ABI change implied by this fix should break no existing code.

Based on a patch by the PaX Team.

Signed-off-by: Mathias Krause mini...@googlemail.com
Cc: PaX Team pagee...@freemail.hu
---
 crypto/gf128mul.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
index df35e4c..5276607 100644
--- a/crypto/gf128mul.c
+++ b/crypto/gf128mul.c
@@ -182,7 +182,7 @@ void gf128mul_lle(be128 *r, const be128 *b)
 	for (i = 0; i < 7; ++i)
 		gf128mul_x_lle(&p[i + 1], &p[i]);
 
-	memset(r, 0, sizeof(r));
+	memset(r, 0, sizeof(*r));
 
 	for (i = 0;;) {
 		u8 ch = ((u8 *)b)[15 - i];
@@ -220,7 +220,7 @@ void gf128mul_bbe(be128 *r, const be128 *b)
 	for (i = 0; i < 7; ++i)
 		gf128mul_x_bbe(&p[i + 1], &p[i]);
 
-	memset(r, 0, sizeof(r));
+	memset(r, 0, sizeof(*r));
 
 	for (i = 0;;) {
 		u8 ch = ((u8 *)b)[i];
--
1.5.6.5
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 17:31 Randy Dunlap wrote:

On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126:

on i386 builds, I get tons of these (and more) errors:

arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'
arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13'
arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14'
arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp'
arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp'
arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp'
arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9'

even though the kernel .config file says:

CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_586=m
CONFIG_CRYPTO_AES_NI_INTEL=m

Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? Or is this a toolchain issue?

Well, __x86_64__ should be a built-in define of the compiler while CONFIG_X86_64 is defined for 64-bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side, but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also, git grep showed quite a few more places using __x86_64__, so those would miscompile on your toolchain, too.

But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue.

Kind Regards,
Mathias
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. The build problem still happens when that patch is applied. That's weird. So it must be something with your toolchain. 
Can you please post the output of the following commands?

$ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
$ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'

Beside that, the patch below should fix the issue with your toolchain by using CONFIG_X86_64 instead of __x86_64__.

Sorry for the inconvenience,
Mathias

[PATCH] crypto: aesni-intel - Fixed another build error on x86-32

It looks like not all compilers undef __x86_64__ for 32-bit builds so switch to CONFIG_X86_64 to test if we're building for 64 or 32 bit.

Signed-off-by: Mathias Krause mini...@googlemail.com
---
 arch/x86/crypto/aesni-intel_asm.S |   40 ++--
 1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index d528fde..de0ec32 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,7 +32,7 @@
 #include <linux/linkage.h>
 #include <asm/inst.h>
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 .data
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
@@ -105,7 +105,7 @@ enc:        .octa 0x2
 #define CTR	%xmm11
 #define INC	%xmm12
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 #define AREG	%rax
 #define KEYP	%rdi
 #define OUTP	%rsi
@@ -132,7 +132,7 @@ enc:        .octa 0x2
 #endif
 
-#ifdef __x86_64__
+#ifdef CONFIG_X86_64
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
 *
@@ -1333,7 +1333,7 @@ _key_expansion_256b:
 *                   unsigned int key_len)
 */
 ENTRY(aesni_set_key)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	pushl KEYP
 	movl 8(%esp), KEYP	# ctx
 	movl 12(%esp), UKEYP	# in_key
@@ -1435,7 +1435,7 @@ ENTRY(aesni_set_key)
 	cmp TKEYP, KEYP
 	jb .Ldec_key_loop
 	xor AREG, AREG
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	popl KEYP
 #endif
 	ret
@@ -1444,7 +1444,7 @@
 * void aesni_enc(struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
 */
 ENTRY(aesni_enc)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	pushl KEYP
 	pushl KLEN
 	movl 12(%esp), KEYP
@@ -1455,7 +1455,7 @@ ENTRY(aesni_enc)
 	movups (INP), STATE	# input
 	call _aesni_enc1
 	movups STATE, (OUTP)	# output
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	popl KLEN
 	popl KEYP
 #endif
 	ret
@@ -1630,7 +1630,7 @@ _aesni_enc4:
 * void aesni_dec (struct crypto_aes_ctx *ctx, u8 *dst, const u8 *src)
 */
 ENTRY(aesni_dec)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	pushl KEYP
 	pushl KLEN
 	movl 12(%esp), KEYP
@@ -1642,7 +1642,7 @@ ENTRY(aesni_dec)
 	movups (INP), STATE	# input
 	call _aesni_dec1
 	movups STATE, (OUTP)	# output
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	popl KLEN
 	popl KEYP
 #endif
@@ -1818,7 +1818,7 @@ _aesni_dec4:
 *                     size_t len)
 */
 ENTRY(aesni_ecb_enc)
-#ifndef __x86_64__
+#ifndef CONFIG_X86_64
 	pushl LEN
 	pushl KEYP
 	pushl KLEN
@@ -1863,7 +1863,7
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 20:31 Randy Dunlap wrote: On 11/29/10 11:21, Mathias Krause wrote: On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. The build problem still happens when that patch is applied. That's weird. So it must be something with your toolchain. 
Can you please post the output of the following commands?

$ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __i386 1
#define __i386__ 1
#define i386 1
#define __i586 1
#define __i586__ 1

$ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __x86_64 1
#define __x86_64__ 1

So that's not the problem... and the patch below didn't help.

That's odd. The output of the commands looks good so the x86-64 specific code should be left out for 32-bit builds. :/ Sorry that I even asked about that.

What next?

Can you please post the full error message. Meanwhile I'm checking out a linux-next tree, trying to reproduce your problem.
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 20:31 Randy Dunlap wrote: On 11/29/10 11:21, Mathias Krause wrote: On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. The build problem still happens when that patch is applied. That's weird. So it must be something with your toolchain. 
Can you please post the output of the following commands?

$ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __i386 1
#define __i386__ 1
#define i386 1
#define __i586 1
#define __i586__ 1

$ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __x86_64 1
#define __x86_64__ 1

So that's not the problem... and the patch below didn't help. Sorry that I even asked about that.

What next?

Sorry, I cannot reproduce the problem with the latest linux-next and commit 559ad0ff1368baea14dbc3207d55b02bd69bda4b from cryptodev-2.6 applied. Please ensure you've applied that patch.

Regards,
Mathias
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 20:54 Randy Dunlap wrote: On 11/29/10 11:45, Mathias Krause wrote: On 29.11.2010, 20:31 Randy Dunlap wrote: On 11/29/10 11:21, Mathias Krause wrote: On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. The build problem still happens when that patch is applied. That's weird. So it must be something with your toolchain. 
Can you please post the output of the following commands?

$ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __i386 1
#define __i386__ 1
#define i386 1
#define __i586 1
#define __i586__ 1

$ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86'
#define __x86_64 1
#define __x86_64__ 1

So that's not the problem... and the patch below didn't help.

That's odd. The output of the commands looks good so the x86-64 specific code should be left out for 32-bit builds. :/ Sorry that I even asked about that. What next?

Can you please post the full error message. Meanwhile I'm checking out a linux-next tree, trying to reproduce your problem.

I just built with make V=1 to see the full commands that are used, but that didn't help me either:

gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude -I/lnx/src/NEXT/linux-next-20101129/include -include include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE -c -o arch/x86/crypto/aesni-intel_asm.o /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S

There are 2945 lines like this:

linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12'

Well, in my tree (linux-next + 559ad0ff) line 841 is a comment. Albeit without 559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once more to be sure. ;)

It's around 311 KB, so I'll just put it here instead of emailing it: http://oss.oracle.com/~rdunlap/doc/cry32.out

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 21:11 Randy Dunlap wrote: On 11/29/10 12:02, Mathias Krause wrote: On 29.11.2010, 20:54 Randy Dunlap wrote: On 11/29/10 11:45, Mathias Krause wrote: On 29.11.2010, 20:31 Randy Dunlap wrote: On 11/29/10 11:21, Mathias Krause wrote: On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. The build problem still happens when that patch is applied. That's weird. 
So it must be something with your toolchain. Can you please post the output of the following commands?: $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86' #define __i386 1 #define __i386__ 1 #define i386 1 #define __i586 1 #define __i586__ 1 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86' #define __x86_64 1 #define __x86_64__ 1 So that's not the problem... and the patch below didn't help. That's odd. The output of the commands looks good so the x86-64 specific code should be left out for 32-bit builds. :/ Sorry that I even asked about that. What next? Can you please post the full error message. Meanwhile I'm checking out a linux-next tree, trying to reproduce your problem. I just built with make V=1 to see the full commands that are used, but that didn't help me either: gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude -I/lnx/src/NEXT/linux-next-20101129/include -include include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE -c -o arch/x86/crypto/aesni-intel_asm.o /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S There are 2945 lines like this: linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' Well, in my tree (linux-next + 559ad0ff) line 841 is a comment. Albeit without 559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once more to be sure. ;) Touche. What does that patch have to do with aesni-intel?? The description should be clear enough: crypto: aesni-intel - Fixed build error on x86-32. Here is the link to the patch: http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=patch;h=559ad0ff1368baea14dbc3207d55b02bd69bda4b. Please apply it on top of your linux-next build. I'm using the linux-next tarball of 2029. 
However, your s/__x86_64__/CONFIG_X86_64/ patch was applied, so I dropped it.

Well, I doubt it. The patch was made on top of 559ad0ff so it should have failed to apply in your tree since obviously 559ad0ff is missing.

new output file: http://oss.oracle.com/~rdunlap/doc/cry4.out

Same bug: 559ad0ff is still missing. Please apply the patch from the link above.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 29 (aesni-intel)
On 29.11.2010, 21:37 Randy Dunlap wrote: On 11/29/10 12:21, Mathias Krause wrote: On 29.11.2010, 21:11 Randy Dunlap wrote: On 11/29/10 12:02, Mathias Krause wrote: On 29.11.2010, 20:54 Randy Dunlap wrote: On 11/29/10 11:45, Mathias Krause wrote: On 29.11.2010, 20:31 Randy Dunlap wrote: On 11/29/10 11:21, Mathias Krause wrote: On 29.11.2010, 19:54 Randy Dunlap wrote: On 11/29/10 10:26, Mathias Krause wrote: On 29.11.2010, 17:31 Randy Dunlap wrote: On Mon, 29 Nov 2010 14:03:35 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101126: on i386 builds, I get tons of these (and more) errors: arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' arch/x86/crypto/aesni-intel_asm.S:842: Error: bad register name `%r13' arch/x86/crypto/aesni-intel_asm.S:843: Error: bad register name `%r14' arch/x86/crypto/aesni-intel_asm.S:844: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:849: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:850: Error: bad register name `%rsp' arch/x86/crypto/aesni-intel_asm.S:851: Error: bad register name `%r9' even though the kernel .config file says: CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_AES_NI_INTEL=m Should arch/x86/crypto/aesni-intel_asm.S be testing #ifdef CONFIG_X86_64 instead of #ifdef __x86_64__ or does that not matter? or is this a toolchain issue? Well, __x86_64__ should be a build-in define of the compiler while CONFIG_X86_64 is defined for 64 bit builds in include/generated/autoconf.h. So by using the latter we should be on the safe side but if your compiler defines __x86_64__ for 32-bit builds it's simply broken. Also git grep showed quite a few more places using __x86_64__ so those would miscompile on your toolchain, too. But it looks like linux-next is just missing 559ad0ff1368baea14dbc3207d55b02bd69bda4b from Herbert's git repo at git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git. That should fix the build issue. 
The build problem still happens when that patch is applied. That's weird. So it must be something with your toolchain. Can you please post the output of the following commands?: $ touch /tmp/null.c; cc -m32 -dD -E /tmp/null.c | grep -E 'x86|i.86' #define __i386 1 #define __i386__ 1 #define i386 1 #define __i586 1 #define __i586__ 1 $ touch /tmp/null.c; cc -m64 -dD -E /tmp/null.c | grep -E 'x86|i.86' #define __x86_64 1 #define __x86_64__ 1 So that's not the problem... and the patch below didn't help. That's odd. The output of the commands looks good so the x86-64 specific code should be left out for 32-bit builds. :/ Sorry that I even asked about that. What next? Can you please post the full error message. Meanwhile I'm checking out a linux-next tree, trying to reproduce your problem. I just built with make V=1 to see the full commands that are used, but that didn't help me either: gcc -Wp,-MD,arch/x86/crypto/.aesni-intel_asm.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.4.1/include -I/lnx/src/NEXT/linux-next-20101129/arch/x86/include -Iinclude -I/lnx/src/NEXT/linux-next-20101129/include -include include/generated/autoconf.h -D__KERNEL__ -D__ASSEMBLY__ -m32 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DMODULE -c -o arch/x86/crypto/aesni-intel_asm.o /lnx/src/NEXT/linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S There are 2945 lines like this: linux-next-20101129/arch/x86/crypto/aesni-intel_asm.S:841: Error: bad register name `%r12' Well, in my tree (linux-next + 559ad0ff) line 841 is a comment. Albeit without 559ad0ff it's a 'push %r12'. So maybe you should apply the patch just once more to be sure. ;) Touche. What does that patch have to do with aesni-intel?? The description should be clear enough: crypto: aesni-intel - Fixed build error on x86-32. Here is the link to the patch: http://git.kernel.org/?p=linux/kernel/git/herbert/cryptodev-2.6.git;a=patch;h=559ad0ff1368baea14dbc3207d55b02bd69bda4b. Please apply it on top of your linux-next build. 
I'm using the linux-next tarball of 2029. However, your s/__x86_64__/CONFIG_X86_64/ patch was applied, so I dropped it.

Well, I doubt it. The patch was made on top of 559ad0ff so it should have failed to apply in your tree since obviously 559ad0ff is missing.

new output file: http://oss.oracle.com/~rdunlap/doc/cry4.out

Same bug: 559ad0ff is still missing. Please apply the patch from the link above.

Thanks for persisting/continuing with me. I apologize, I had applied the most recent patch in Herbert's cryptodev repo, not the one that you referred me to. Yes, the build is now fixed.

Great! Have fun with your new AESNI-accelerated crypto :)
[PATCH] crypto: aesni-intel - Fixed build error on x86-32
Herbert, thanks for merge but the AES-GCM code merged meanwhile made the x86-32 bit version break on build. The following patch fixes this: Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers not available on x86-32. While at it, fixed unregister order in aesni_exit(). Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/x86/crypto/aesni-intel_asm.S |5 - arch/x86/crypto/aesni-intel_glue.c | 26 +- 2 files changed, 17 insertions(+), 14 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index f592e03..d528fde 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -32,6 +32,7 @@ #include linux/linkage.h #include asm/inst.h +#ifdef __x86_64__ .data POLY: .octa 0xC201 TWOONE: .octa 0x00010001 @@ -84,6 +85,7 @@ enc:.octa 0x2 #define arg8 STACK_OFFSET+16(%r14) #define arg9 STACK_OFFSET+24(%r14) #define arg10 STACK_OFFSET+32(%r14) +#endif #define STATE1 %xmm0 @@ -130,6 +132,7 @@ enc:.octa 0x2 #endif +#ifdef __x86_64__ /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0) * * @@ -1255,7 +1258,7 @@ _return_T_done_encrypt: pop %r13 pop %r12 ret - +#endif _key_expansion_128: diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 8a3b800..0f2c3c6 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -97,7 +97,6 @@ asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, #ifdef CONFIG_X86_64 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); -#endif /* asmlinkage void aesni_gcm_enc() * void *ctx, AES Key schedule. Starts on a 16 byte boundary. 
@@ -149,6 +148,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm) PTR_ALIGN((u8 *) crypto_tfm_ctx(crypto_aead_tfm(tfm)), AESNI_ALIGN); } +#endif static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx) { @@ -822,6 +822,7 @@ static struct crypto_alg ablk_xts_alg = { }; #endif +#ifdef CONFIG_X86_64 static int rfc4106_init(struct crypto_tfm *tfm) { struct cryptd_aead *cryptd_tfm; @@ -1237,6 +1238,7 @@ static struct crypto_alg __rfc4106_alg = { }, }, }; +#endif static int __init aesni_init(void) { @@ -1264,6 +1266,10 @@ static int __init aesni_init(void) goto blk_ctr_err; if ((err = crypto_register_alg(ablk_ctr_alg))) goto ablk_ctr_err; + if ((err = crypto_register_alg(__rfc4106_alg)) + goto __aead_gcm_err; + if ((err = crypto_register_alg(rfc4106_alg)) + goto aead_gcm_err; #ifdef HAS_CTR if ((err = crypto_register_alg(ablk_rfc3686_ctr_alg))) goto ablk_rfc3686_ctr_err; @@ -1281,19 +1287,9 @@ static int __init aesni_init(void) if ((err = crypto_register_alg(ablk_xts_alg))) goto ablk_xts_err; #endif - err = crypto_register_alg(__rfc4106_alg); - if (err) - goto __aead_gcm_err; - err = crypto_register_alg(rfc4106_alg); - if (err) - goto aead_gcm_err; return err; -aead_gcm_err: - crypto_unregister_alg(__rfc4106_alg); -__aead_gcm_err: #ifdef HAS_XTS - crypto_unregister_alg(ablk_xts_alg); ablk_xts_err: #endif #ifdef HAS_PCBC @@ -1309,6 +1305,10 @@ ablk_lrw_err: crypto_unregister_alg(ablk_rfc3686_ctr_alg); ablk_rfc3686_ctr_err: #endif + crypto_unregister_alg(rfc4106_alg); +aead_gcm_err: + crypto_unregister_alg(__rfc4106_alg); +__aead_gcm_err: crypto_unregister_alg(ablk_ctr_alg); ablk_ctr_err: crypto_unregister_alg(blk_ctr_alg); @@ -1331,8 +1331,6 @@ aes_err: static void __exit aesni_exit(void) { - crypto_unregister_alg(__rfc4106_alg); - crypto_unregister_alg(rfc4106_alg); #ifdef HAS_XTS crypto_unregister_alg(ablk_xts_alg); #endif @@ -1346,6 +1344,8 @@ static void __exit aesni_exit(void) #ifdef HAS_CTR crypto_unregister_alg(ablk_rfc3686_ctr_alg); 
 #endif
+	crypto_unregister_alg(rfc4106_alg);
+	crypto_unregister_alg(__rfc4106_alg);
 	crypto_unregister_alg(ablk_ctr_alg);
 	crypto_unregister_alg(blk_ctr_alg);
 #endif
--
1.5.6.5
[PATCH v2] crypto: aesni-intel - Fixed build error on x86-32
Herbert, thanks for merge but the AES-GCM code merged meanwhile made the x86-32 bit version break on build. The following patch fixes this (now compile tested on x86-64, too): Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers not available on x86-32. While at it, fixed unregister order in aesni_exit(). Signed-off-by: Mathias Krause mini...@googlemail.com --- arch/x86/crypto/aesni-intel_asm.S |5 - arch/x86/crypto/aesni-intel_glue.c | 26 +- 2 files changed, 17 insertions(+), 14 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index f592e03..d528fde 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -32,6 +32,7 @@ #include linux/linkage.h #include asm/inst.h +#ifdef __x86_64__ .data POLY: .octa 0xC201 TWOONE: .octa 0x00010001 @@ -84,6 +85,7 @@ enc:.octa 0x2 #define arg8 STACK_OFFSET+16(%r14) #define arg9 STACK_OFFSET+24(%r14) #define arg10 STACK_OFFSET+32(%r14) +#endif #define STATE1 %xmm0 @@ -130,6 +132,7 @@ enc:.octa 0x2 #endif +#ifdef __x86_64__ /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0) * * @@ -1255,7 +1258,7 @@ _return_T_done_encrypt: pop %r13 pop %r12 ret - +#endif _key_expansion_128: diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 8a3b800..e1e60c7 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -97,7 +97,6 @@ asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, #ifdef CONFIG_X86_64 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in, unsigned int len, u8 *iv); -#endif /* asmlinkage void aesni_gcm_enc() * void *ctx, AES Key schedule. Starts on a 16 byte boundary. 
@@ -149,6 +148,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm) PTR_ALIGN((u8 *) crypto_tfm_ctx(crypto_aead_tfm(tfm)), AESNI_ALIGN); } +#endif static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx) { @@ -822,6 +822,7 @@ static struct crypto_alg ablk_xts_alg = { }; #endif +#ifdef CONFIG_X86_64 static int rfc4106_init(struct crypto_tfm *tfm) { struct cryptd_aead *cryptd_tfm; @@ -1237,6 +1238,7 @@ static struct crypto_alg __rfc4106_alg = { }, }, }; +#endif static int __init aesni_init(void) { @@ -1264,6 +1266,10 @@ static int __init aesni_init(void) goto blk_ctr_err; if ((err = crypto_register_alg(ablk_ctr_alg))) goto ablk_ctr_err; + if ((err = crypto_register_alg(__rfc4106_alg))) + goto __aead_gcm_err; + if ((err = crypto_register_alg(rfc4106_alg))) + goto aead_gcm_err; #ifdef HAS_CTR if ((err = crypto_register_alg(ablk_rfc3686_ctr_alg))) goto ablk_rfc3686_ctr_err; @@ -1281,19 +1287,9 @@ static int __init aesni_init(void) if ((err = crypto_register_alg(ablk_xts_alg))) goto ablk_xts_err; #endif - err = crypto_register_alg(__rfc4106_alg); - if (err) - goto __aead_gcm_err; - err = crypto_register_alg(rfc4106_alg); - if (err) - goto aead_gcm_err; return err; -aead_gcm_err: - crypto_unregister_alg(__rfc4106_alg); -__aead_gcm_err: #ifdef HAS_XTS - crypto_unregister_alg(ablk_xts_alg); ablk_xts_err: #endif #ifdef HAS_PCBC @@ -1309,6 +1305,10 @@ ablk_lrw_err: crypto_unregister_alg(ablk_rfc3686_ctr_alg); ablk_rfc3686_ctr_err: #endif + crypto_unregister_alg(rfc4106_alg); +aead_gcm_err: + crypto_unregister_alg(__rfc4106_alg); +__aead_gcm_err: crypto_unregister_alg(ablk_ctr_alg); ablk_ctr_err: crypto_unregister_alg(blk_ctr_alg); @@ -1331,8 +1331,6 @@ aes_err: static void __exit aesni_exit(void) { - crypto_unregister_alg(__rfc4106_alg); - crypto_unregister_alg(rfc4106_alg); #ifdef HAS_XTS crypto_unregister_alg(ablk_xts_alg); #endif @@ -1346,6 +1344,8 @@ static void __exit aesni_exit(void) #ifdef HAS_CTR 
 	crypto_unregister_alg(ablk_rfc3686_ctr_alg);
 #endif
+	crypto_unregister_alg(rfc4106_alg);
+	crypto_unregister_alg(__rfc4106_alg);
 	crypto_unregister_alg(ablk_ctr_alg);
 	crypto_unregister_alg(blk_ctr_alg);
 #endif
--
1.5.6.5
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
On 13.11.2010, 00:25 Mathias Krause wrote:
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
On 12.11.2010, 01:33 Huang Ying wrote:
Why the improvement of ECB is so small? I can not understand it. It should be as big as CBC.

I don't know why the ECB variant is so slow compared to the other variants. But it is so even for the current x86-64 version. See the above values for x86-64 (old). I set up dm-crypt for this test like this:

# cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0

What were the numbers you measured in your tests while developing the x86-64 version?

Can't remember the number. Do you have interest to dig into the issue?

I looked at /proc/crypto while doing the tests again and noticed that ECB isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are. The reason for that seems to be that for ECB, and only for ECB, the kernel is using the synchronous block algorithm instead of the asynchronous one. So the question is: why is the ECB variant handled using the synchronous cipher -- because of the missing IV handling in this mode?

Herbert, any idea why this is the case?

Regards, Mathias
Re: [PATCH v4] x86, crypto: ported aes-ni implementation to x86
On 11.11.2010, 23:20 Mathias Krause wrote:
The AES-NI instructions are also available in legacy mode so the 32-bit architecture may profit from those, too. To illustrate the performance gain, here's a short summary of a dm-crypt speed test on a Core i7 M620 running at 2.67GHz comparing both assembler implementations:

x86:              i586        aes-ni       delta
ECB, 256 bit:     93.8 MB/s   123.3 MB/s   +31.4%
CBC, 256 bit:     84.8 MB/s   262.3 MB/s  +209.3%
LRW, 256 bit:    108.6 MB/s   222.1 MB/s  +104.5%
XTS, 256 bit:    105.0 MB/s   205.5 MB/s   +95.7%

Additionally, due to some minor optimizations, the 64-bit version also got a minor performance gain as seen below:

x86-64:          old impl.    new impl.    delta
ECB, 256 bit:    121.1 MB/s   123.0 MB/s   +1.5%
CBC, 256 bit:    285.3 MB/s   290.8 MB/s   +1.9%
LRW, 256 bit:    263.7 MB/s   265.3 MB/s   +0.6%
XTS, 256 bit:    251.1 MB/s   255.3 MB/s   +1.7%

Signed-off-by: Mathias Krause mini...@googlemail.com
---
v4 changes:
* adapted CBC implementation to be useable on x86, too
* redo the measurement using dm-crypt
v3 changes:
* fixed 32-bit implementation of aesni_ecb_enc (a hunk somehow moved to the end of another function)
v2 changes:
* hide almost all register names in macros so the same code base can be shared between x86 and x86_64
* unified Kconfig documentation again
* added alignment constraints for internal functions

 arch/x86/crypto/aesni-intel_asm.S  | 197 ++--
 arch/x86/crypto/aesni-intel_glue.c |  22 +++-
 crypto/Kconfig                     |  12 ++-
 3 files changed, 191 insertions(+), 40 deletions(-)

No comments so far? :( What's wrong with the patch?

Regards, Mathias
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
On 12.11.2010, 01:33 Huang Ying wrote:
Why the improvement of ECB is so small? I can not understand it. It should be as big as CBC.

I don't know why the ECB variant is so slow compared to the other variants. But it is so even for the current x86-64 version. See the above values for x86-64 (old). I set up dm-crypt for this test like this:

# cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0

What were the numbers you measured in your tests while developing the x86-64 version?

Can't remember the number. Do you have interest to dig into the issue?

I looked at /proc/crypto while doing the tests again and noticed that ECB isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are. The reason for that seems to be that for ECB, and only for ECB, the kernel is using the synchronous block algorithm instead of the asynchronous one. So the question is: why is the ECB variant handled using the synchronous cipher -- because of the missing IV handling in this mode?

Best regards, Mathias
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
Hello Huang Ying,

On 03.11.2010, 23:27 Huang Ying wrote:
On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
The AES-NI instructions are also available in legacy mode so the 32-bit architecture may profit from those, too. To illustrate the performance gain here's a short summary of the tcrypt speed test on a Core i7 M620 running at 2.67GHz comparing both assembler implementations:

x86:                          i586          aes-ni        delta
256 bit, 8kB blocks, ECB:     125.94 MB/s   187.09 MB/s   +48.6%

Which method did you use for speed testing? modprobe tcrypt mode=200 sec=? That actually does not work very well for AES-NI, because the AES-NI blkcipher is tested in synchronous mode, and in that mode kernel_fpu_begin/end() must be called for every block, and kernel_fpu_begin/end() is quite slow. At the same time, some further optimizations for AES-NI can not be tested in that mode (such as the ecb-aes-aesni driver), because they are only available in asynchronous mode. When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed testing, where the AES-NI blkcipher is tested in asynchronous mode and kernel_fpu_begin/end() is called for every page. Can you use that to test? Or you can add test_acipher_speed (similar to test_ahash_speed) to test the cipher in asynchronous mode.

Here are the numbers for dm-crypt. I ran the test again on the Core i7 M620, 2.67GHz. During the test I noticed that not porting the CBC variant to x86 was a bad idea, so I did that too and got pretty nice numbers (see v3 vs. v4 of the patch). All tests were run five times in a row using a 256 bit key and doing I/O to the block device in chunks of 1MB. The numbers are MB/s.

x86 (i586 variant):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:    93.9    93.9    94.0    93.5    93.8    93.8
CBC:    84.9    84.8    84.9    84.9    84.8    84.8
XTS:   108.2   108.3   109.6   108.3   108.9   108.6
LRW:   105.0   105.0   105.1   105.1   105.1   105.0

x86 (AES-NI), v3 of the patch:
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   124.8   120.8   124.5   120.6   124.5   123.0
CBC:   112.6   109.6   112.6   110.7   109.4   110.9
XTS:   221.6   221.1   220.9   223.5   224.4   222.3
LRW:   206.2   209.7   207.4   203.7   209.3   207.2

x86 (AES-NI), v4 of the patch:
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   122.5   121.2   121.6   125.7   125.5   123.3
CBC:   259.5   259.2   261.2   264.0   267.6   262.3
XTS:   225.1   230.7   220.6   217.9   216.3   222.1
LRW:   202.7   202.8   210.6   208.9   202.7   205.5

Comparing the values for the CBC variant between v3 and v4 of the patch shows that porting the CBC variant to x86 more than doubled the performance, so the little bit ugly #ifdefed code is worth the effort.

x86-64 (old):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   121.4   120.9   121.1   121.2   120.9   121.1
CBC:   282.5   286.3   281.5   282.0   294.5   285.3
XTS:   263.6   260.3   263.0   267.0   264.6   263.7
LRW:   249.6   249.8   250.5   253.4   252.2   251.1

x86-64 (new):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   122.1   122.0   122.0   127.0   121.9   123.0
CBC:   291.2   286.2   295.6   291.4   289.9   290.8
XTS:   263.3   264.4   264.5   264.2   270.4   265.3
LRW:   254.9   252.3   253.6   258.2   257.5   255.3

Comparing the mean values gives us:

x86:    i586     aes-ni    delta
ECB:     93.8    123.3     +31.4%
CBC:     84.8    262.3    +209.3%
LRW:    108.6    222.1    +104.5%
XTS:    105.0    205.5     +95.7%

x86-64:  old      new     delta
ECB:    121.1    123.0    +1.5%
CBC:    285.3    290.8    +1.9%
LRW:    263.7    265.3    +0.6%
XTS:    251.1    255.3    +1.7%

The improvement for the old vs. the new x86-64 version is not as drastic as for the synchronous variant (see the tcrypt tests in the previous email), but nevertheless an improvement. The improvement for the x86 case, however, should be noticeable. It's almost as fast as the x86-64 version. I'll post the new version of the patch in a follow-up email.

Regards, Mathias
[PATCH v4] x86, crypto: ported aes-ni implementation to x86
The AES-NI instructions are also available in legacy mode so the 32-bit architecture may profit from those, too. To illustrate the performance gain, here's a short summary of a dm-crypt speed test on a Core i7 M620 running at 2.67GHz comparing both assembler implementations:

x86:              i586        aes-ni       delta
ECB, 256 bit:     93.8 MB/s   123.3 MB/s   +31.4%
CBC, 256 bit:     84.8 MB/s   262.3 MB/s  +209.3%
LRW, 256 bit:    108.6 MB/s   222.1 MB/s  +104.5%
XTS, 256 bit:    105.0 MB/s   205.5 MB/s   +95.7%

Additionally, due to some minor optimizations, the 64-bit version also got a minor performance gain as seen below:

x86-64:          old impl.    new impl.    delta
ECB, 256 bit:    121.1 MB/s   123.0 MB/s   +1.5%
CBC, 256 bit:    285.3 MB/s   290.8 MB/s   +1.9%
LRW, 256 bit:    263.7 MB/s   265.3 MB/s   +0.6%
XTS, 256 bit:    251.1 MB/s   255.3 MB/s   +1.7%

Signed-off-by: Mathias Krause mini...@googlemail.com
---
v4 changes:
* adapted CBC implementation to be useable on x86, too
* redo the measurement using dm-crypt
v3 changes:
* fixed 32-bit implementation of aesni_ecb_enc (a hunk somehow moved to the end of another function)
v2 changes:
* hide almost all register names in macros so the same code base can be shared between x86 and x86_64
* unified Kconfig documentation again
* added alignment constraints for internal functions
arch/x86/crypto/aesni-intel_asm.S | 197 ++-- arch/x86/crypto/aesni-intel_glue.c | 22 +++- crypto/Kconfig | 12 ++- 3 files changed, 191 insertions(+), 40 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index ff16756..74626fa 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -9,6 +9,9 @@ *Vinodh Gopal vinodh.go...@intel.com *Kahraman Akdemir * + * Ported x86_64 version to x86: + *Author: Mathias Krause mini...@googlemail.com + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or @@ -32,12 +35,16 @@ #define IN IN1 #define KEY%xmm2 #define IV %xmm3 + #define BSWAP_MASK %xmm10 #define CTR%xmm11 #define INC%xmm12 +#ifdef __x86_64__ +#define AREG %rax #define KEYP %rdi #define OUTP %rsi +#define UKEYP OUTP #define INP%rdx #define LEN%rcx #define IVP%r8 @@ -46,6 +53,18 @@ #define TKEYP T1 #define T2 %r11 #define TCTR_LOW T2 +#else +#define AREG %eax +#define KEYP %edi +#define OUTP AREG +#define UKEYP OUTP +#define INP%edx +#define LEN%esi +#define IVP%ebp +#define KLEN %ebx +#define T1 %ecx +#define TKEYP T1 +#endif _key_expansion_128: _key_expansion_256a: @@ -55,10 +74,11 @@ _key_expansion_256a: shufps $0b10001100, %xmm0, %xmm4 pxor %xmm4, %xmm0 pxor %xmm1, %xmm0 - movaps %xmm0, (%rcx) - add $0x10, %rcx + movaps %xmm0, (TKEYP) + add $0x10, TKEYP ret +.align 4 _key_expansion_192a: pshufd $0b01010101, %xmm1, %xmm1 shufps $0b0001, %xmm0, %xmm4 @@ -76,12 +96,13 @@ _key_expansion_192a: movaps %xmm0, %xmm1 shufps $0b01000100, %xmm0, %xmm6 - movaps %xmm6, (%rcx) + movaps %xmm6, (TKEYP) shufps $0b01001110, %xmm2, %xmm1 - movaps %xmm1, 16(%rcx) - add $0x20, %rcx + movaps %xmm1, 0x10(TKEYP) + add $0x20, TKEYP ret +.align 4 _key_expansion_192b: pshufd $0b01010101, %xmm1, %xmm1 shufps $0b0001, %xmm0, %xmm4 @@ -96,10 +117,11 @@ 
_key_expansion_192b: pxor %xmm3, %xmm2 pxor %xmm5, %xmm2 - movaps %xmm0, (%rcx) - add $0x10, %rcx + movaps %xmm0, (TKEYP) + add $0x10, TKEYP ret +.align 4 _key_expansion_256b: pshufd $0b10101010, %xmm1, %xmm1 shufps $0b0001, %xmm2, %xmm4 @@ -107,8 +129,8 @@ _key_expansion_256b: shufps $0b10001100, %xmm2, %xmm4 pxor %xmm4, %xmm2 pxor %xmm1, %xmm2 - movaps %xmm2, (%rcx) - add $0x10, %rcx + movaps %xmm2, (TKEYP) + add $0x10, TKEYP ret /* @@ -116,17 +138,23 @@ _key_expansion_256b: * unsigned int key_len) */ ENTRY(aesni_set_key) - movups (%rsi), %xmm0# user key (first 16 bytes) - movaps %xmm0, (%rdi) - lea 0x10(%rdi), %rcx# key addr - movl %edx, 480(%rdi) +#ifndef __x86_64__ + pushl KEYP + movl 8(%esp), KEYP # ctx + movl 12(%esp), UKEYP# in_key + movl 16(%esp), %edx # key_len +#endif + movups (UKEYP), %xmm0 # user key (first 16 bytes) + movaps %xmm0, (KEYP) + lea 0x10(KEYP), TKEYP # key addr + movl %edx, 480(KEYP) pxor %xmm4, %xmm4
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
On 12.11.2010, 01:33 Huang Ying wrote: Hi, Mathias, On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote: All test were run five times in a row using a 256 bit key and doing i/o to the block device in chunks of 1MB. The numbers are MB/s. x86 (i586 variant): 1. run 2. run 3. run 4. run 5. runmean ECB: 93.993.994.093.593.893.8 CBC: 84.984.884.984.984.884.8 XTS: 108.2 108.3 109.6 108.3 108.9 108.6 LRW: 105.0 105.0 105.1 105.1 105.1 105.0 x86 (AES-NI), v3 of the patch: 1. run 2. run 3. run 4. run 5. runmean ECB: 124.8 120.8 124.5 120.6 124.5 123.0 CBC: 112.6 109.6 112.6 110.7 109.4 110.9 XTS: 221.6 221.1 220.9 223.5 224.4 222.3 LRW: 206.2 209.7 207.4 203.7 209.3 207.2 x86 (AES-NI), v4 of the patch: 1. run 2. run 3. run 4. run 5. runmean ECB: 122.5 121.2 121.6 125.7 125.5 123.3 CBC: 259.5 259.2 261.2 264.0 267.6 262.3 XTS: 225.1 230.7 220.6 217.9 216.3 222.1 LRW: 202.7 202.8 210.6 208.9 202.7 205.5 Comparing the values for the CBC variant between v3 and v4 of the patch shows that porting the CBC variant to x86 more then doubled the performance so the little bit ugly #ifdefed code is worth the effort. x86-64 (old): 1. run 2. run 3. run 4. run 5. runmean ECB: 121.4 120.9 121.1 121.2 120.9 121.1 CBC: 282.5 286.3 281.5 282.0 294.5 285.3 XTS: 263.6 260.3 263.0 267.0 264.6 263.7 LRW: 249.6 249.8 250.5 253.4 252.2 251.1 x86-64 (new): 1. run 2. run 3. run 4. run 5. runmean ECB: 122.1 122.0 122.0 127.0 121.9 123.0 CBC: 291.2 286.2 295.6 291.4 289.9 290.8 XTS: 263.3 264.4 264.5 264.2 270.4 265.3 LRW: 254.9 252.3 253.6 258.2 257.5 255.3 Comparing the mean values gives us: x86: i586 aes-nidelta ECB: 93.8123.3 +31.4% Why the improvement of ECB is so small? I can not understand it. It should be as big as CBC. I don't know why the ECB variant is so slow compared to the other variants. But it is so even for the current x86-64 version. See the above values for x86-64 (old). 
I set up dm-crypt for this test like this:

# cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0

What were the numbers you measured in your tests while developing the x86-64 version?

Best regards, Mathias

Best Regards, Huang Ying

CBC:     84.8    262.3   +209.3%
LRW:    108.6    222.1   +104.5%
XTS:    105.0    205.5    +95.7%

x86-64:  old      new     delta
ECB:    121.1    123.0    +1.5%
CBC:    285.3    290.8    +1.9%
LRW:    263.7    265.3    +0.6%
XTS:    251.1    255.3    +1.7%

The improvement for the old vs. the new x86-64 version is not as drastic as for the synchronous variant (see the tcrypt tests in the previous email), but nevertheless an improvement. The improvement for the x86 case, however, should be noticeable. It's almost as fast as the x86-64 version. I'll post the new version of the patch in a follow-up email.

Regards, Mathias
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
On 12.11.2010, 08:34 Huang Ying wrote: On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote: On 12.11.2010, 01:33 Huang Ying wrote: Hi, Mathias, On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote: All test were run five times in a row using a 256 bit key and doing i/o to the block device in chunks of 1MB. The numbers are MB/s. x86 (i586 variant): 1. run 2. run 3. run 4. run 5. runmean ECB: 93.993.994.093.593.893.8 CBC: 84.984.884.984.984.884.8 XTS: 108.2 108.3 109.6 108.3 108.9 108.6 LRW: 105.0 105.0 105.1 105.1 105.1 105.0 x86 (AES-NI), v3 of the patch: 1. run 2. run 3. run 4. run 5. runmean ECB: 124.8 120.8 124.5 120.6 124.5 123.0 CBC: 112.6 109.6 112.6 110.7 109.4 110.9 XTS: 221.6 221.1 220.9 223.5 224.4 222.3 LRW: 206.2 209.7 207.4 203.7 209.3 207.2 x86 (AES-NI), v4 of the patch: 1. run 2. run 3. run 4. run 5. runmean ECB: 122.5 121.2 121.6 125.7 125.5 123.3 CBC: 259.5 259.2 261.2 264.0 267.6 262.3 XTS: 225.1 230.7 220.6 217.9 216.3 222.1 LRW: 202.7 202.8 210.6 208.9 202.7 205.5 Comparing the values for the CBC variant between v3 and v4 of the patch shows that porting the CBC variant to x86 more then doubled the performance so the little bit ugly #ifdefed code is worth the effort. x86-64 (old): 1. run 2. run 3. run 4. run 5. runmean ECB: 121.4 120.9 121.1 121.2 120.9 121.1 CBC: 282.5 286.3 281.5 282.0 294.5 285.3 XTS: 263.6 260.3 263.0 267.0 264.6 263.7 LRW: 249.6 249.8 250.5 253.4 252.2 251.1 x86-64 (new): 1. run 2. run 3. run 4. run 5. runmean ECB: 122.1 122.0 122.0 127.0 121.9 123.0 CBC: 291.2 286.2 295.6 291.4 289.9 290.8 XTS: 263.3 264.4 264.5 264.2 270.4 265.3 LRW: 254.9 252.3 253.6 258.2 257.5 255.3 Comparing the mean values gives us: x86: i586 aes-nidelta ECB: 93.8123.3 +31.4% Why the improvement of ECB is so small? I can not understand it. It should be as big as CBC. I don't know why the ECB variant is so slow compared to the other variants. But it is so even for the current x86-64 version. See the above values for x86-64 (old). 
>> I set up dm-crypt for this test like this:
>>
>> # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
>>
>> What were the numbers you measured in your tests while developing the
>> x86-64 version?
>
> Can't remember the numbers. Do you have interest to dig into the issue?

Sure. Increasing performance is always a good thing to do. :)

Best regards,
Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
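[Editorial note: the dm-crypt throughput test described in the thread could be scripted roughly as below. This is a sketch, not the poster's actual script -- the loop device and mapping name come from the cryptsetup line above, but the dd-based read loop and the mean helper are illustrative assumptions.]

```shell
# mean_mbs: average a list of per-run MB/s figures (assumed helper,
# not from the thread)
mean_mbs() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.1f\n", s / NR }'
}

# Privileged part -- only meaningful as root on a machine with a prepared
# loop device; the mapping matches the cryptsetup command in the thread.
if [ "$(id -u)" -eq 0 ] && [ -e /dev/loop0 ]; then
    cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
    for run in 1 2 3 4 5; do
        # read the block device in 1 MB chunks, as described in the thread
        dd if=/dev/mapper/cfs of=/dev/null bs=1M 2>&1 | tail -1
    done
    cryptsetup remove cfs
fi

mean_mbs 93.9 93.9 94.0 93.5 93.8   # the five i586 ECB runs; prints 93.8
```

On a non-root shell the privileged branch is skipped and only the averaging of previously collected numbers runs.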
Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86
On 03.11.2010, 23:27, Huang Ying wrote:
> On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
>> The AES-NI instructions are also available in legacy mode, so the
>> 32-bit architecture may profit from those, too. To illustrate the
>> performance gain, here's a short summary of the tcrypt speed test on a
>> Core i7 M620 running at 2.67GHz, comparing both assembler
>> implementations:
>>
>> x86 (256 bit, 8kB blocks):  i586         aes-ni       delta
>> ECB:                        125.94 MB/s  187.09 MB/s  +48.6%
>
> Which method did you use for speed testing? modprobe tcrypt mode=200
> sec=?

Yes. I used: modprobe tcrypt mode=200 sec=1

> That actually does not work very well for AES-NI, because the AES-NI
> blkcipher is tested in synchronous mode, and in that mode
> kernel_fpu_begin/end() must be called for every block -- and
> kernel_fpu_begin/end() is quite slow.

That's what I figured, too. Can this slowdown be avoided by saving and
restoring the used FPU registers within the assembler implementation, or
would this be even slower?

> At the same time, some further optimizations for AES-NI cannot be
> tested in that mode (such as the ecb-aes-aesni driver), because they
> are only available in asynchronous mode.

After finding the bug in the second version of the patch I noticed this,
too.

> When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
> testing, where the AES-NI blkcipher is tested in asynchronous mode and
> kernel_fpu_begin/end() is called for every page. Can you use that to
> test?

But wouldn't this be even slower than the above measurement? I took the
results for 8kB blocks and a page would only be 4kB ... well, depends on
what kind of pages you took. IIRC x86-64 not only supports 2MB but also
1GB pages ;)

> Or you can add test_acipher_speed (similar to test_ahash_speed) to
> test the cipher in asynchronous mode.

Maybe I'll try this approach, since it looks like just a minor
modification of the tcrypt module. Thanks for the hints!
Best regards,
Mathias

> Best Regards,
> Huang Ying
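[Editorial note: the tcrypt invocation discussed above, plus a small helper reproducing the quoted +48.6% figure, could look like the following sketch. The modprobe line is straight from the thread; the dmesg filtering and the delta_pct helper are illustrative assumptions.]

```shell
# delta_pct: percentage gain of NEW over OLD throughput (assumed helper)
delta_pct() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "+%.1f%%\n", (new - old) / old * 100 }'
}

# Privileged part: run the tcrypt cipher speed test (mode=200) for one
# second per test, as in the thread; results land in the kernel log.
if [ "$(id -u)" -eq 0 ]; then
    modprobe tcrypt mode=200 sec=1 2>/dev/null
    dmesg | grep -i aes | tail -20
fi

delta_pct 125.94 187.09   # i586 vs. AES-NI, 256-bit ECB; prints +48.6%
```

Note that tcrypt is a test module: the modprobe deliberately fails after printing its results, which is why they are read back via dmesg.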
Re: [PATCH] x86, crypto: ported aes-ni implementation to x86
Hi,

I modified the patch so it doesn't introduce a copy of the existing
assembler implementation but modifies the existing one to be usable for
64 and 32 bit. Additionally I added some alignment constraints for
internal functions, which resulted in a noticeable speed-up.

I re-ran the tests on another machine, a Core i7 M620, 2.67GHz. I also
took the low-end numbers for the AES-NI variants because I didn't want
to wait for the big numbers to come every now and then any more ;)

So here is the comparison of 5 consecutive tcrypt test runs for some
selected algorithms, in MiB/s:

x86-64 (old):
                    1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  152.49  152.58  152.51  151.80  151.87  152.25
CBC, 256 bit, 8kB:  144.32  144.44  144.35  143.75  143.75  144.12
LRW, 320 bit, 8kB:  159.41  159.21  159.21  158.55  159.28  159.13
XTS, 512 bit, 8kB:  144.87  142.88  144.75  144.11  144.75  144.27

x86-64 (new):
                    1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  184.07  184.07  183.50  183.50  184.07  183.84
CBC, 256 bit, 8kB:  170.25  170.24  169.71  169.71  170.25  170.03
LRW, 320 bit, 8kB:  169.91  169.91  169.39  169.37  169.91  169.69
XTS, 512 bit, 8kB:  172.39  172.35  171.82  171.82  172.35  172.14

i586:
                    1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  125.98  126.03  126.03  125.64  126.03  125.94
CBC, 256 bit, 8kB:  118.18  118.19  117.84  117.84  118.19  118.04
LRW, 320 bit, 8kB:  128.37  128.35  127.97  127.98  128.35  128.20
XTS, 512 bit, 8kB:  118.52  118.50  118.14  118.14  118.49  118.35

x86 (AES-NI):
                    1. run  2. run  3. run  4. run  5. run    mean
ECB, 256 bit, 8kB:  187.33  187.34  187.33  186.75  186.74  187.09
CBC, 256 bit, 8kB:  171.84  171.84  171.84  171.28  171.28  171.61
LRW, 320 bit, 8kB:  168.54  168.54  168.53  168.00  168.02  168.32
XTS, 512 bit, 8kB:  166.61  166.60  166.60  166.08  166.60  166.49

Comparing the mean values gives us:

x86-64:                old     new   delta
ECB, 256 bit, 8kB:  152.25  183.84  +20.7%
CBC, 256 bit, 8kB:  144.12  170.03  +18.0%
LRW, 320 bit, 8kB:  159.13  169.69   +6.6%
XTS, 512 bit, 8kB:  144.27  172.14  +19.3%

x86:                  i586  aes-ni   delta
ECB, 256 bit, 8kB:  125.94  187.09  +48.6%
CBC, 256 bit, 8kB:  118.04  171.61  +45.4%
LRW, 320 bit, 8kB:  128.20  168.32  +31.3%
XTS, 512 bit, 8kB:  118.35  166.49  +40.7%

The funny thing is that the 32 bit implementation is sometimes even
faster than the 64 bit one. Nevertheless, the minor optimization of
aligning function entries gave the 64 bit version quite a big
performance gain (up to 20%).

I'll post the new version of the patch in a follow-up email.

Regards,
Mathias
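[Editorial note: the mean/delta arithmetic behind the comparison tables can be checked mechanically. The sketch below reproduces the x86-64 ECB row from the run numbers quoted in the thread; the helper names are illustrative, not from the original mail.]

```shell
# mean: average the per-run throughput figures given as arguments
mean() { printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'; }

# The five x86-64 ECB runs (256 bit, 8kB blocks) before and after the patch
old=$(mean 152.49 152.58 152.51 151.80 151.87)
new=$(mean 184.07 184.07 183.50 183.50 184.07)

# Relative improvement, matching the "+20.7%" in the comparison table
awk -v o="$old" -v n="$new" \
    'BEGIN { printf "%s -> %s: +%.1f%%\n", o, n, (n - o) / o * 100 }'
# prints: 152.25 -> 183.84: +20.7%
```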