warning in crypto_wait_for_test+0x84/0x88
Hello, while booting the latest -net.git kernel on an s390x test machine, I hit the warning below; just in case it was not reported yet:

[...]
[    3.317031] qeth 0.0.1000: Outbound TSO not supported on eth0
[    3.317123] qeth 0.0.1000: MAC address 02:a1:11:0e:bf:f0 successfully registered on device eth0
[    3.317127] qeth 0.0.1000: Device is a Virtual NIC QDIO card (level: V633) with link type Virt.NIC QDIO.
[    3.364637] qeth 0.0.1000 enccw0.0.1000: renamed from eth0
[    3.374169] Adding 501740k swap on /dev/dasda3. Priority:-1 extents:1 across:501740k SSFS
[    3.436378] EXT4-fs (dasdb1): mounted filesystem with ordered data mode. Opts: (null)
[    3.459612] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null)
[   36.777259] random: crng init done
[  123.421672] device-mapper: uevent: version 1.0.3
[  123.421747] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-de...@redhat.com
[  124.163728] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  184.741563] [ cut here ]
[  184.741579] WARNING: CPU: 1 PID: 537 at crypto/algapi.c:348 crypto_wait_for_test+0x84/0x88
[  184.741581] Modules linked in: ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter dm_mirror dm_region_hash dm_log dm_mod xts gf128mul aes_s390(+) des_s390 des_generic sha512_s390 qeth_l2 vmur nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache dasd_eckd_mod dasd_mod qeth ccwgroup qdio
[  184.741598] CPU: 1 PID: 537 Comm: systemd-udevd Not tainted 4.10.0-rc3+ #4
[  184.741601] Hardware name: IBM 2964 NE1 716 (z/VM)
[  184.741605] task: f544e200 task.stack: f419c000
[  184.741608] Krnl PSW : 0704f0018000 003cc484 (crypto_wait_for_test+0x84/0x88)
[  184.741613]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3
[  184.741616] Krnl GPRS: 00058e00 fe00 0001 0001
[  184.741619]            006863e0 f3f00800 03ff80460850
[  184.741633]            03ff80460650 03ff80460618 03ff80460980 f419fe88
[  184.741634]            f4e40c00 003cc45a f419fb60
[  184.741644] Krnl Code:
            003cc476: f073a7f40001   srp   2036(8,%r10),1,3
            003cc47c: a7f4fff6       brc   15,3cc468
           #003cc480: a7f40001       brc   15,3cc482
           >003cc484: a7f4fff2       brc   15,3cc468
            003cc488: ebdff0800024   stmg  %r13,%r15,128(%r15)
            003cc48e: a7f13fc0       tmll  %r15,16320
            003cc492: b90400ef       lgr   %r14,%r15
            003cc496: e3f0ffd0ff71   lay   %r15,-48(%r15)
[  184.741658] Call Trace:
[  184.741659] ([<003cc45a>] crypto_wait_for_test+0x5a/0x88)
[  184.741661]  [<003cc7ee>] crypto_register_alg+0x76/0x88
[  184.741712]  [<03ff8045e490>] aes_s390_register_alg+0x28/0xb98 [aes_s390]
[  184.741722]  [<03ff80463116>] cpu_feature_match_MSA_init+0x116/0x1000 [aes_s390]
[  184.741724]  [<00100276>] do_one_initcall+0x46/0x148
[  184.741729]  [<0024e264>] do_init_module+0x74/0x218
[  184.741735]  [<001cc586>] load_module+0x141e/0x1888
[  184.741737]  [<001ccbd0>] SyS_finit_module+0x98/0xd0
[  184.741741]  [<0068a1e6>] system_call+0xd6/0x25c
[  184.741742] Last Breaking-Event-Address:
[  184.741743]  [<003cc480>] crypto_wait_for_test+0x80/0x88
[  184.741744] ---[ end trace bf4d899231b60bca ]---
[  184.741765] [ cut here ]
[  184.741767] WARNING: CPU: 1 PID: 537 at crypto/algapi.c:342 crypto_wait_for_test+0x7c/0x88
[  184.741768] Modules linked in: ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter dm_mirror dm_region_hash dm_log dm_mod xts gf128mul aes_s390(+) des_s390 des_generic sha512_s390 qeth_l2 vmur nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache dasd_eckd_mod dasd_mod qeth ccwgroup qdio
[  184.741775] CPU: 1 PID: 537 Comm: systemd-udevd Tainted: G        W  4.10.0-rc3+ #4
[  184.741776] Hardware name: IBM 2964 NE1 716 (z/VM)
[  184.741777] task: f544e200 task.stack: f419c000
[  184.741778] Krnl PSW : 0704c0018000 003cc47c (crypto_wait_for_test+0x7c/0x88)
[  184.741780]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  184.741781] Krnl GPRS: 0001 8001 0001 0002
[  184.741783]            003cc432 115e 03ff80460850
[  184.741784]            03ff80460650 03ff80460618 03ff80460980 f419fe88
[  184.741785]            f6f3d000
Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256
On 01/11/2017 07:19 PM, Andy Lutomirski wrote:

On Wed, Jan 11, 2017 at 1:09 AM, Daniel Borkmann <dan...@iogearbox.net> wrote:

[...]

Ok. Sleeping over this a bit, how about a general rename into "prog_tag" for fdinfo, and TCA_BPF_TAG resp. TCA_ACT_BPF_TAG for the netlink attributes? Fwiw, it might reduce any assumptions being made on this. If this would be preferable, I could cook that patch against -net for renaming it.

That would be fine with me. I think there are two reasonable approaches to computing the actual tag.

1. Use a standard, modern cryptographic hash: SHA-256, SHA-512, Blake2b, whatever. SHA-1 is a bad choice, in part because it's partly broken and in part because the implementation in lib/ is a real mess to use (as you noticed while writing the code).

2. Use whatever algorithm you like, but make the tag so short that it's obviously not collision-free. 48 or 64 bits is probably reasonable. The intermediate versions are just asking for trouble.

Yeah, agree. I've just sent a patch to rework this a bit, and it also got reasonably small for net. Cleanups, if needed, can be done in net-next once that's pulled into it.

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256
Hi Andy,

On 01/11/2017 04:11 AM, Andy Lutomirski wrote:

On Tue, Jan 10, 2017 at 4:50 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:

On 01/11/2017 12:24 AM, Andy Lutomirski wrote:

This makes it easier to add another digest algorithm down the road if needed. It also serves to force any programs that might have been written against a kernel that had the old field name to notice the change and make any necessary changes. This shouldn't violate any stable API policies, as no released kernel has ever had TCA*BPF_DIGEST.

Imho, this and patch 6/8 are not really needed. Should ever another digest alg be used (doubt it), then you'd need a new nl attribute and fdinfo line anyway to keep existing stuff intact. Nobody made the claim that you can just change this underneath without respecting abi for existing applications, when I read from the above that such apps now will get "forced" to notice a change.

Fair enough. I was more concerned about prerelease iproute2 versions, but maybe that's a nonissue. I'll drop these two patches.

Ok. Sleeping over this a bit, how about a general rename into "prog_tag" for fdinfo, and TCA_BPF_TAG resp. TCA_ACT_BPF_TAG for the netlink attributes? Fwiw, it might reduce any assumptions being made on this. If this would be preferable, I could cook that patch against -net for renaming it.

Thanks,
Daniel
Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256
On 01/11/2017 12:24 AM, Andy Lutomirski wrote:

This makes it easier to add another digest algorithm down the road if needed. It also serves to force any programs that might have been written against a kernel that had the old field name to notice the change and make any necessary changes. This shouldn't violate any stable API policies, as no released kernel has ever had TCA*BPF_DIGEST.

Imho, this and patch 6/8 are not really needed. Should ever another digest alg be used (doubt it), then you'd need a new nl attribute and fdinfo line anyway to keep existing stuff intact. Nobody made the claim that you can just change this underneath without respecting abi for existing applications, when I read from the above that such apps now will get "forced" to notice a change.
Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
On 12/27/2016 10:58 AM, Herbert Xu wrote:

On Mon, Dec 26, 2016 at 10:08:48AM -0800, Andy Lutomirski wrote:

According to Daniel, the networking folks want to let embedded systems include BPF without requiring the crypto core.

Last I checked the IPv4 stack depended on the crypto API so this sounds bogus.

I think there's a bit of a mixup here with what I said. To clarify, the requirement back then from tracing folks was that the bpf engine, and therefore the bpf syscall, can be built w/o networking enabled for small devices, so dependencies preferably need to be kept to an absolute minimum. The same goes for either making it suddenly a 'depends on CRYPTO' or a 'select CRYPTO' for just those few lines that can be pulled in from lib/ code instead.
Re: [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests
On 12/24/2016 03:22 AM, Andy Lutomirski wrote:

BPF digests are intended to be used to avoid reloading programs that are already loaded. For use cases (CRIU?) where untrusted programs are involved, intentional hash collisions could cause the wrong BPF program to execute. Additionally, if BPF digests are ever used in-kernel to skip verification, a hash collision could give privilege escalation directly.

Just for the record, digests will never ever be used to skip the verification step, so I don't know why this idea even comes up here (?) or is part of the changelog. As this will never be done anyway, rather drop that part so we can avoid confusion on this. Wrt untrusted programs, I don't see much use for this facility for them in general. Something like a tail call map would quite likely only be private to the application. And again, I really doubt we'll have something like user namespace support in the foreseeable future. Anyway, that said, I don't really have a big issue if you want to switch to sha256, though.

SHA1 is no longer considered adequately collision-resistant (see, for example, all the major browsers dropping support for SHA1 certificates). Use SHA256 instead. I moved the digest field to keep all of the bpf program metadata in the same cache line.

Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: Andy Lutomirski <l...@kernel.org>
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:

On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:

On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:

[...]

The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2?

Still trying to catch up on this admittedly bit confusing thread. I did run automated tests over a couple of days comparing the data I got from fdinfo with the one from af_alg and found no mismatch on the test cases varying from min to max possible program sizes. In the process of testing, as you might have seen on netdev, I found a couple of other bugs in bpf code along the way and fixed them up as well. So my question: do you or Andy or anyone participating in claiming this have any concrete data or test cases that suggest something different? If yes, I'm very curious to hear about it and willing to fix it up, of course. When I'm back from pto I'll prep and cook up my test suite to be included into selftests/bpf/; should have done this initially, sorry about that. I'll also post something to expose the alg, that sounds fine to me.

Looking into your code closer, I noticed that you indeed seem to do the finalization of sha-1 by hand, by aligning and padding the buffer accordingly and also patching in the necessary payload length. Apologies from my side for claiming that this is not correct sha1 output; I was only looking at sha_transform and its implementation and couldn't see the padding and finalization round with embedding the data length in there, and hadn't thought of it being done manually. Anyway, is it difficult to get the sha finalization into some common code library? It is not very bpf specific and crypto code reviewers won't find it there at all.

Yes, sure, I'll rework it that way (early next year when I'm back, if that's fine with you).
Thanks,
Daniel
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/22/2016 06:25 PM, Andy Lutomirski wrote:

On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa [...]

I wondered if bpf program loading should have used the module loading infrastructure from the beginning...

That would be way too complicated and would be nasty for the unprivileged cases. Also, there are users, be it privileged or not, that don't require a full obj loader from user space but are fine with just hard-coding parts or all of the insns in their application. Back then we settled on using fds based on Andy's suggestion; it has both ups and downs, as we saw along the way, but has worked okay thus far.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:

On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa <han...@stressinduktion.org> wrote:

On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:

On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa <han...@stressinduktion.org> wrote:

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. You don't want to give people new IPv6 addresses with the same stable secret (across reboots) after a kernel upgrade. Maybe they lose connectivity then and it is extra work?

Ahh, too bad. So it goes.

If no other users survive we can put it into the ipv6 module. The bpf hash stuff can be changed during this merge window, as it is not yet in a released kernel. Albeit I would probably have preferred something like sha256 here, which can be easily replicated by user space tools (minus the problem of patching out references to not hashable data, which must be zeroed).

Oh, interesting, so time is of the essence then. Do you want to handle changing the new eBPF code to something not-SHA1 before it's too late, as part of a new patchset that can fast track itself to David? And then I can preserve my large series for the next merge window.

This algorithm should be a non-seeded algorithm, because the hashes should be stable and verifiable by user space tooling. Thus this would need a hashing algorithm that is hardened against pre-image attacks/collision resistance, which siphash is not. I would prefer some higher order SHA algorithm for that actually.

You mean:

  commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
  Author: Daniel Borkmann <dan...@iogearbox.net>
  Date:   Sun Dec 4 23:19:41 2016 +0100

      bpf: add prog_digest and expose it via fdinfo/netlink

Yes, please! This actually matters for security -- imagine a malicious program brute-forcing a collision so that it gets loaded wrong. And this is IMO a use case for SHA-256 or SHA-512/256 (preferably the latter). Speed basically doesn't matter here, and Blake2 is both less stable (didn't they slightly change it recently?) and much less well studied.

We don't prevent ebpf programs being loaded based on the digest; it is just there to uniquely identify loaded programs from user space and match them up with their source. There have been talks about signing bpf programs, thus this would probably need another digest algorithm additionally to that one, wasting probably instructions. Probably going somewhere in direction of PKCS#7 might be the thing to do (which leads to the problem to make PKCS#7 attackable by ordinary unpriv users, hmpf).

My inclination would have been to seed them with something that isn't exposed to userspace, for the precise reason that it would prevent user code from making assumptions about what's in the hash. But if there's a use case for why user code needs to be able to calculate the hash on its own, then that's fine. But perhaps the actual fdinfo string should be "sha256:abcd1234..." to give some flexibility down the road. Also:

  +	result = (__force __be32 *)fp->digest;
  +	for (i = 0; i < SHA_DIGEST_WORDS; i++)
  +		result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives. Is this and the code above it even correct? It might be, but on a very brief glance it looks wrong to me. If you're doing this to avoid depending on crypto, then fix crypto so you can pull in the algorithm without pulling in the whole crypto core.

The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2?

Still trying to catch up on this admittedly bit confusing thread. I did run automated tests over a couple of days comparing the data I got from fdinfo with the one from af_alg and found no mismatch on the test cases varying from min to max possible program sizes. In the process of testing, as you might have seen on netdev, I found a couple of other bugs in bpf code along the way and fixed them up as well. So my question: do you or Andy or anyone participating in claiming this have any concrete data or test cases that suggest something different? If yes, I'm very curious to hear about it and willing to fix it up, of course. When I'm back from pto I'll prep and cook up my test suite to be included into selftests/bpf/; should have done this initially, sorry about that. I'll also post something to expose the alg, that sounds fine to me.

I wondered if bpf program loading should have used the module loading infrastructure from the beginning...

At the very least, there should be a separate function that calculates the hash of a buffer, and that function should explicitly run itself against test vectors of various lengths to make sure that it's calculating what it claims to be calculating. And it doesn't look like the userspace code has landed, so, if this thing i
[PATCH crypto-2.6] lib: make memzero_explicit more robust against dead store elimination
In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be eliminated as a dead store.

While using barrier() works well for the case of gcc, recent efforts from the LLVMLinux people suggest llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus eliminate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug.

The fix in this patch now works for both compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with the code below, though: had we used barrier() only here, llvm would have omitted the clearing; not so with the barrier_data() variant:

  static inline void memzero_explicit(void *s, size_t count)
  {
          memset(s, 0, count);
          barrier_data(s);
  }

  int main(void)
  {
          char buff[20];
          memzero_explicit(buff, sizeof(buff));
          return 0;
  }

  $ gcc -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
    0x00400400 <+0>:   lea    -0x28(%rsp),%rax
    0x00400405 <+5>:   movq   $0x0,-0x28(%rsp)
    0x0040040e <+14>:  movq   $0x0,-0x20(%rsp)
    0x00400417 <+23>:  movl   $0x0,-0x18(%rsp)
    0x0040041f <+31>:  xor    %eax,%eax
    0x00400421 <+33>:  retq
  End of assembler dump.

  $ clang -O2 test.c
  $ gdb a.out
  (gdb) disassemble main
  Dump of assembler code for function main:
    0x004004f0 <+0>:   xorps  %xmm0,%xmm0
    0x004004f3 <+3>:   movaps %xmm0,-0x18(%rsp)
    0x004004f8 <+8>:   movl   $0x0,-0x8(%rsp)
    0x00400500 <+16>:  lea    -0x18(%rsp),%rax
    0x00400505 <+21>:  xor    %eax,%eax
    0x00400507 <+23>:  retq
  End of assembler dump.

As clang (but also icc) defines __GNUC__, it's sufficient to define this in compiler-gcc.h only.
Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <smuel...@chronox.de>
Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
Cc: Theodore Ts'o <ty...@mit.edu>
Cc: Stephan Mueller <smuel...@chronox.de>
Cc: Hannes Frederic Sowa <han...@stressinduktion.org>
Cc: mancha security <manc...@zoho.com>
Cc: Mark Charlebois <charl...@gmail.com>
Cc: Behan Webster <beh...@converseincode.com>
---
 include/linux/compiler-gcc.h | 16 +++++++++++++++-
 include/linux/compiler.h     |  4 ++++
 lib/string.c                 |  2 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index cdf13ca..371e560 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -9,10 +9,24 @@
 		     + __GNUC_MINOR__ * 100 \
 		     + __GNUC_PATCHLEVEL__)
 
-/* Optimization barrier */
+/* The "volatile" is due to gcc bugs */
 #define barrier() __asm__ __volatile__("": : :"memory")
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proved that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
 
 /*
  * This macro obfuscates arithmetic on a variable address so that gcc

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0e41ca0..8677225 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 # define barrier() __memory_barrier()
 #endif
 
+#ifndef barrier_data
+# define barrier_data(ptr) barrier()
+#endif
+
 /* Unreachable code */
 #ifndef unreachable
 # define unreachable() do { } while (1)

diff --git a/lib/string.c b/lib/string.c
index a579201..bb3d4b6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	barrier();
+	barrier_data(s);
 }
 EXPORT_SYMBOL(memzero_explicit);
--
1.9.3
Re: [BUG/PATCH] kernel RNG and its secrets
On 04/27/2015 10:41 PM, Stephan Mueller wrote:

... It seems you have the code already in mind, so please if you could write it :-)

Ok, sure. I'll cook something by tomorrow morning.

Cheers,
Daniel
Re: [BUG/PATCH] kernel RNG and its secrets
On 04/27/2015 09:10 PM, Stephan Mueller wrote:

... I posted the issue on the clang mailing list on April 10 -- no word so far. I would interpret this as a sign that it is a no-issue for them.

Hm. ;) Here's a bug report on the topic, gcc vs llvm: https://llvm.org/bugs/show_bug.cgi?id=15495

Lets add a new barrier macro to linux/compiler{,-gcc}.h, f.e.

  #define barrier_data(ptr) __asm__ __volatile__("" : : "r" (ptr) : "memory")

or the version mancha proposed. You could wrap that ...

  #define OPTIMIZER_HIDE(ptr) barrier_data(ptr)

... and use that one for memzero_explicit() instead:

  void memzero_explicit(void *s, size_t count)
  {
          memset(s, 0, count);
          OPTIMIZER_HIDE(s);
  }

It certainly needs comments explaining in what situations to use which OPTIMIZER_HIDE* variants, etc. Do you want to send a patch?
Re: crypto: fips - Move fips_enabled sysctl into fips.c
On 04/22/2015 07:02 AM, Herbert Xu wrote:

There is currently a large ifdef FIPS code section in proc.c. Ostensibly it's there because the fips_enabled sysctl sits under /proc/sys/crypto. However, no other crypto sysctls exist. In fact, the whole ethos of the crypto API is against such user interfaces, so this patch moves all the FIPS sysctl code over to fips.c.

Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>

...

  int fips_enabled;
  EXPORT_SYMBOL_GPL(fips_enabled);

While you're at it, this should also be marked as __read_mostly.

Thanks,
Daniel
Re: [BUG/PATCH] kernel RNG and its secrets
[ Cc'ing Cesar ]

On 03/18/2015 10:53 AM, mancha wrote:

Hi. The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect memory cleansing against things like dead store optimization:

  void memzero_explicit(void *s, size_t count)
  {
          memset(s, 0, count);
          OPTIMIZER_HIDE_VAR(s);
  }

OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq against timing analysis, is defined when using gcc as:

  #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from optimizing out memset (i.e. secrets remain in memory).

Could you elaborate on your test case? memzero_explicit() is actually an EXPORT_SYMBOL(); are you saying that gcc removes the call to memzero_explicit() entirely, inlines it, and then optimizes the memset() eventually away? Last time I looked, it emitted a call to memzero_explicit(), and inside memzero_explicit() it did the memset() as it cannot make any assumption from there. I'm using gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7).

Two things that do work:

  __asm__ __volatile__ ("" : "=r" (var) : "0" (var))

and

  __asm__ __volatile__("" : : : "memory")

The first is OPTIMIZER_HIDE_VAR plus a volatile qualifier, and the second is barrier() [as defined when using gcc]. I propose memzero_explicit use barrier().

--- a/lib/string.c
+++ b/lib/string.c
@@ -616,7 +616,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	OPTIMIZER_HIDE_VAR(s);
+	barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);

For any attribution deemed necessary, please use "mancha security". Please CC me on replies.

--mancha

PS: CC'ing Herbert Xu in case this impacts crypto_memneq.
Re: [BUG/PATCH] kernel RNG and its secrets
On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote:

On Wed, Mar 18, 2015, at 10:53, mancha wrote:

Hi. The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect memory cleansing against things like dead store optimization:

  void memzero_explicit(void *s, size_t count)
  {
          memset(s, 0, count);
          OPTIMIZER_HIDE_VAR(s);
  }

OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq against timing analysis, is defined when using gcc as:

  #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from optimizing out memset (i.e. secrets remain in memory). Two things that do work:

  __asm__ __volatile__ ("" : "=r" (var) : "0" (var))

You are correct, a volatile signature should be added to OPTIMIZER_HIDE_VAR. Because we use an output variable "=r", gcc is allowed to check if it is needed and may remove the asm statement. Another option would be to just use var as an input variable: asm blocks without output variables are always considered volatile by gcc. Can you send a patch? I don't think it is security critical; as Daniel pointed out, the call will happen because the function is an external call to the crypto functions, thus the compiler has to flush memory on return.

Just had a look.

  $ gdb vmlinux
  (gdb) disassemble memzero_explicit
  Dump of assembler code for function memzero_explicit:
    0x813a18b0 <+0>:   push   %rbp
    0x813a18b1 <+1>:   mov    %rsi,%rdx
    0x813a18b4 <+4>:   xor    %esi,%esi
    0x813a18b6 <+6>:   mov    %rsp,%rbp
    0x813a18b9 <+9>:   callq  0x813a7120 <memset>
    0x813a18be <+14>:  pop    %rbp
    0x813a18bf <+15>:  retq
  End of assembler dump.

  (gdb) disassemble extract_entropy
  [...]
    0x814a5000 <+304>: sub    %r15,%rbx
    0x814a5003 <+307>: jne    0x814a4f80 <extract_entropy+176>
    0x814a5009 <+313>: mov    %r12,%rdi
    0x814a500c <+316>: mov    $0xa,%esi
    0x814a5011 <+321>: callq  0x813a18b0 <memzero_explicit>
    0x814a5016 <+326>: mov    -0x48(%rbp),%rax
  [...]

I would be fine with __volatile__. Thanks a lot, mancha, could you send a patch?
Best,
Daniel
Re: [BUG/PATCH] kernel RNG and its secrets
On 03/18/2015 01:20 PM, Stephan Mueller wrote: Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa: Hi Hannes, On Wed, Mar 18, 2015, at 13:14, Stephan Mueller wrote: Am Mittwoch, 18. März 2015, 13:02:12 schrieb Hannes Frederic Sowa: Hi Hannes, On Wed, Mar 18, 2015, at 12:09, Stephan Mueller wrote: Am Mittwoch, 18. März 2015, 11:56:43 schrieb Daniel Borkmann: On 03/18/2015 11:50 AM, Hannes Frederic Sowa wrote: On Wed, Mar 18, 2015, at 10:53, mancha wrote: Hi. The kernel RNG introduced memzero_explicit in d4c5efdb9777 to protect memory cleansing against things like dead store optimization: void memzero_explicit(void *s, size_t count) { memset(s, 0, count); OPTIMIZER_HIDE_VAR(s); } OPTIMIZER_HIDE_VAR, introduced in fe8c8a126806 to protect crypto_memneq against timing analysis, is defined when using gcc as: #define OPTIMIZER_HIDE_VAR(var) __asm__ ( : =r (var) : 0 (var)) My tests with gcc 4.8.2 on x86 find it insufficient to prevent gcc from optimizing out memset (i.e. secrets remain in memory). Two things that do work: __asm__ __volatile__ ( : =r (var) : 0 (var)) You are correct, volatile signature should be added to OPTIMIZER_HIDE_VAR. Because we use an output variable =r, gcc is allowed to check if it is needed and may remove the asm statement. Another option would be to just use var as an input variable - asm blocks without output variables are always considered being volatile by gcc. Can you send a patch? I don't think it is security critical, as Daniel pointed out, the call will happen because the function is an external call to the crypto functions, thus the compiler has to flush memory on return. Just had a look. $ gdb vmlinux (gdb) disassemble memzero_explicit Dump of assembler code for function memzero_explicit: 0x813a18b0 +0: push %rbp 0x813a18b1 +1: mov%rsi,%rdx 0x813a18b4 +4: xor%esi,%esi 0x813a18b6 +6: mov%rsp,%rbp 0x813a18b9 +9: callq 0x813a7120 memset 0x813a18be +14: pop%rbp 0x813a18bf +15: retq End of assembler dump. 
(gdb) disassemble extract_entropy
[...]
  0x814a5000 <+304>:	sub   %r15,%rbx
  0x814a5003 <+307>:	jne   0x814a4f80 <extract_entropy+176>
  0x814a5009 <+313>:	mov   %r12,%rdi
  0x814a500c <+316>:	mov   $0xa,%esi
  0x814a5011 <+321>:	callq 0x813a18b0 <memzero_explicit>
  0x814a5016 <+326>:	mov   -0x48(%rbp),%rax
[...]

I would be fine with __volatile__. Are we sure that simply adding a __volatile__ works in any case? I just did a test with a simple user space app:

static inline void memset_secure(void *s, int c, size_t n)
{
	memset(s, c, n);
	//__asm__ __volatile__("" : : : "memory");
	__asm__ __volatile__("" : "=r" (s) : "0" (s));
}

Good point, thanks! Of course an input or output of s does not force the memory pointed to by s to be flushed. My proposal would be to add a

#define OPTIMIZER_HIDE_MEM(ptr, len) \
	__asm__ __volatile__ ("" : : "m"( ({ struct { u8 b[len]; } *p = (void *)ptr; *p; }) ))

and use this in the code function. This is documented in gcc manual 6.43.2.5. That one adds the zeroization instructions. But now there are many more of them than with the barrier:

  400469: 48 c7 04 24 00 00 00 00       movq $0x0,(%rsp)
  400471: 48 c7 44 24 08 00 00 00 00    movq $0x0,0x8(%rsp)
  40047a: c7 44 24 10 00 00 00 00       movl $0x0,0x10(%rsp)
  400482: 48 c7 44 24 20 00 00 00 00    movq $0x0,0x20(%rsp)
  40048b: 48 c7 44 24 28 00 00 00 00    movq $0x0,0x28(%rsp)
  400494: c7 44 24 30 00 00 00 00       movl $0x0,0x30(%rsp)

Any ideas? Hmm, correct definition of u8? I use unsigned char. Which version of gcc do you use? I can't see any difference if I compile your example at -O2. gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) I can see the same with the gcc version I previously posted. So it clears the 20 bytes from your example (movq, movq, movl) at two locations, presumably buf[] and b[]. Best, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -crypto] lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR
From: mancha security manc...@zoho.com

OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient to ensure protection from dead store optimization. For the random driver and crypto drivers, calls are emitted ...

$ gdb vmlinux
(gdb) disassemble memzero_explicit
Dump of assembler code for function memzero_explicit:
  0x813a18b0 <+0>:	push  %rbp
  0x813a18b1 <+1>:	mov   %rsi,%rdx
  0x813a18b4 <+4>:	xor   %esi,%esi
  0x813a18b6 <+6>:	mov   %rsp,%rbp
  0x813a18b9 <+9>:	callq 0x813a7120 <memset>
  0x813a18be <+14>:	pop   %rbp
  0x813a18bf <+15>:	retq
End of assembler dump.

(gdb) disassemble extract_entropy
[...]
  0x814a5009 <+313>:	mov   %r12,%rdi
  0x814a500c <+316>:	mov   $0xa,%esi
  0x814a5011 <+321>:	callq 0x813a18b0 <memzero_explicit>
  0x814a5016 <+326>:	mov   -0x48(%rbp),%rax
[...]

... but in case we might use facilities such as LTO in the future, OPTIMIZER_HIDE_VAR() is not sufficient to prevent gcc from a possible eviction of the memset(). We have to use a compiler barrier instead. Minimal test example when we assume memzero_explicit() would *not* be a call, but would have been *inlined* instead:

static inline void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	foo
}

int main(void)
{
	char buff[20];

	snprintf(buff, sizeof(buff) - 1, "test");
	printf("%s", buff);
	memzero_explicit(buff, sizeof(buff));
	return 0;
}

With foo := OPTIMIZER_HIDE_VAR():

(gdb) disassemble main
Dump of assembler code for function main:
[...]
  0x00400464 <+36>:	callq 0x400410 <printf@plt>
  0x00400469 <+41>:	xor   %eax,%eax
  0x0040046b <+43>:	add   $0x28,%rsp
  0x0040046f <+47>:	retq
End of assembler dump.

With foo := barrier():

(gdb) disassemble main
Dump of assembler code for function main:
[...]
  0x00400464 <+36>:	callq 0x400410 <printf@plt>
  0x00400469 <+41>:	movq  $0x0,(%rsp)
  0x00400471 <+49>:	movq  $0x0,0x8(%rsp)
  0x0040047a <+58>:	movl  $0x0,0x10(%rsp)
  0x00400482 <+66>:	xor   %eax,%eax
  0x00400484 <+68>:	add   $0x28,%rsp
  0x00400488 <+72>:	retq
End of assembler dump.

As can be seen, movq, movq, movl are being emitted inlined via memset().
Reference: http://thread.gmane.org/gmane.linux.kernel.cryptoapi/13764/
Fixes: d4c5efdb9777 ("random: add and use memzero_explicit() for clearing data")
Cc: Hannes Frederic Sowa han...@stressinduktion.org
Cc: Stephan Mueller smuel...@chronox.de
Cc: Theodore Ts'o ty...@mit.edu
Signed-off-by: mancha security manc...@zoho.com
Signed-off-by: Daniel Borkmann dan...@iogearbox.net
---
Sending to Herbert as crypto/random are the main users. Based against -crypto tree. Thanks!

 lib/string.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/string.c b/lib/string.c
index ce81aae..a579201 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	OPTIMIZER_HIDE_VAR(s);
+	barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);
--
1.9.3
Re: [BUG/PATCH] kernel RNG and its secrets
On 03/18/2015 06:14 PM, mancha wrote: ... Patch 0001 fixes the dead store issue in memzero_explicit(). Thanks! I have issued the fix for the memzero bug to Herbert under your authorship as discussed, also giving some more context. For the 2nd issue, let's wait for Cesar. Thanks again!
[PATCH] lib: memzero_explicit: add comment for its usage
Let's improve the comment by adding a note on when to use memzero_explicit() for those not digging through the git logs. We don't want people to pollute places with memzero_explicit() where it's not really necessary. Reference: https://lkml.org/lkml/2015/1/4/190 Suggested-by: Herbert Xu herb...@gondor.apana.org.au Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
[ Sending to -crypto as it's most relevant here and was suggested by Herbert anyway. ]

 lib/string.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 1006330..d984ec4 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -604,6 +604,11 @@ EXPORT_SYMBOL(memset);
  * @s: Pointer to the start of the area.
  * @count: The size of the area.
  *
+ * Note: usually using memset() is just fine (!), but in cases
+ * where clearing out _local_ data at the end of a scope is
+ * necessary, memzero_explicit() should be used instead in
+ * order to prevent the compiler from optimising away zeroing.
+ *
  * memzero_explicit() doesn't need an arch-specific version as
  * it just invokes the one of memset() implicitly.
  */
--
1.7.11.7
Re: [PATCH 7/8] crypto: AF_ALG: add random number generator support
On 11/12/2014 08:05 AM, Stephan Mueller wrote: This patch adds the random number generator support for AF_ALG. A random number generator's purpose is to generate data without requiring the caller to provide any data. Therefore, the AF_ALG interface handler for RNGs only implements a callback handler for recvmsg. ...

+static int rng_recvmsg(struct kiocb *unused, struct socket *sock,
+		       struct msghdr *msg, size_t len, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct rng_ctx *ctx = ask->private;
+	int err = -EFAULT;
+
+	if (0 == len)

if (len == 0) ... [And also other places.] We don't use Yoda condition style in the kernel.

+		return 0;
+	if (MAXSIZE < len)
+		len = MAXSIZE;
+
+	lock_sock(sk);
+	len = crypto_rng_get_bytes(ctx->drng, ctx->result, len);
+	if (0 > len)
+		goto unlock;
+
+	err = memcpy_toiovec(msg->msg_iov, ctx->result, len);
+	memset(ctx->result, 0, err);
+

This looks buggy. If copy_to_user() fails from within memcpy_toiovec(), we call memset() with a negative return value which is interpreted as size_t and thus causes a buffer overflow writing beyond ctx->result, no? If it succeeds, we call memset(ctx->result, 0, 0).

+unlock:
+	release_sock(sk);
+
+	return err ? err : len;
+}
+
+static struct proto_ops algif_rng_ops = {
+	.family		= PF_ALG,
+
+	.connect	= sock_no_connect,
+	.socketpair	= sock_no_socketpair,
+	.getname	= sock_no_getname,
+	.ioctl		= sock_no_ioctl,
+	.listen		= sock_no_listen,
+	.shutdown	= sock_no_shutdown,
+	.getsockopt	= sock_no_getsockopt,
+	.mmap		= sock_no_mmap,
+	.bind		= sock_no_bind,
+	.accept		= sock_no_accept,
+	.setsockopt	= sock_no_setsockopt,
+	.poll		= sock_no_poll,
+	.sendmsg	= sock_no_sendmsg,
+	.sendpage	= sock_no_sendpage,
+
+	.release	= af_alg_release,
+	.recvmsg	= rng_recvmsg,
+};
+
+static void *rng_bind(const char *name, u32 type, u32 mask)
+{
+	return crypto_alloc_rng(name, type, mask);
+}
+
+static void rng_release(void *private)
+{
+	crypto_free_rng(private);
+}
+
+static void rng_sock_destruct(struct sock *sk)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct rng_ctx *ctx = ask->private;
+
+	memset(ctx->result, 0, MAXSIZE);

memset(ctx->result, 0, sizeof(ctx->result));

+	sock_kfree_s(sk, ctx, ctx->len);
+	af_alg_release_parent(sk);
+}
+
+static int rng_accept_parent(void *private, struct sock *sk)
+{
+	struct rng_ctx *ctx;
+	struct alg_sock *ask = alg_sk(sk);
+	unsigned int len = sizeof(*ctx);
+	int seedsize = crypto_rng_seedsize(private);
+	int ret = -ENOMEM;
+
+	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+	memset(ctx->result, 0, MAXSIZE);

Ditto...

+	ctx->len = len;
+
+	if (seedsize) {
+		u8 *buf = kmalloc(seedsize, GFP_KERNEL);
+		if (!buf)
+			goto err;
+		get_random_bytes(buf, seedsize);
+		ret = crypto_rng_reset(private, buf, len);
+		kzfree(buf);
Re: [PATCH 7/8] crypto: AF_ALG: add random number generator support
On 11/12/2014 05:54 PM, Stephan Mueller wrote: On Wednesday, 12 November 2014 at 17:15:52, Daniel Borkmann wrote: On 11/12/2014 08:05 AM, Stephan Mueller wrote: This patch adds the random number generator support for AF_ALG. A random number generator's purpose is to generate data without requiring the caller to provide any data. Therefore, the AF_ALG interface handler for RNGs only implements a callback handler for recvmsg. ...

+static int rng_recvmsg(struct kiocb *unused, struct socket *sock,
+		       struct msghdr *msg, size_t len, int flags)
+{
+	struct sock *sk = sock->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct rng_ctx *ctx = ask->private;
+	int err = -EFAULT;
+
+	if (0 == len)

if (len == 0) ... [And also other places.] We don't use Yoda condition style in the kernel.

Well, there is a very good reason for using the approach I have: we have all made the error of forgetting the second = sign. In my case, the compiler will complain and we fix the error right away. In your case, nobody complains but we have introduced a nasty, potentially hard to debug error. Thus, I would very much like to keep my version just to be on the safe side. Note, there was even a backdoor I have seen where the missing 2nd equal sign introduced a privilege escalation. Therefore, my standard coding practice is to have a fixed value on the left side and the variable on the right side of any comparison. I understand, but then please add this proposal first into ... Documentation/CodingStyle. The problem is that while the rest of the kernel does not follow this coding style, it's also much harder to read and/or program this way for people not used to it. So the danger of bugs slipping in this way is at least equally high. Besides that, this argument would also only account for '==' checks.
+	return 0;
+	if (MAXSIZE < len)
+		len = MAXSIZE;
+
+	lock_sock(sk);
+	len = crypto_rng_get_bytes(ctx->drng, ctx->result, len);
+	if (0 > len)
+		goto unlock;
+
+	err = memcpy_toiovec(msg->msg_iov, ctx->result, len);
+	memset(ctx->result, 0, err);
+

This looks buggy. If copy_to_user() fails from within memcpy_toiovec(), we call memset() with a negative return value which is interpreted as size_t and thus causes a buffer overflow writing beyond ctx->result, no? If it succeeds, we call memset(ctx->result, 0, 0). Right, good catch, I have to add a catch for negative error here. Hm? Don't you rather mean to say to unconditionally do something like ...

	memzero_explicit(ctx->result, len);

...

+	memset(ctx->result, 0, MAXSIZE);

memset(ctx->result, 0, sizeof(ctx->result));

Ok, if this is desired, fine with me. Yes, please.
Re: [PATCH 7/8] crypto: AF_ALG: add random number generator support
On 11/12/2014 06:46 PM, Stephan Mueller wrote: ... * I unconditionally use the memset after memcpy as you indicated. Once the cryptodev tree contains the memzero_explicit call, I will start picking up that function. Herbert actually merged it in this morning, so it's already part of the cryptodev tree by now. Essentially, I thought of the line you suggested. Ok, thanks.
Re: crypto: zeroization of sensitive data in af_alg
On 11/11/2014 05:16 AM, Stephan Mueller wrote: ... That is a good idea. Herbert: I can prepare a patch that uses memzero_explicit. However, your current tree does not yet implement that function as it was added to Linus' tree after you pulled from it. Yep, Ted took it [1] on top of the random driver fix as discussed. [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7185ad2672a7d50bc384de0e38d90b75d99f3d82
Re: [PATCH 1/2] crypto: AF_ALG - zeroize message digest buffer
Hi Stephan, On 11/11/2014 05:37 AM, Stephan Mueller wrote: Zeroize the buffer holding the message digest calculated for the consumer before the buffer is released by the hash AF_ALG interface handler. Signed-off-by: Stephan Mueller smuel...@chronox.de
---
 crypto/algif_hash.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 8502462..f75db4c 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -258,6 +258,8 @@ static void hash_sock_destruct(struct sock *sk)
 	struct alg_sock *ask = alg_sk(sk);
 	struct hash_ctx *ctx = ask->private;
 
+	memzero_explicit(ctx->result,
+			 crypto_ahash_digestsize(crypto_ahash_reqtfm(ctx->req)));
 	sock_kfree_s(sk, ctx->result,

Perhaps something like this (alternatively kzfree() would work, too) ...

static void __sock_kfree_s(struct sock *sk, void *mem, int size,
			   bool clear_mem)
{
	if (WARN_ON_ONCE(!mem))
		return;
	if (clear_mem)
		memzero_explicit(mem, size);
	kfree(mem);
	atomic_sub(size, &sk->sk_omem_alloc);
}

void sock_kfree_s(struct sock *sk, void *mem, int size)
{
	__sock_kfree_s(sk, mem, size, false);
}
EXPORT_SYMBOL(sock_kfree_s);

void sock_kzfree_s(struct sock *sk, void *mem, int size)
{
	__sock_kfree_s(sk, mem, size, true);
}
EXPORT_SYMBOL(sock_kzfree_s);

... so you could then just use it as a drop-in in various places:

	sock_kzfree_s(sk, ctx->result,
		      crypto_ahash_digestsize(crypto_ahash_reqtfm(ctx->req)));
	sock_kfree_s(sk, ctx, ctx->len);

Thanks, Daniel
Re: memset() in crypto code?
On 10/08/2014 04:30 AM, Sandy Harris wrote: I have started a thread about this on the gcc help mailing list https://gcc.gnu.org/ml/gcc-help/2014-10/msg00047.html Great, perhaps you want to pass a patch proposal to gcc folks? We might consider replacing memzero_explicit with memset_s() since that is in the C11 standard, albeit I think as optional. IBM, Apple, NetBSD, ... have that. https://mail-index.netbsd.org/tech-userlevel/2012/02/24/msg006125.html The patch you point to for NetBSD does nothing other than what memzero_explicit() or bzero_explicit() variants already do internally, only that they're discussing whether to adopt it or not in their user space libc ... Cheers, Daniel
Re: memset() in crypto code?
On 10/06/2014 08:52 PM, Sandy Harris wrote: On Mon, Oct 6, 2014 at 1:44 PM, Jason Cooper ja...@lakedaemon.net wrote: On Sat, Oct 04, 2014 at 11:09:40PM -0400, Sandy Harris wrote: ... There was recently a patch to the random driver to replace memset() because, according to the submitter, gcc sometimes optimises memset() away ... memzero_explicit() is a good start, ... As I see it, memzero_explicit() is a rather ugly kluge, albeit an acceptable one in the circumstances. Right. A real fix would make memset() do the right thing reliably; if the programmer puts in memset(x, 0, nbytes) then the memory should be cleared, no ifs or buts. I do not know or care if that means changes in the compiler or in the library code or even both, but the fix should make the standard library code work right, not require adding a new function and expecting everyone to use it. That would be a desirable goal, ideally perhaps as a built-in from the compiler itself, just as with memset(). Applications such as openssh implement their own bzero_explicit() variant for the very same purpose as well.
Re: memset() in crypto code?
Hi Sandy, On 10/05/2014 05:09 AM, Sandy Harris wrote: There was recently a patch to the random driver to replace memset() because, according to the submitter, gcc sometimes optimises memset() away which might leave data unnecessarily exposed. The solution suggested was a function called memzero_explicit(). There was a fair bit of discussion and the patch was accepted. In the crypto directory of the kernel source I have:

$ grep memset *.c | wc -l
133

I strongly suspect some of these should be fixed. I have submitted it here one month ago for crypto and it's still waiting to be applied: http://www.spinics.net/lists/linux-crypto/msg11965.html As the random driver patch has been applied to random -dev, it will be available from 3.18 onwards, but the dependency for crypto is currently there; that's why I asked Ted to take it through his tree. Hopefully this will happen soonish (but I haven't heard anything back since) ... Thanks! Daniel
Re: [PATCH v2] crypto: memzero_explicit - make sure to clear out sensitive data
Hi Ted, On 09/15/2014 01:52 PM, Herbert Xu wrote: On Sun, Sep 07, 2014 at 11:23:38PM +0200, Daniel Borkmann wrote: Recently, in commit 13aa93c70e71 ("random: add and use memzero_explicit() for clearing data"), we have found that GCC may optimize some memset() cases away when it detects a stack variable is not being used anymore and going out of scope. This can happen, for example, in cases when we are clearing out sensitive information such as keying material or any e.g. intermediate results from crypto computations, etc. With the help of Coccinelle, we can figure out and fix such occurrences in the crypto subsystem as well. Julia Lawall provided the following Coccinelle program:

@@
type T;
identifier x;
@@

T x;
... when exists
    when any
-memset
+memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict

@@
type T;
identifier x;
@@

T x[...];
... when exists
    when any
-memset
+memzero_explicit
  (x,
-0,
  ...)
... when != x
    when strict

Therefore, make use of the drop-in replacement memzero_explicit() for exactly such cases instead of using memset(). Signed-off-by: Daniel Borkmann dbork...@redhat.com Cc: Julia Lawall julia.law...@lip6.fr Cc: Herbert Xu herb...@gondor.apana.org.au Cc: Theodore Ts'o ty...@mit.edu Cc: Hannes Frederic Sowa han...@stressinduktion.org Acked-by: Herbert Xu herb...@gondor.apana.org.au Thanks. Sorry for the noise, but given that Herbert acked the patch and the random tree currently contains the infrastructure to fix this security issue, could you take it through the random tree as originally proposed between the --- and diffstat line of this patch [1]? Thanks a lot, Daniel [1] http://www.spinics.net/lists/linux-crypto/msg11965.html
[PATCH] crypto: memzero_explicit - make sure to clear out sensitive data
Recently, in commit 13aa93c70e71 ("random: add and use memzero_explicit() for clearing data"), we have found that GCC may optimize some memset() cases away when it detects a stack variable is not being used anymore and going out of scope. This can happen, for example, in cases when we are clearing out sensitive information such as keying material or any e.g. intermediate results from crypto computations, etc. With the help of Coccinelle, we can figure out and fix such occurrences in the crypto subsystem as well. Julia Lawall provided the following Coccinelle program:

@@
type T;
identifier s;
@@

T s[...];
... when any
memset(s, ...);
... when != s

Therefore, make use of the drop-in replacement memzero_explicit() for exactly such cases instead of using memset(). Signed-off-by: Daniel Borkmann dbork...@redhat.com Cc: Julia Lawall julia.law...@lip6.fr Cc: Herbert Xu herb...@gondor.apana.org.au Cc: Theodore Ts'o ty...@mit.edu Cc: Hannes Frederic Sowa han...@stressinduktion.org
---
Herbert, Ted, I have posted my original patch that introduces the memzero_explicit() infrastructure along with its first users that fixes such occurrences to Ted's random tree [1]. That means, this one last patch in that regard would be on top of it as an outcome of an audit in the crypto subsystem. Would the two of you be okay if Ted takes it to random as well? Changes are minimal and uncomplicated, but I think it would be quite important to get them fixed. Thanks!

[1] https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=13aa93c70e716b597d2451abee5e9f0b140b831c

 crypto/cts.c            | 3 ++-
 crypto/sha1_generic.c   | 2 +-
 crypto/sha256_generic.c | 5 ++---
 crypto/sha512_generic.c | 2 +-
 crypto/tgr192.c         | 4 ++--
 crypto/wp512.c          | 8 ++++----
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/crypto/cts.c b/crypto/cts.c
index 042223f..133f087 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -202,7 +202,8 @@ static int cts_cbc_decrypt(struct crypto_cts_ctx *ctx,
 	/* 5.
Append the tail (BB - Ln) bytes of Xn (tmp) to Cn to create En */
 	memcpy(s + bsize + lastn, tmp + lastn, bsize - lastn);
 	/* 6. Decrypt En to create Pn-1 */
-	memset(iv, 0, sizeof(iv));
+	memzero_explicit(iv, sizeof(iv));
+
 	sg_set_buf(&sgsrc[0], s + bsize, bsize);
 	sg_set_buf(&sgdst[0], d, bsize);
 	err = crypto_blkcipher_decrypt_iv(lcldesc, sgdst, sgsrc, bsize);
diff --git a/crypto/sha1_generic.c b/crypto/sha1_generic.c
index 4279480..7bb0474 100644
--- a/crypto/sha1_generic.c
+++ b/crypto/sha1_generic.c
@@ -64,7 +64,7 @@ int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 			src = data + done;
 		} while (done + SHA1_BLOCK_SIZE <= len);
 
-		memset(temp, 0, sizeof(temp));
+		memzero_explicit(temp, sizeof(temp));
 		partial = 0;
 	}
 	memcpy(sctx->buffer + partial, src, len - done);
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index 5433667..32c5e5e 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -210,10 +210,9 @@ static void sha256_transform(u32 *state, const u8 *input)
 	/* clear any sensitive info...
*/ a = b = c = d = e = f = g = h = t1 = t2 = 0; - memset(W, 0, 64 * sizeof(u32)); + memzero_explicit(W, 64 * sizeof(u32)); } - static int sha224_init(struct shash_desc *desc) { struct sha256_state *sctx = shash_desc_ctx(desc); @@ -316,7 +315,7 @@ static int sha224_final(struct shash_desc *desc, u8 *hash) sha256_final(desc, D); memcpy(hash, D, SHA224_DIGEST_SIZE); - memset(D, 0, SHA256_DIGEST_SIZE); + memzero_explicit(D, SHA256_DIGEST_SIZE); return 0; } diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c index 6ed124f..04d295a 100644 --- a/crypto/sha512_generic.c +++ b/crypto/sha512_generic.c @@ -238,7 +238,7 @@ static int sha384_final(struct shash_desc *desc, u8 *hash) sha512_final(desc, D); memcpy(hash, D, 48); - memset(D, 0, 64); + memzero_explicit(D, 64); return 0; } diff --git a/crypto/tgr192.c b/crypto/tgr192.c index 8740355..3c7af0d 100644 --- a/crypto/tgr192.c +++ b/crypto/tgr192.c @@ -612,7 +612,7 @@ static int tgr160_final(struct shash_desc *desc, u8 * out) tgr192_final(desc, D); memcpy(out, D, TGR160_DIGEST_SIZE); - memset(D, 0, TGR192_DIGEST_SIZE); + memzero_explicit(D, TGR192_DIGEST_SIZE); return 0; } @@ -623,7 +623,7 @@ static int tgr128_final(struct shash_desc *desc, u8 * out) tgr192_final(desc, D); memcpy(out, D, TGR128_DIGEST_SIZE); - memset(D, 0, TGR192_DIGEST_SIZE); + memzero_explicit(D, TGR192_DIGEST_SIZE); return 0; } diff --git a/crypto/wp512.c b/crypto/wp512.c index 180f1d6..ec64e77 100644 --- a/crypto/wp512.c +++ b/crypto/wp512.c @@ -1102,8 +1102,8 @@ static int wp384_final
Re: [PATCH] crypto: memzero_explicit - make sure to clear out sensitive data
Hi Milan, On 09/07/2014 07:15 PM, Milan Broz wrote: On 09/07/2014 06:46 PM, Daniel Borkmann wrote: Recently, in commit 13aa93c70e71 ("random: add and use memzero_explicit() for clearing data"), we have found that GCC may optimize some memset() cases away when it detects a stack variable is not being used anymore and going out of scope. This can happen, for example, in cases when we are clearing out sensitive information such as keying material or any e.g. intermediate results from crypto computations, etc. Hi, do you plan to send patches also for other crypto code in the kernel? (I am almost sure we have the same pattern in dm-crypt.) If not, I can do this for the dm-crypt part. Milan Yes, please feel free and go ahead. I have checked the random driver, crypto and the networking subsystem. With this patch that I've sent here, these three are covered (in networking, there was no such candidate, just one false positive in Bluetooth). But if you find other areas with a similar case, feel free to go ahead and fix it, very much appreciated. Thanks & best, Daniel
[PATCH cryptodev] crypto: arch - use crypto_memneq instead of memcmp
Replace remaining occurrences (just as we did in crypto/) under arch/*/crypto/ that make use of memcmp() for comparing keys or authentication tags with crypto_memneq(). It can simply be used as a drop-in replacement for the normal memcmp(). Signed-off-by: Daniel Borkmann dbork...@redhat.com Cc: James Yonan ja...@openvpn.net
---
 arch/s390/crypto/des_s390.c        | 6 +++---
 arch/x86/crypto/aesni-intel_glue.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/s390/crypto/des_s390.c b/arch/s390/crypto/des_s390.c
index bcca01c..200f2a1 100644
--- a/arch/s390/crypto/des_s390.c
+++ b/arch/s390/crypto/des_s390.c
@@ -237,9 +237,9 @@ static int des3_setkey(struct crypto_tfm *tfm, const u8 *key,
 	struct s390_des_ctx *ctx = crypto_tfm_ctx(tfm);
 	u32 *flags = &tfm->crt_flags;
 
-	if (!(memcmp(key, &key[DES_KEY_SIZE], DES_KEY_SIZE) &&
-	      memcmp(&key[DES_KEY_SIZE], &key[DES_KEY_SIZE * 2],
-		     DES_KEY_SIZE)) &&
+	if (!(crypto_memneq(key, &key[DES_KEY_SIZE], DES_KEY_SIZE) &&
+	      crypto_memneq(&key[DES_KEY_SIZE], &key[DES_KEY_SIZE * 2],
+			    DES_KEY_SIZE)) &&
 	    (*flags & CRYPTO_TFM_REQ_WEAK_KEY)) {
 		*flags |= CRYPTO_TFM_RES_WEAK_KEY;
 		return -EINVAL;
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 835488b..aba34b8 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1001,7 +1001,7 @@ static int __driver_rfc4106_decrypt(struct aead_request *req)
 			   authTag, auth_tag_len);
 
 	/* Compare generated tag with passed in tag. */
-	retval = memcmp(src + tempCipherLen, authTag, auth_tag_len) ?
+	retval = crypto_memneq(src + tempCipherLen, authTag, auth_tag_len) ?
 		 -EBADMSG : 0;
 
 	if (one_entry_in_sg) {
--
1.8.3.1
[PATCH cryptodev] crypto: memneq: fix for archs without efficient unaligned access
Commit fe8c8a126806 introduced a possible build error for archs that do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set. :/ Fix this up by bringing the else branch's braces outside of the ifdef. Reported-by: Fengguang Wu fengguang...@intel.com Fixes: fe8c8a126806 ("crypto: more robust crypto_memneq") Cc: Cesar Eduardo Barros ces...@cesarb.eti.br Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
Sending stand-alone fix as original patch is already applied.

 crypto/memneq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/crypto/memneq.c b/crypto/memneq.c
index 570f6f3..afed1bd 100644
--- a/crypto/memneq.c
+++ b/crypto/memneq.c
@@ -108,8 +108,9 @@ static inline unsigned long __crypto_memneq_16(const void *a, const void *b)
 		OPTIMIZER_HIDE_VAR(neq);
 		neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12);
 		OPTIMIZER_HIDE_VAR(neq);
-	} else {
+	} else
 #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
+	{
 		neq |= *(unsigned char *)(a)   ^ *(unsigned char *)(b);
 		OPTIMIZER_HIDE_VAR(neq);
 		neq |= *(unsigned char *)(a+1) ^ *(unsigned char *)(b+1);
--
1.8.3.1
Re: [PATCH v3] crypto: more robust crypto_memneq
On 11/26/2013 10:44 PM, Cesar Eduardo Barros wrote: On 26-11-2013 17:27, Daniel Borkmann wrote: On 11/26/2013 01:00 AM, Cesar Eduardo Barros wrote: Compile-tested on x86_64. Actually with yet another version, I hoped that the compile-tested-only statement would eventually disappear, ohh well. ;) I did compile test each version ;-) including verifying (with make crypto/memneq.i) that the macro was really generating the expected inline assembly (with these #ifdef chains, one has to be careful with typos). (Actually, I compile tested with make crypto/memneq.o crypto/memneq.s crypto/memneq.i. I took a peek at the assembly to see if it made sense.) Resolving the OPTIMIZER_HIDE_VAR() macro for others than GCC into a barrier() seems a bit suboptimal, but assuming 99% of people will use GCC anyway, the minority of the remaining will in the worst case have a clever compiler and eventually mimic memcmp() in some situations, or have a not-so-clever compiler and execute the full code as is. I do not think any compiler other than gcc and icc can compile unmodified upstream kernel code. LLVM's clang would be the one which comes closest, but it has gcc-compatible inline assembly, as does icc AFAIK. The #define to barrier() within compiler-intel.h is for some compiler called ECC (not icc). From what I could find about it on a quick search, it appears to be some kind of Itanium compiler. That part of the header was added back in 2003, and I do not believe it is still relevant. A comment within that #ifdef block says "Intel ECC compiler doesn't support gcc specific asm stmts", but there are many uses of unprotected inline assembly all over the kernel (including in the ia64 headers), so if that comment is true, the kernel will not compile with that compiler. It is probably a piece of leftover dead code. I only added to it because I am following RELOC_HIDE's example, and RELOC_HIDE is there. Yep.
Acked-by: Daniel Borkmann dbork...@redhat.com

Anyway, I think still better than the rather ugly Makefile workaround imho, so I'm generally fine with this.
Re: [PATCH v3] crypto: more robust crypto_memneq
On 11/26/2013 01:00 AM, Cesar Eduardo Barros wrote:

Disabling compiler optimizations can be fragile, since a new optimization could be added to -O0 or -Os that breaks the assumptions the code is making. Instead of disabling compiler optimizations, use a dummy inline assembly (based on RELOC_HIDE) to block the problematic kinds of optimization, while still allowing other optimizations to be applied to the code. The dummy inline assembly is added after every OR, and has the accumulator variable as its input and output. The compiler is forced to assume that the dummy inline assembly could both depend on the accumulator variable and change the accumulator variable, so it is forced to compute the value correctly before the inline assembly, and cannot assume anything about its value after the inline assembly. This change should be enough to make crypto_memneq work correctly (with data-independent timing) even if it is inlined at its call sites. That can be done later in a followup patch. Compile-tested on x86_64.

Actually with yet another version, I hoped that the compile-tested-only statement would eventually disappear, ohh well. ;)

Signed-off-by: Cesar Eduardo Barros ces...@cesarb.eti.br

Resolving the OPTIMIZER_HIDE_VAR() macro for others than GCC into a barrier() seems a bit suboptimal, but assuming 99% of people will use GCC anyway, then for the minority of the remaining, they will worst case have a clever compiler and eventually mimic memcmp() in some situations, or have a not-so-clever compiler and execute the full code as is.

Anyway, I think still better than the rather ugly Makefile workaround imho, so I'm generally fine with this.
Re: [PATCH] crypto: more robust crypto_memneq
On 11/25/2013 04:59 PM, James Yonan wrote: On 24/11/2013 14:12, Cesar Eduardo Barros wrote:

Disabling compiler optimizations can be fragile, since a new optimization could be added to -O0 or -Os that breaks the assumptions the code is making. Instead of disabling compiler optimizations, use a dummy inline assembly (based on RELOC_HIDE) to block the problematic kinds of optimization, while still allowing other optimizations to be applied to the code. The dummy inline assembly is added after every OR, and has the accumulator variable as its input and output. The compiler is forced to assume that the dummy inline assembly could both depend on the accumulator variable and change the accumulator variable, so it is forced to compute the value correctly before the inline assembly, and cannot assume anything about its value after the inline assembly. This change should be enough to make crypto_memneq work correctly (with data-independent timing) even if it is inlined at its call sites. That can be done later in a followup patch. Compile-tested on x86_64. Signed-off-by: Cesar Eduardo Barros ces...@cesarb.eti.br

This approach using __asm__ ("" : "=r" (var) : "0" (var)) to try to prevent compiler optimizations of var is interesting. I like the fact that it's finer-grained than -Os and doesn't preclude inlining.

Agreed. This looks much better than the Makefile workaround.

Do we have a hard guarantee that in future, this will not be detected and optimized away by the compiler?
Otherwise, works fine, e.g.:

int main(void)
{
	int foo = 5;
	__asm__ __volatile__ ("" : "=r" (foo) : "0" (foo));
	if (foo == 5)
		return 1;
	else
		return 0;
}

gcc -O2 -Wall foo.c, w/ asm code:

Dump of assembler code for function main:
   0x00400390 <+0>:	mov    $0x5,%eax
   0x00400395 <+5>:	cmp    $0x5,%eax
   0x00400398 <+8>:	sete   %al
   0x0040039b <+11>:	movzbl %al,%eax
   0x0040039e <+14>:	retq

gcc -O2 -Wall foo.c, w/o asm code:

Dump of assembler code for function main:
   0x00400390 <+0>:	mov    $0x1,%eax
   0x00400395 <+5>:	retq

One concern would be that __asm__ could be optimized out unless __volatile__ is present.

James
Re: [PATCH v2] crypto: more robust crypto_memneq
On 11/25/2013 11:35 PM, Cesar Eduardo Barros wrote: Disabling compiler optimizations can be fragile, since a new optimization could be added to -O0 or -Os that breaks the assumptions the code is making. Instead of disabling compiler optimizations, use a dummy inline assembly (based on RELOC_HIDE) to block the problematic kinds of optimization, while still allowing other optimizations to be applied to the code. The dummy inline assembly is added after every OR, and has the accumulator variable as its input and output. The compiler is forced to assume that the dummy inline assembly could both depend on the accumulator variable and change the accumulator variable, so it is forced to compute the value correctly before the inline assembly, and cannot assume anything about its value after the inline assembly. This change should be enough to make crypto_memneq work correctly (with data-independent timing) even if it is inlined at its call sites. That can be done later in a followup patch. Compile-tested on x86_64. Signed-off-by: Cesar Eduardo Barros ces...@cesarb.eti.br Acked-by: Daniel Borkmann dbork...@redhat.com Looks good, thanks!
Re: [PATCH] crypto_mem_not_equal: add constant-time equality testing of memory regions
On 09/19/2013 02:13 AM, James Yonan wrote: [...] We can easily specify -Os in the Makefile rather than depending on #pragma optimize or __attribute__ optimize if they are considered broken. Re: arch/*/crypto/... asm, not sure it's worth it given the extra effort to develop, test, and maintain asm for all archs. The two things we care about (constant time and performance) seem readily achievable in C. Regarding O0 vs. Os, I would tend to prefer Os because it's much faster than O0, but still carries the desirable property that optimizations that increase code size are disabled. It seems that short-circuit optimizations would be disabled by this, since by definition a short-circuit optimization requires the addition of a compare and branch. Ok, if we can make sure that this would overwrite global defaults in any circumstances, then that approach should be fine, imho. I would suggest that you use the crypto_mem_not_equal() function that you originally had or that I was proposing, and still allow the possibility for an arch optimized version, if people want to. In that way, it can be kept simple and stupid and easy to review, just like all other util functions such as memcmp etc is implemented in [1]. [1] http://lingrok.org/xref/linux-net-next/lib/string.c#643
Re: [PATCH] crypto_mem_not_equal: add constant-time equality testing of memory regions
On 09/16/2013 07:10 PM, James Yonan wrote: On 16/09/2013 01:56, Daniel Borkmann wrote: On 09/15/2013 06:59 PM, James Yonan wrote: On 15/09/2013 09:45, Florian Weimer wrote: * James Yonan: + * Constant-time equality testing of memory regions. + * Returns 0 when data is equal, non-zero otherwise. + * Fast path if size == 16. + */ +noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size) I think this should really return unsigned or int, to reduce the risk that the upper bytes are truncated because the caller uses an inappropriate type, resulting in a bogus zero result. Reducing the value to 0/1 probably doesn't hurt performance too much. It also doesn't encode any information about the location of the difference in the result value, which helps if that ever leaks. The problem with returning 0/1 within the function body of crypto_mem_not_equal is that it makes it easier for the compiler to introduce a short-circuit optimization. It might be better to move the test where the result is compared against 0 into an inline function: noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b, size_t size); static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size) { return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0; } This hides the fact that we are only interested in a boolean result from the compiler when it's compiling crypto_mem_not_equal.c, but also ensures type safety when users test the return value. It's also likely to have little or no performance impact. Well, the code snippet I've provided from NaCl [1] is not really fast-path as you say, but rather to prevent the compiler from doing such optimizations by having a transformation of the accumulated bits into 0 and 1 as an end result (likely to prevent a short circuit), plus it has static size, so no loops applied here that could screw up. 
Variable size could be done under arch/ in asm, and if not available, that just falls back to normal memcmp that is being transformed into a same return value. By that, all other archs could easily migrate afterwards. What do you think? [1] http://www.spinics.net/lists/linux-crypto/msg09558.html

I'm not sure that the differentbits -> 0/-1 transform in NaCl really buys us anything because we don't care very much about making the final test of differentbits != 0 constant-time. An attacker already knows whether the test succeeded or failed -- we care more about making the failure cases constant-time. To do this, we need to make sure that the compiler doesn't insert one or more early instructions to compare differentbits with 0xFF and then bypass the rest of the F(n) lines because it knows then that the value of differentbits cannot be changed by subsequent F(n) lines. It seems that all of the approaches that use |= to build up the differentbits value suffer from this problem.

My proposed solution is rather than trying to outsmart the compiler with code that resists optimization, why not just turn optimization off directly with #pragma GCC optimize. Or better yet, use an optimization setting that provides reasonable speed without introducing potential short-circuit optimizations. By optimizing for size (Os), the compiler will need to turn off optimizations that add code for a possible speed benefit, and the kind of short-circuit optimizations that we are trying to avoid fall precisely into this class -- they add an instruction to check if the OR accumulator has all of its bits set, then if so, do an early exit. So by using Os, we still benefit from optimizations that increase speed but don't increase code size.

While I was looking a bit further into this, I thought using an attribute like this might be a more clean variant ...
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..2505b1b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
  */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/* Tell GCC to turn off optimization for a particular function. */
+#ifndef __do_not_optimize
+#define __do_not_optimize __attribute__((optimize("O0")))
+#endif
+
 /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
 #ifdef CONFIG_KPROBES
 # define __kprobes	__attribute__((__section__(".kprobes.text")))

... however, then I found on the GCC developer mailing list [1]: "That said, I consider the optimize attribute code seriously broken and unmaintained (but sometimes useful for debugging - and only that). [...] Does #pragma GCC optimize work more reliably? No, it uses the same mechanism internally. [...] And if these options are so broken, should they be marked as such in the manual? [...] Probably yes."

Therefore, I still need to ask ... what about the option of an arch/*/crypto/... asm
Re: [PATCH] crypto_mem_not_equal: add constant-time equality testing of memory regions
On 09/15/2013 06:59 PM, James Yonan wrote: On 15/09/2013 09:45, Florian Weimer wrote: * James Yonan: + * Constant-time equality testing of memory regions. + * Returns 0 when data is equal, non-zero otherwise. + * Fast path if size == 16. + */ +noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size) I think this should really return unsigned or int, to reduce the risk that the upper bytes are truncated because the caller uses an inappropriate type, resulting in a bogus zero result. Reducing the value to 0/1 probably doesn't hurt performance too much. It also doesn't encode any information about the location of the difference in the result value, which helps if that ever leaks. The problem with returning 0/1 within the function body of crypto_mem_not_equal is that it makes it easier for the compiler to introduce a short-circuit optimization. It might be better to move the test where the result is compared against 0 into an inline function: noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b, size_t size); static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size) { return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0; } This hides the fact that we are only interested in a boolean result from the compiler when it's compiling crypto_mem_not_equal.c, but also ensures type safety when users test the return value. It's also likely to have little or no performance impact. Well, the code snippet I've provided from NaCl [1] is not really fast-path as you say, but rather to prevent the compiler from doing such optimizations by having a transformation of the accumulated bits into 0 and 1 as an end result (likely to prevent a short circuit), plus it has static size, so no loops applied here that could screw up. Variable size could be done under arch/ in asm, and if not available, that just falls back to normal memcmp that is being transformed into a same return value. 
By that, all other archs could easily migrate afterwards. What do you think? [1] http://www.spinics.net/lists/linux-crypto/msg09558.html
Re: [PATCH] crypto_memcmp: add constant-time memcmp
On 09/11/2013 07:20 PM, James Yonan wrote: On 10/09/2013 12:57, Daniel Borkmann wrote: There was a similar patch posted some time ago [1] on lkml, where Florian (CC) made a good point in [2] that future compiler optimizations could short circuit on this. This issue should probably be addressed in such a patch here as well. [1] https://lkml.org/lkml/2013/2/10/131 [2] https://lkml.org/lkml/2013/2/11/381

On 11/09/2013 06:19, Marcelo Cerri wrote: The discussion that Daniel pointed out has another interesting point regarding the function name. I don't think it's a good idea to name it crypto_memcmp since it doesn't behave the same way as memcmp. Florian suggested in the thread names such as crypto_mem_equal, which I think fits better here.

Ok, here's another stab at this: * Changed the name to crypto_mem_not_equal. The not_equal seems to make more sense because the function returns a nonzero true value if the memory regions are not equal. Ok, sounds good.

* Good point that a smart optimizer might add instructions to short-circuit the loop if all bits in ret have been set. One way to deal with this is to disable optimizations that might increase code size, since a short-circuit optimization in this case would require adding instructions: #pragma GCC optimize ("Os"). The nice thing about using #pragma is that older versions of gcc that don't recognize it will simply ignore it, and we can probably presume that older versions of gcc do not support a short-circuit optimization if the latest one does not. I did a quick test using gcc 3.4.6 at -O2, and did not see any evidence of a short-circuit optimization.

* Improved performance when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is enabled. This makes the performance roughly on-par with memcmp.

Hm, why don't we take fixed-size versions from Daniel J. Bernstein's NaCl library [1], e.g. for comparing hashes? E.g.
for 16 bytes:

int crypto_verify(const unsigned char *x, const unsigned char *y)
{
	unsigned int differentbits = 0;
#define F(i) differentbits |= x[i] ^ y[i];
	F(0) F(1) F(2) F(3)
	F(4) F(5) F(6) F(7)
	F(8) F(9) F(10) F(11)
	F(12) F(13) F(14) F(15)
	return (1 & ((differentbits - 1) >> 8)) - 1;
}

It will return 0 if x[0], x[1], ..., x[15] are the same as y[0], y[1], ..., y[15], otherwise it returns -1. That's w/o for loops, so probably more compiler-proof ... [1] http://nacl.cr.yp.to/

#pragma GCC optimize ("Os")

noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
{
	unsigned long ret = 0;

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
#if BITS_PER_LONG == 64
	while (size >= 8) {
		ret |= *(unsigned long *)a ^ *(unsigned long *)b;
		a += 8;
		b += 8;
		size -= 8;
	}
	if (!size)
		return ret;
#endif /* BITS_PER_LONG == 64 */
	if (sizeof(unsigned int) == 4) {
		while (size >= 4) {
			ret |= *(unsigned int *)a ^ *(unsigned int *)b;
			a += 4;
			b += 4;
			size -= 4;
		}
		if (!size)
			return ret;
	}
#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
	while (size > 0) {
		ret |= *(unsigned char *)a ^ *(unsigned char *)b;
		a += 1;
		b += 1;
		size -= 1;
	}
	return ret;
}

James
Re: [PATCH] crypto_memcmp: add constant-time memcmp
On 09/10/2013 08:38 PM, James Yonan wrote: When comparing MAC hashes, AEAD authentication tags, or other hash values in the context of authentication or integrity checking, it is important not to leak timing information to a potential attacker. Bytewise memory comparisons (such as memcmp) are usually optimized so that they return a nonzero value as soon as a mismatch is found. This early-return behavior can leak timing information, allowing an attacker to iteratively guess the correct result. This patch adds a new method crypto_memcmp that has the same prototype as standard memcmp, but that compares strings of the same length in roughly constant time (cache misses could change the timing, but since they don't reveal information about the content of the strings being compared, they are effectively benign). Note that crypto_memcmp (unlike memcmp) can only be used to test for equality or inequality, NOT greater-than or less-than. This is not an issue for its use-cases within the Crypto API. I tried to locate all of the places in the Crypto API where memcmp was being used for authentication or integrity checking, and convert them over to crypto_memcmp. crypto_memcmp is declared noinline and placed in its own source file because a very smart compiler (or LTO) might notice that the return value is always compared against zero/nonzero, and might then reintroduce the same early-return optimization that we are trying to avoid. There was a similar patch posted some time ago [1] on lkml, where Florian (CC) made a good point in [2] that future compiler optimizations could short circuit on this. This issue should probably be addressed in such a patch here as well. 
[1] https://lkml.org/lkml/2013/2/10/131
[2] https://lkml.org/lkml/2013/2/11/381

Signed-off-by: James Yonan ja...@openvpn.net
---
 crypto/Makefile                         |  2 +-
 crypto/asymmetric_keys/rsa.c            |  5 +++--
 crypto/authenc.c                        |  7 ---
 crypto/authencesn.c                     |  9 +
 crypto/ccm.c                            |  5 +++--
 crypto/crypto_memcmp.c                  | 31 +++
 crypto/gcm.c                            |  3 ++-
 include/crypto/internal/crypto_memcmp.h | 20
 8 files changed, 69 insertions(+), 13 deletions(-)
 create mode 100644 crypto/crypto_memcmp.c
 create mode 100644 include/crypto/internal/crypto_memcmp.h

diff --git a/crypto/Makefile b/crypto/Makefile
index 2ba0df2..39a574d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -3,7 +3,7 @@
 #
 obj-$(CONFIG_CRYPTO) += crypto.o

-crypto-y := api.o cipher.o compress.o
+crypto-y := api.o cipher.o compress.o crypto_memcmp.o

 obj-$(CONFIG_CRYPTO_WORKQUEUE) += crypto_wq.o

diff --git a/crypto/asymmetric_keys/rsa.c b/crypto/asymmetric_keys/rsa.c
index 4a6a069..4f9a250 100644
--- a/crypto/asymmetric_keys/rsa.c
+++ b/crypto/asymmetric_keys/rsa.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
+#include <crypto/internal/crypto_memcmp.h>
 #include "public_key.h"

 MODULE_LICENSE("GPL");
@@ -189,12 +190,12 @@ static int RSA_verify(const u8 *H, const u8 *EM, size_t k, size_t hash_size,
 		}
 	}

-	if (memcmp(asn1_template, EM + T_offset, asn1_size) != 0) {
+	if (crypto_memcmp(asn1_template, EM + T_offset, asn1_size) != 0) {
 		kleave(" = -EBADMSG [EM[T] ASN.1 mismatch]");
 		return -EBADMSG;
 	}

-	if (memcmp(H, EM + T_offset + asn1_size, hash_size) != 0) {
+	if (crypto_memcmp(H, EM + T_offset + asn1_size, hash_size) != 0) {
 		kleave(" = -EKEYREJECTED [EM[T] hash mismatch]");
 		return -EKEYREJECTED;
 	}

diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..82ca98f 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -13,6 +13,7 @@
 #include <crypto/aead.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_memcmp.h>
 #include <crypto/authenc.h>
 #include <crypto/scatterwalk.h>
 #include <linux/err.h>
@@ -188,7 +189,7 @@ static void authenc_verify_ahash_update_done(struct crypto_async_request *areq,
 	scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
 				 authsize, 0);

-	err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+	err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
 	if (err)
 		goto out;
@@ -227,7 +228,7 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq,
 	scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
 				 authsize, 0);

-	err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+	err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
 	if (err)
 		goto out;
@@ -462,7 +463,7 @@ static int crypto_authenc_verify(struct
Re: [PATCH crypto] crypto: algboss: fix NULL pointer dereference in cryptomgr_probe
On 06/24/2013 03:59 PM, Herbert Xu wrote: ... Author: Herbert Xu herb...@gondor.apana.org.au Date: Mon Jun 24 21:57:42 2013 +0800 crypto: algboss - Hold ref count on larval ... The use of wait_for_completion_interruptible is intentional so that we don't lock up the thread if a bug causes us to never wake up. This bug is caused by the helper thread using the larval without holding a reference count on it. If the helper thread completes after the original thread requesting for help has gone away and destroyed the larval, then we get the crash above. So the fix is to hold a reference count on the larval. Cc: sta...@vger.kernel.org # 3.6+ Reported-by: Daniel Borkmann dbork...@redhat.com Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Tested-by: Daniel Borkmann dbork...@redhat.com This fixes the panic for me with the reproducer I sent off-list. Thanks Herbert !
[PATCH crypto] crypto: algboss: fix NULL pointer dereference in cryptomgr_probe
After having fixed a NULL pointer dereference in SCTP 1abd165e (net: sctp: fix NULL pointer dereference in socket destruction), I ran into the following NULL pointer dereference in the crypto subsystem with the same reproducer, easily hit each time:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [81070321] __wake_up_common+0x31/0x90
PGD 0
Oops: [#1] SMP
Modules linked in: padlock_sha(F-) sha256_generic(F) sctp(F) libcrc32c(F) [..]
CPU: 6 PID: 3326 Comm: cryptomgr_probe Tainted: GF 3.10.0-rc5+ #1
Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
task: 88007b6cf4e0 ti: 88007b7cc000 task.ti: 88007b7cc000
RIP: 0010:[81070321] [81070321] __wake_up_common+0x31/0x90
RSP: 0018:88007b7cde08 EFLAGS: 00010082
RAX: ffe8 RBX: 88003756c130 RCX: RDX: RSI: 0003 RDI: 88003756c130
RBP: 88007b7cde48 R08: R09: 88012b173200 R10: R11: R12: 0282
R13: 88003756c138 R14: R15:
FS: () GS:88012fc6() knlGS:
CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 01a0b000 CR4: 07e0
DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
Stack: 88007b7cde28 0003 88007b7cde28 88003756c130 0282 88003756c128 81227670 88007b7cde78 810722b7 88007cdcf000 81a90540
Call Trace:
[81227670] ? crypto_alloc_pcomp+0x20/0x20
[810722b7] complete_all+0x47/0x60
[81227708] cryptomgr_probe+0x98/0xc0
[81227670] ? crypto_alloc_pcomp+0x20/0x20
[8106760e] kthread+0xce/0xe0
[81067540] ? kthread_freezable_should_stop+0x70/0x70
[815450dc] ret_from_fork+0x7c/0xb0
[81067540] ? kthread_freezable_should_stop+0x70/0x70
Code: 41 56 41 55 41 54 53 48 83 ec 18 66 66 66 66 90 89 75 cc 89 55 c8 4c 8d 6f 08 48 8b 57 08 41 89 cf 4d 89 c6 48 8d 42 e
RIP [81070321] __wake_up_common+0x31/0x90
RSP 88007b7cde08
CR2:
---[ end trace b495b19270a4d37e ]---

My assumption is that the following is happening: the minimal SCTP tool runs under ``echo 1 > /proc/sys/net/sctp/auth_enable'', hence it's making use of crypto_alloc_hash() via sctp_auth_init_hmacs().
It forks itself, heavily allocates, binds, listens and waits in accept on sctp sockets, and then randomly kills some of them (no need for an actual client in this case to hit this). Then, again, allocating, binding, etc, and then killing child processes.

The problem that might be happening here is that cryptomgr requests the module to probe/load through cryptomgr_schedule_probe(), but before the thread handler cryptomgr_probe() returns, we return from the wait_for_completion_interruptible() function and probably already have cleared up the larval, thus we run into a NULL pointer dereference when complete_all() is being called in cryptomgr_probe(). If we wait with wait_for_completion() instead, this panic will not occur anymore. This is valid, because in case a signal is pending, cryptomgr_probe() returns from probing anyway with properly calling complete_all().

Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
v1->v2:
 - Submitting as non-RFC
 - Slightly improving commit message

 crypto/algboss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/algboss.c b/crypto/algboss.c
index 769219b..eee89a5 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -195,7 +195,7 @@ static int cryptomgr_schedule_probe(struct crypto_larval *larval)
 	if (IS_ERR(thread))
 		goto err_free_param;

-	wait_for_completion_interruptible(&larval->completion);
+	wait_for_completion(&larval->completion);

 	return NOTIFY_STOP;
--
1.7.11.7
[RFC, PATCH] crypto: algboss: fix NULL pointer dereference in cryptomgr_probe
After having fixed a NULL pointer dereference in SCTP 1abd165e (net: sctp: fix NULL pointer dereference in socket destruction), I ran into the following NULL pointer dereference in the crypto subsystem with the same reproducer, easily hit each time:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [81070321] __wake_up_common+0x31/0x90
PGD 0
Oops: [#1] SMP
Modules linked in: padlock_sha(F-) sha256_generic(F) sctp(F) libcrc32c(F) [..]
CPU: 6 PID: 3326 Comm: cryptomgr_probe Tainted: GF 3.10.0-rc5+ #1
Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011
task: 88007b6cf4e0 ti: 88007b7cc000 task.ti: 88007b7cc000
RIP: 0010:[81070321] [81070321] __wake_up_common+0x31/0x90
RSP: 0018:88007b7cde08 EFLAGS: 00010082
RAX: ffe8 RBX: 88003756c130 RCX: RDX: RSI: 0003 RDI: 88003756c130
RBP: 88007b7cde48 R08: R09: 88012b173200 R10: R11: R12: 0282
R13: 88003756c138 R14: R15:
FS: () GS:88012fc6() knlGS:
CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 01a0b000 CR4: 07e0
DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
Stack: 88007b7cde28 0003 88007b7cde28 88003756c130 0282 88003756c128 81227670 88007b7cde78 810722b7 88007cdcf000 81a90540
Call Trace:
[81227670] ? crypto_alloc_pcomp+0x20/0x20
[810722b7] complete_all+0x47/0x60
[81227708] cryptomgr_probe+0x98/0xc0
[81227670] ? crypto_alloc_pcomp+0x20/0x20
[8106760e] kthread+0xce/0xe0
[81067540] ? kthread_freezable_should_stop+0x70/0x70
[815450dc] ret_from_fork+0x7c/0xb0
[81067540] ? kthread_freezable_should_stop+0x70/0x70
Code: 41 56 41 55 41 54 53 48 83 ec 18 66 66 66 66 90 89 75 cc 89 55 c8 4c 8d 6f 08 48 8b 57 08 41 89 cf 4d 89 c6 48 8d 42 e
RIP [81070321] __wake_up_common+0x31/0x90
RSP 88007b7cde08
CR2:
---[ end trace b495b19270a4d37e ]---

My assumption is that the following is happening: the minimal SCTP tool runs under ``echo 1 > /proc/sys/net/sctp/auth_enable'', hence it's making use of crypto_alloc_hash() via sctp_auth_init_hmacs().
It forks itself, heavily allocates, binds, listens and waits in accept on sctp sockets, and then randomly kills some of them (no need for an actual client in this case to hit this). Then, again, allocating, binding, etc, and then killing child processes.

The problem that might be happening here is that cryptomgr requests the module to probe/load through cryptomgr_schedule_probe(), but before the thread handler cryptomgr_probe() returns, we return from the wait_for_completion_interruptible() function and probably already have cleared up the larval, thus we run into a NULL pointer dereference when complete_all() is being called in cryptomgr_probe(). If we wait with wait_for_completion() instead, this panic will not occur anymore.

Signed-off-by: Daniel Borkmann dbork...@redhat.com
---
I'm not very familiar with the crypto subsystem and not entirely sure if this is the best solution. However, it has fixed the panic in my case.

 crypto/algboss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/algboss.c b/crypto/algboss.c
index 769219b..eee89a5 100644
--- a/crypto/algboss.c
+++ b/crypto/algboss.c
@@ -195,7 +195,7 @@ static int cryptomgr_schedule_probe(struct crypto_larval *larval)
 	if (IS_ERR(thread))
 		goto err_free_param;

-	wait_for_completion_interruptible(&larval->completion);
+	wait_for_completion(&larval->completion);

 	return NOTIFY_STOP;
--
1.7.11.7