RE: [openssl.org #555] RSA blinding MT patch
Approximately ten days ago, I posted about having problems with the RSA Blinding patch that resulted in seeing an intermittent problem of dropping GIFs from my SSL server implementation. I continued to see these problems until yesterday when I built with the 402 Snapshot for 0.9.6 (openssl-0.9.6-stable-SNAP-20030402.tar.gz). In short, I want to post for the record that the fixes in the 402 Snapshot for 0.9.6 are quite acceptable for my implementation. I realize there are some performance concerns being brought up, but I am quite HAPPY to see my stuff work for the first time since the original RSA blinding patch went in. In closing, does the OpenSSL Release Group have any idea as to when OpenSSL 0.9.6j might be officially released? --- Pete Bobco --- -Original Message- From: Tom Wu via RT [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 02, 2003 7:43 PM Cc: [EMAIL PROTECTED] Subject: Re: [openssl.org #555] RSA blinding MT patch Bodo Moeller via RT wrote: [EMAIL PROTECTED] - Mon Mar 31 17:14:12 2003]: The latest snapshots have not been fixed, some more patience is required ... The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. The good news is that the fix in the snapshot fixes the problem, but the bad news is that it seems to kill performance in my benchmarks. On a P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, regardless of the number of simultaneous threads. With the snapshot fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more threads it drops down to 81 sign/s. It's quite possible that I've misconfigured something on my own end, but I suspect that it is more likely that the local blinding operation is slowing things down. In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED] __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
In message [EMAIL PROTECTED] on Thu, 3 Apr 2003 11:26:58 -0600, Bobco, Pete [EMAIL PROTECTED] said: Pete.Bobco In closing, does the OpenSSL Release Group have any idea Pete.Bobco as to when OpenSSL 0.9.6j might be officially released? I don't know, but I'm guessing someone is working on the performance problem. Therefore, I can only say soon, and I do realise I've said that before. Be sure that we're working toward as much perfection as we can, and will do a release as soon as it's pronounced done. -- Richard Levitte \ Spannvägen 38, II \ [EMAIL PROTECTED] [EMAIL PROTECTED] \ S-168 35 BROMMA \ T: +46-8-26 52 47 \ SWEDEN \ or +46-708-26 53 44 Procurator Odiosus Ex Infernis-- [EMAIL PROTECTED] Member of the OpenSSL development team: http://www.openssl.org/ Unsolicited commercial email is subject to an archival fee of $400. See http://www.stacken.kth.se/~levitte/mail/ for more info. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Tom Wu via RT [EMAIL PROTECTED]: Bodo Moeller via RT wrote: The next round of snapshots (20030402, to appear at ftp://ftp.openssl.org/snapshot;type=d in about six hours) should solve the multi-threading problems. Please test them when they are available. The good news is that the fix in the snapshot fixes the problem, but the bad news is that it seems to kill performance in my benchmarks. On a P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, regardless of the number of simultaneous threads. With the snapshot fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more threads it drops down to 81 sign/s. It's quite possible that I've misconfigured something on my own end, but I suspect that it is more likely that the local blinding operation is slowing things down. Yes, surely this is what is happening, local blinding is somewhat expensive. In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. These two squarings should be changed, though -- OpenSSL should use a random new blinding factor after a couple of RSA secret key operations instead of predictably updating the factor (I didn't want to change all of this at once). So the update code will be slower in the future; not every time, but sometimes. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). -- Bodo Möller [EMAIL PROTECTED] PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html * TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt * Tel. +49-6151-16-6628, Fax +49-6151-16-6036 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
(Bodo Moeller) via RT wrote: Tom Wu via RT [EMAIL PROTECTED]: In the case where the blinding struct is owned by a different thread from the one doing an RSA op, the code has to do a modexp and a mod inverse, as opposed to the two squarings that the update normally does. These two squarings should be changed, though -- OpenSSL should use a random new blinding factor after a couple of RSA secret key operations instead of predictably updating the factor (I didn't want to change all of this at once). So the update code will be slower in the future; not every time, but sometimes. Is there any established wisdom on the security implications of refreshing the blinding factor? Assuming that the initial blinding value had sufficient entropy and was unknown to an attacker, is there still some way to carry out a timing attack given the knowledge that the host is just squaring A after each RSA operation? I also wonder if slowing down blinding would weaken the rationale for turning it on all the time when doing RSA? I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. I just tried benchmarking the snapshot code against my earlier patch on an 8-way P3-700 server (Win2K AS). My patch gets ~100 RSA sign/s (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 threads. The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread and peaks at ~650 RSA sign/s with 8 threads. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). As for what to do for the short term in 0.9.6j or 0.9.7b, is there any chance of using my original patch with the _r methods? It doesn't change either the RSA or BN_BLINDING structures at all, and seems to give the least slowdown on both single and multi-processor systems. Perhaps the always square A behavior should be the default if it is good enough to block the obvious timing attack, with future options to enable random refresh strategies if the caller wants to do a performance/security tradeoff. Tom -- Tom Wu Chief Security Architect Arcot Systems (408) 969-6124 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: [openssl.org #555] RSA blinding MT patch
Tom Wu via RT [EMAIL PROTECTED]: (Bodo Moeller) via RT wrote: Tom Wu via RT [EMAIL PROTECTED]: Is there any established wisdom on the security implications of refreshing the blinding factor? Assuming that the initial blinding value had sufficient entropy and was unknown to an attacker, is there still some way to carry out a timing attack given the knowledge that the host is just squaring A after each RSA operation? I am not aware of descriptions of such attacks. Maybe it would even suffice to randomly square or cube the factor to avoid potential attacks, but better safe than sorry ... I also wonder if slowing down blinding would weaken the rationale for turning it on all the time when doing RSA? Well, in any case you can still turn it off if it is too slow. I believe that on most if not all platforms, the cost of putting critical sections around the blinding convert/update will be drastically smaller than the cost of the extra local blinding computation. This depends: on a single-processor machine, indeed additional locking should usually be faster than using local blinding; but for multi-processor systems, the cost of locking could be quite high. I just tried benchmarking the snapshot code against my earlier patch on an 8-way P3-700 server (Win2K AS). My patch gets ~100 RSA sign/s (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 threads. The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread and peaks at ~650 RSA sign/s with 8 threads. Thanks for the timings. One thing to take into account when interpreting these is that some additional random blinding should be added to your patch; maybe once in ten or hundred RSA operations, so the timing difference would not really change a lot. A more important aspect is that you are comparing just the case that multiple threads do share an RSA structure. A different scenario is that you have multiple threads with *individual* RSA structures -- then the snapshot version will be very fast while the version with locking will be unnecessarily slowed down because the locks are global. This is why we are trying to avoid excessive locking. There are some strategies that could be used to make blinding faster without expensive locking, but these would require incompatible changes to the RSA and/or BN_BLINDING structures (the addition of thread_id is an incompatible change in theory, but it's one that does not directly affect applications). As for what to do for the short term in 0.9.6j or 0.9.7b, is there any chance of using my original patch with the _r methods? It doesn't change either the RSA or BN_BLINDING structures at all, and seems to give the least slowdown on both single and multi-processor systems. Perhaps the always square A behavior should be the default if it is good enough to block the obvious timing attack, with future options to enable random refresh strategies if the caller wants to do a performance/security tradeoff. One idea to consider is to include a second BN_BLINDING slot with each RSA object: the first BN_BLINDING object would be single-threaded (as in the current snapshots), the second one would be shared between all the other threads (and require locking). The major problem is how to do this without harming compatibility too much. -- Bodo Möller [EMAIL PROTECTED] PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html * TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt * Tel. +49-6151-16-6628, Fax +49-6151-16-6036 __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]