RE: [openssl.org #555] RSA blinding MT patch

2003-04-03 Thread Bobco, Pete
Approximately ten days ago, I posted about having problems with the RSA Blinding patch 
that resulted in seeing an intermittent problem of dropping GIFs from my SSL server 
implementation.  I continued to see these problems until yesterday when I built with 
the 402 Snapshot for 0.9.6 (openssl-0.9.6-stable-SNAP-20030402.tar.gz).  

In short, I want to post for the record that the fixes in the 402 Snapshot for 0.9.6 
are quite acceptable for my implementation.  I realize there are some performance 
concerns being brought up, but I am quite HAPPY to see my stuff work for the first 
time since the original RSA blinding patch went in.

In closing, does the OpenSSL Release Group have any idea as to when OpenSSL 0.9.6j 
might be officially released?

--- Pete Bobco ---

-Original Message-
From: Tom Wu via RT [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 02, 2003 7:43 PM
Cc: [EMAIL PROTECTED]
Subject: Re: [openssl.org #555] RSA blinding MT patch



Bodo Moeller via RT wrote:
 [EMAIL PROTECTED] - Mon Mar 31 17:14:12 2003]:
 
 
The latest snapshots have not been fixed, some more patience is
required ...
 
 
 The next round of snapshots (20030402, to appear at
 ftp://ftp.openssl.org/snapshot;type=d in about six hours)
 should solve the multi-threading problems.  Please test them when they
 are available.

The good news is that the fix in the snapshot fixes the problem, but the 
bad news is that it seems to kill performance in my benchmarks.  On a 
P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, 
regardless of the number of simultaneous threads.  With the snapshot 
fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more 
threads it drops down to 81 sign/s.

It's quite possible that I've misconfigured something on my own end, but 
I suspect that it is more likely that the local blinding operation is 
slowing things down.  In the case where the blinding struct is owned by 
a different thread from the one doing an RSA op, the code has to do a 
modexp and a mod inverse, as opposed to the two squarings that the 
update normally does.  I believe that on most if not all platforms, the 
cost of putting critical sections around the blinding convert/update 
will be drastically smaller than the cost of the extra local blinding 
computation.

Tom
-- 
Tom Wu
Chief Security Architect
Arcot Systems
(408) 969-6124

__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]
__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]


Re: [openssl.org #555] RSA blinding MT patch

2003-04-03 Thread Richard Levitte - VMS Whacker
In message [EMAIL PROTECTED] on Thu, 3 Apr 2003 11:26:58 -0600, Bobco, Pete 
[EMAIL PROTECTED] said:

Pete.Bobco In closing, does the OpenSSL Release Group have any idea
Pete.Bobco as to when OpenSSL 0.9.6j might be officially released?

I don't know, but I'm guessing someone is working on the performance
problem.  Therefore, I can only say soon, and I do realise I've said
that before.

Be sure that we're working toward as much perfection as we can, and
will do a release as soon as it's pronounced done.

-- 
Richard Levitte   \ Spannvägen 38, II \ [EMAIL PROTECTED]
[EMAIL PROTECTED]  \ S-168 35  BROMMA  \ T: +46-8-26 52 47
\  SWEDEN   \ or +46-708-26 53 44
Procurator Odiosus Ex Infernis-- [EMAIL PROTECTED]
Member of the OpenSSL development team: http://www.openssl.org/

Unsolicited commercial email is subject to an archival fee of $400.
See http://www.stacken.kth.se/~levitte/mail/ for more info.
__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]


Re: [openssl.org #555] RSA blinding MT patch

2003-04-03 Thread (Bodo Moeller) via RT

Tom Wu via RT [EMAIL PROTECTED]:
 Bodo Moeller via RT wrote:

 The next round of snapshots (20030402, to appear at
 ftp://ftp.openssl.org/snapshot;type=d in about six hours)
 should solve the multi-threading problems.  Please test them when they
 are available.

 The good news is that the fix in the snapshot fixes the problem, but the 
 bad news is that it seems to kill performance in my benchmarks.  On a 
 P3-750 running Linux, I get 106 RSA sign/s (1024-bit) with my patch, 
 regardless of the number of simultaneous threads.  With the snapshot 
 fix, I get 102 RSA sign/s with one thread, but if I try with 2 or more 
 threads it drops down to 81 sign/s.
 
 It's quite possible that I've misconfigured something on my own end, but 
 I suspect that it is more likely that the local blinding operation is 
 slowing things down.

Yes, surely this is what is happening, local blinding is somewhat
expensive.

   In the case where the blinding struct is owned by 
 a different thread from the one doing an RSA op, the code has to do a 
 modexp and a mod inverse, as opposed to the two squarings that the 
 update normally does.

These two squarings should be changed, though -- OpenSSL should use a
random new blinding factor after a couple of RSA secret key operations
instead of predictably updating the factor (I didn't want to change
all of this at once).  So the update code will be slower in the
future; not every time, but sometimes.

I believe that on most if not all platforms, the 
 cost of putting critical sections around the blinding convert/update 
 will be drastically smaller than the cost of the extra local blinding 
 computation.

This depends: on a single-processor machine, indeed additional locking
should usually be faster than using local blinding; but for
multi-processor systems, the cost of locking could be quite high.


There are some strategies that could be used to make blinding faster
without expensive locking, but these would require incompatible
changes to the RSA and/or BN_BLINDING structures (the addition of
thread_id is an incompatible change in theory, but it's one that does
not directly affect applications).


-- 
Bodo Möller [EMAIL PROTECTED]
PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html
* TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt
* Tel. +49-6151-16-6628, Fax +49-6151-16-6036

__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]


Re: [openssl.org #555] RSA blinding MT patch

2003-04-03 Thread Tom Wu via RT

(Bodo Moeller) via RT wrote:
 Tom Wu via RT [EMAIL PROTECTED]:
 
 
  In the case where the blinding struct is owned by 
a different thread from the one doing an RSA op, the code has to do a 
modexp and a mod inverse, as opposed to the two squarings that the 
update normally does.
 
 
 These two squarings should be changed, though -- OpenSSL should use a
 random new blinding factor after a couple of RSA secret key operations
 instead of predictably updating the factor (I didn't want to change
 all of this at once).  So the update code will be slower in the
 future; not every time, but sometimes.

Is there any established wisdom on the security implications of 
refreshing the blinding factor?  Assuming that the initial blinding 
value had sufficient entropy and was unknown to an attacker, is there 
still some way to carry out a timing attack given the knowledge that the 
host is just squaring A after each RSA operation?

I also wonder if slowing down blinding would weaken the rationale for 
turning it on all the time when doing RSA?

 
 
   I believe that on most if not all platforms, the 
cost of putting critical sections around the blinding convert/update 
will be drastically smaller than the cost of the extra local blinding 
computation.
 
 
 This depends: on a single-processor machine, indeed additional locking
 should usually be faster than using local blinding; but for
 multi-processor systems, the cost of locking could be quite high.

I just tried benchmarking the snapshot code against my earlier patch on 
an 8-way P3-700 server (Win2K AS).  My patch gets ~100 RSA sign/s 
(1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 
threads.  The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread 
and peaks at ~650 RSA sign/s with 8 threads.

 
 There are some strategies that could be used to make blinding faster
 without expensive locking, but these would require incompatible
 changes to the RSA and/or BN_BLINDING structures (the addition of
 thread_id is an incompatible change in theory, but it's one that does
 not directly affect applications).
 

As for what to do for the short term in 0.9.6j or 0.9.7b, is there any 
chance of using my original patch with the _r methods?  It doesn't 
change either the RSA or BN_BLINDING structures at all, and seems to 
give the least slowdown on both single and multi-processor systems. 
Perhaps the always square A behavior should be the default if it is 
good enough to block the obvious timing attack, with future options to 
enable random refresh strategies if the caller wants to do a 
performance/security tradeoff.

Tom
-- 
Tom Wu
Chief Security Architect
Arcot Systems
(408) 969-6124

__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]


Re: [openssl.org #555] RSA blinding MT patch

2003-04-03 Thread Bodo Moeller via RT

Tom Wu via RT [EMAIL PROTECTED]:
 (Bodo Moeller) via RT wrote:
 Tom Wu via RT [EMAIL PROTECTED]:

 Is there any established wisdom on the security implications of 
 refreshing the blinding factor?  Assuming that the initial blinding 
 value had sufficient entropy and was unknown to an attacker, is there 
 still some way to carry out a timing attack given the knowledge that the 
 host is just squaring A after each RSA operation?

I am not aware of descriptions of such attacks.  Maybe it would even
suffice to randomly square or cube the factor to avoid potential
attacks, but better safe than sorry ...

 I also wonder if slowing down blinding would weaken the rationale for 
 turning it on all the time when doing RSA?

Well, in any case you can still turn it off if it is too slow.


I believe that on most if not all platforms, the 
 cost of putting critical sections around the blinding convert/update 
 will be drastically smaller than the cost of the extra local blinding 
 computation.

 This depends: on a single-processor machine, indeed additional locking
 should usually be faster than using local blinding; but for
 multi-processor systems, the cost of locking could be quite high.

 I just tried benchmarking the snapshot code against my earlier patch on 
 an 8-way P3-700 server (Win2K AS).  My patch gets ~100 RSA sign/s 
 (1024-bit) with a single thread and peaks at ~790 RSA sign/s with 8 
 threads.  The 0402 snapshot also starts at ~100 RSA sign/s with 1 thread 
 and peaks at ~650 RSA sign/s with 8 threads.

Thanks for the timings.  One thing to take into account when
interpreting these is that some additional random blinding should be
added to your patch; maybe once in ten or hundred RSA operations, so
the timing difference would not really change a lot.  A more important
aspect is that you are comparing just the case that multiple threads
do share an RSA structure.  A different scenario is that you have
multiple threads with *individual* RSA structures -- then the snapshot
version will be very fast while the version with locking will be
unnecessarily slowed down because the locks are global.  This is why
we are trying to avoid excessive locking.


 There are some strategies that could be used to make blinding faster
 without expensive locking, but these would require incompatible
 changes to the RSA and/or BN_BLINDING structures (the addition of
 thread_id is an incompatible change in theory, but it's one that does
 not directly affect applications).

 As for what to do for the short term in 0.9.6j or 0.9.7b, is there any 
 chance of using my original patch with the _r methods?  It doesn't 
 change either the RSA or BN_BLINDING structures at all, and seems to 
 give the least slowdown on both single and multi-processor systems. 
 Perhaps the always square A behavior should be the default if it is 
 good enough to block the obvious timing attack, with future options to 
 enable random refresh strategies if the caller wants to do a 
 performance/security tradeoff.

One idea to consider is to include a second BN_BLINDING slot with each
RSA object: the first BN_BLINDING object would be single-threaded (as
in the current snapshots), the second one would be shared between all
the other threads (and require locking).  The major problem is how to
do this without harming compatibility too much.


-- 
Bodo Möller [EMAIL PROTECTED]
PGP http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller/0x36d2c658.html
* TU Darmstadt, Theoretische Informatik, Alexanderstr. 10, D-64283 Darmstadt
* Tel. +49-6151-16-6628, Fax +49-6151-16-6036

__
OpenSSL Project http://www.openssl.org
Development Mailing List   [EMAIL PROTECTED]
Automated List Manager   [EMAIL PROTECTED]