RE: 0.9.8: cfb_enc.c bug? and AES speed on Win64/x64

2005-07-08 Thread David C. Partridge
> Instead of doing what was intended, moving the string up one place, the
> code has different behaviour.

Yes, it will fill the buffer with H, which is what I would expect to happen:
not immediately obvious, but sensible.  (Any 370 assembler guys will
recognise MVC as doing this.)

If you want to copy from one memory location to another even if they overlap
*and* preserve the contents, then you should use memmove and pay the
overhead of the temporary buffer it probably allocates.
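
A small sketch of the two behaviours, not code taken from cfb_enc.c. The
helper names forward_copy and shift_up are hypothetical. Note that the C
standard only requires memmove to behave as if it copied through a temporary
array; common implementations simply pick a safe copy direction, so the
overhead is usually modest.

```c
#include <stddef.h>
#include <string.h>

/* Forward byte-by-byte copy. When dst == src + 1, every byte written is
 * the byte read on the next iteration, so the first byte propagates
 * through the whole destination (the MVC-style behaviour). */
static void forward_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Shift the first n bytes of buf up one place, preserving their contents.
 * buf must have room for n + 1 bytes. */
static void shift_up(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}
```

Calling forward_copy(buf + 1, buf, 5) on a buffer holding "HELLO" leaves it
filled with H, while shift_up(buf, 5) leaves "HHELLO".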

Dave 


__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


AES CTR mode implementation

2005-07-08 Thread David C. Partridge
I've been looking at the AES CTR mode implementation in 0.9.7.

The counter increment function blindly assumes that the counter value can be
incremented across the whole 128 bits of the counter block.

If you look at (e.g.) RFC3686 or the NIST 800-38A publication, then they
both envisage a counter block that incorporates a nonce and a block counter.

e.g. RFC 3686 specifies a counter block like:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             Nonce                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Initialization Vector (IV)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Block Counter                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Then, when the low-order 32 bits overflow, the IV value will be overwritten
in the current implementation.

Shouldn't the AES CTR mode operation specify the number of bits to be used
for the block counter and keep track to ensure that no more than 2^(block
counter bits) blocks are encrypted for this session?
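
A minimal sketch of the bounded increment being asked about, assuming the
counter is a big-endian integer held in the trailing bytes of the 16-byte
block, as in the RFC 3686 layout. The function ctr_increment and its
counter_bytes parameter are hypothetical and not part of the OpenSSL API.

```c
#include <stddef.h>
#include <stdint.h>

#define CTR_BLOCK_SIZE 16

/* Increment only the last counter_bytes bytes of the counter block,
 * treating them as a big-endian integer. Bytes in front of the counter
 * field (nonce, IV) are never touched.
 * Returns 1 if the counter field wrapped round to zero, otherwise 0. */
static int ctr_increment(uint8_t block[CTR_BLOCK_SIZE], size_t counter_bytes)
{
    size_t i = CTR_BLOCK_SIZE;

    if (counter_bytes > CTR_BLOCK_SIZE)
        counter_bytes = CTR_BLOCK_SIZE;

    while (counter_bytes-- > 0) {
        i--;
        if (++block[i] != 0)
            return 0;           /* no carry out of this byte: done */
    }
    return 1;                   /* carry out of the whole counter field */
}
```

With counter_bytes set to 4 a caller could treat a return value of 1 as the
signal to stop and rekey; with 16 the carry runs across the whole block, which
is the behaviour described above for the existing code.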

I've not had any chance to look at the 0.9.8 code yet, so apologies if this
is fixed in the new release.

Regards,
David C. Partridge
Technical Products Director
Primeur Security Services
Tel: +44 (0)1926 511058
Mobile: +44 (0)7713 880197



RE: AES CTR mode implementation

2005-07-08 Thread David C. Partridge
> 800-38A essentially says it's up to the implementer, doesn't it?
> The standard incrementing function can apply either to an entire block
> or to a part of a block.

Hmmm, OK, I do see your point here.  I was sure I'd seen a discussion on the
net about this saying that it was dangerous to (e.g.) start the counter at
zero, that a nonce should be built in, and that this part should remain
constant.  But now that I've gone searching for it again, I can't find it
:-(

I wonder why RFC3686 goes to the lengths it does to specify such a complex
counter block, with only the low-order 32 bits being incremented?
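
For reference, a sketch of how the RFC 3686 counter block is assembled. In
that RFC the 32-bit nonce is fixed for the security association, the 64-bit IV
is carried in each packet, and the 32-bit block counter starts at one for
every packet. The function name rfc3686_init_block is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build the initial 16-byte counter block:
 *   bytes  0..3   nonce
 *   bytes  4..11  per-packet IV
 *   bytes 12..15  block counter, big-endian, initial value 1 */
static void rfc3686_init_block(uint8_t block[16],
                               const uint8_t nonce[4],
                               const uint8_t iv[8])
{
    memcpy(block, nonce, 4);
    memcpy(block + 4, iv, 8);
    block[12] = 0;
    block[13] = 0;
    block[14] = 0;
    block[15] = 1;
}
```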

Dave

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Andy Polyakov
Sent: 08 July 2005 13:23
To: openssl-dev@openssl.org
Subject: Re: AES CTR mode implementation

> The counter increment function blindly assumes that the counter value
> can be incremented across the whole 128 bits of the counter block.

Correct, which is why it's called AES_ctr128_*.

> If you look at (e.g.) RFC3686 or the NIST 800-38A publication, then
> they both envisage a counter block that incorporates a nonce and a
> block counter.

800-38A essentially says it's up to the implementer, doesn't it? The standard
incrementing function can apply either to an entire block or to a part of a
block.

> e.g. RFC 3686 specifies a counter block like:
>
>    0                   1                   2                   3
>    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   |                             Nonce                             |
>   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   |                  Initialization Vector (IV)                   |
>   |                                                               |
>   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   |                         Block Counter                         |
>   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Then, when the low-order 32 bits overflow, the IV value will be
> overwritten in the current implementation.
>
> Shouldn't the AES CTR mode operation specify the number of bits to be
> used for the block counter and keep track to ensure that no more than
> 2^(block counter bits) blocks are encrypted for this session?

One can discuss additional function[s], AES_ctr_ipsec perhaps or
AES_ctr_variable, which would provide for this, but it would be
inappropriate to modify AES_ctr128_*. In other words, it's not a matter of
fixing present code, but of extending functionality with new code. Is there
broader interest in an IPsec-specific function than in a variable one?
BTW I have an AES_CCM_ipsec implementation pending. A.



RE: [openssl.org #1096] Minor documentation bugs

2005-06-03 Thread David C. Partridge
The problem with pthread_self() is that the value it returns is defined to
be opaque, and isn't necessarily (e.g.) an unsigned long (32 bit), though
many Unix and Unix-like systems do use a 32-bit value ...
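
Because pthread_t is opaque, portable code compares thread IDs with
pthread_equal() rather than casting them to an integer. A minimal sketch; the
helper name is_current_thread is hypothetical.

```c
#include <pthread.h>

/* Return 1 if t identifies the calling thread, 0 otherwise.
 * pthread_equal() is the only portable way to compare pthread_t values;
 * casting them to unsigned long assumes a representation POSIX does not
 * promise. */
static int is_current_thread(pthread_t t)
{
    return pthread_equal(t, pthread_self()) != 0;
}
```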

Dave




RE: OpenSSL use of DCLP may not be thread-safe on multiple processors

2005-04-07 Thread David C. Partridge
Thanks all.

It strikes me that the H/W designers have played a bit fast and loose with
the cache consistency issue here - I believe I understand the C/C++
optimisation issues, and these CAN be worked around (IMHO) within the rules
of the standard by using bool in some cases.

However I've notified our dev folks to remove the few cases where we've used
this technique as it is certainly dangerous.
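
One conservative replacement for such code is simply to take the lock on every
access. The sketch below assumes a POSIX mutex standing in for
CRYPTO_w_lock/CRYPTO_w_unlock; the names x_lock, x_storage and get_x_locked
are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;
static int x_storage;
static int *x = NULL;   /* only read or written while x_lock is held */

/* Lazily initialise x and return it. There is no unlocked first test,
 * so ordering between threads is provided entirely by the mutex. */
static int *get_x_locked(void)
{
    int *result;

    pthread_mutex_lock(&x_lock);
    if (x == NULL) {
        x_storage = 42;         /* stands in for the real initialisation */
        x = &x_storage;
    }
    result = x;
    pthread_mutex_unlock(&x_lock);
    return result;
}
```

This gives up the fast path that double-checked locking was trying to buy, in
exchange for being straightforward to reason about.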

Dave




RE: OpenSSL use of DCLP may not be thread-safe on multiple processors

2005-04-06 Thread David C. Partridge
oops ... First test should of course read:

Singleton* Singleton::instance()
{
  if (!initialised) // 1st test

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of David C. Partridge
Sent: 06 April 2005 14:08
To: openssl-dev@openssl.org
Cc: [EMAIL PROTECTED]
Subject: RE: OpenSSL use of DCLP may not be thread-safe on multiple
processors


I've just read the paper, and I believe that the following variation on
the code would work and would avoid the MP-unsafe issues raised, because
bool is defined to be a single byte.

Furthermore, I'm pretty certain that it also resolves the issues with the
order of construction and setting of the pointer in the singleton case, and
probably resolves all the other over-smart optimisation issues as well.

static volatile bool initialised=false;

if (!initialised)
{
    CRYPTO_w_lock(CRYPTO_LOCK_XXX);
    /* Avoid a race condition by checking again inside this lock */
    if (!initialised)
    {
        x = ...;
        initialised=true;   // Atomic operation
    }
    CRYPTO_w_unlock(CRYPTO_LOCK_XXX);
}
/* Now, make use of x */

Or expressed in terms of the Singleton pattern:

in the header for the Singleton class file:

static volatile bool initialised;

in the source file:

volatile bool Singleton::initialised=false;

Singleton* Singleton::instance()
{
  if (!initialised) // 1st test
  {
    Lock lock;
    if (!initialised) // 2nd test
    {
      pInstance = new Singleton;
      initialised=true;   // Atomic
    }
  }
  return pInstance;
}

I've been using this approach for absolutely YEARS, and didn't realise
someone had honoured it with a design pattern name!!!

I've copied this to Scott Meyers for him to comment on whether I've got this
right ...

Dave Partridge
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Steven Reddie
Sent: 06 April 2005 10:02
To: openssl-dev@openssl.org
Subject: OpenSSL use of DCLP may not be thread-safe on multiple
processors


Hi All,

OpenSSL makes use of the DCLP (double-checked locking pattern) in a number
of places (rsa_eay.c and at least one engine; I haven't done an exhaustive
search), with code that usually looks like this:

if (x == NULL)
{
    CRYPTO_w_lock(CRYPTO_LOCK_XXX);
    /* Avoid a race condition by checking again inside this lock */
    if (x == NULL)
    {
        x = ...;
    }
    CRYPTO_w_unlock(CRYPTO_LOCK_XXX);
}
/* Now, make use of x */

Some recent research I've done in this area, prompted by Scott Meyers' and
Andrei Alexandrescu's article "C++ and the Perils of Double-Checked Locking"
at http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf, makes me
wonder whether this code is thread-safe on multi-processor machines.  As the
article points out, DCLP is dangerous in general, however it is most likely
safe if the thing being tested and set is accessed atomically.  On most
32-bit machines a 32-bit quantity will generally be accessed in a single bus
transaction, making it inherently atomic.  However, there may be cases where
it is not atomic.  An example could be on a machine that allows unaligned
accesses, such as the x86.  It may be possible for half of the value to be
updated in another processor's cache, and used (since the value is therefore
not NULL), before the other half is updated.  It seems that in fact the race
condition that is trying to be avoided may have been reduced rather than
eliminated.  While it may be true that the code generated by the compiler
doesn't typically result in unaligned accesses it is still a possibility
that exists, and there may be other ways for non-atomic access to occur
without unalignment being the cause.

I've tried some elaborate workarounds to maintain the optimisation that DCLP
provides, but they turn out to be not entirely safe on other processors such
as the Itanium.  The easiest way to fix this would seem to be always
obtaining the lock before using the variables in question, but this could
have an impact on performance.  A more involved alternative is to use locked
instructions, such as the Interlocked... functions on Windows, and some
hand-rolled assembler on other platforms, to ensure that the values are
updated atomically.  I'm not offering patches at this point in case there is
too much resistance to a performance hit, so I'm interested to know thoughts
either way.  I agree that the margin for error is very, very small, and I
don't know how much of an impact on performance the necessary changes would
have, so I'm partly sending this so that if nothing is done and a future
race-condition is reported it may assist with locating the problem.
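
As an illustration of the atomic-update alternative, here is a sketch of
double-checked initialisation written with C11 atomics, which postdate this
message and are used here as an assumption in place of Interlocked...
functions or hand-rolled assembler. A POSIX mutex stands in for
CRYPTO_w_lock; shared_x, init_lock and get_x are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int storage;
static _Atomic(int *) shared_x = NULL;

static int *get_x(void)
{
    /* 1st test: the acquire load pairs with the release store below, so a
     * thread that sees a non-NULL pointer also sees the initialised data. */
    int *p = atomic_load_explicit(&shared_x, memory_order_acquire);

    if (p == NULL) {
        pthread_mutex_lock(&init_lock);
        /* 2nd test: the mutex already orders this against other writers. */
        p = atomic_load_explicit(&shared_x, memory_order_relaxed);
        if (p == NULL) {
            storage = 42;       /* stands in for the real initialisation */
            p = &storage;
            /* Publish only after the object is fully initialised. */
            atomic_store_explicit(&shared_x, p, memory_order_release);
        }
        pthread_mutex_unlock(&init_lock);
    }
    return p;
}
```

The pointer itself is the atomic object, so the fast path stays lock-free
while the ordering no longer depends on alignment or on what the compiler
happens to generate.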

Regards,

Steven


RE: OpenSSL use of DCLP may not be thread-safe on multiple processors

2005-04-06 Thread David C. Partridge
ARGH!

Are you absolutely sure that this is the case - that's scary - I thought
that the whole issue of SMP cache coherency and write order was solved years
ago.

I mean that if the order of memory write visibility between processors can't
be guaranteed, then a whole lot MORE than just DCLP crashes and burns ...  How
in that case can anyone write safe MP code?

D.




RE: query: Private Key generation using OpenSSL

2005-02-01 Thread David C. Partridge
Any random data that is shared with the recipient will do as a key for HMAC.

Dave




RE: ENGINE issues

2005-01-13 Thread David C. Partridge
IIRC the Luna CA3 is FIPS 140-2 Level 3, which means it won't allow you under
any circumstances to extract the private key from the device
(non-extractable, sensitive in PKCS#11 parlance).

What this means is that you need to send the data to the device to be signed
(don't know how to do this using openssl), rather than extracting the key
and using openssl to do the crypto in software.

Dave




RE: OS/2 support

2005-01-10 Thread David C. Partridge
Gosh, there's a blast from the past!  I remember your name from *way* back
when I used to work on OS/2.  How are you?  Anyway, to cut to the chase:

IIRC you just need to patch to replace strncasecmp with strnicmp, and
strcasecmp with stricmp (or do conditional compilation).
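
A sketch of the conditional-compilation route. It assumes the OS/2 compiler
defines __OS2__ or __EMX__ and declares stricmp/strnicmp in <string.h>; the
port_* macro names are hypothetical and this is not the change that went into
o_str.c.

```c
#include <stddef.h>
#include <string.h>

#if defined(__OS2__) || defined(__EMX__)
/* Assumed: these toolchains provide stricmp/strnicmp instead. */
# define port_strcasecmp(a, b)      stricmp((a), (b))
# define port_strncasecmp(a, b, n)  strnicmp((a), (b), (n))
#else
/* POSIX systems declare the case-insensitive compares in <strings.h>. */
# include <strings.h>
# define port_strcasecmp(a, b)      strcasecmp((a), (b))
# define port_strncasecmp(a, b, n)  strncasecmp((a), (b), (n))
#endif
```

Callers then use port_strcasecmp and port_strncasecmp everywhere, so the
platform difference stays in one place.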

Dave Partridge

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of John Poltorak
Sent: 09 January 2005 11:47
To: openssl-dev@openssl.org
Subject: OS/2 support



Is there an OS/2 maintainer involved in developing OpenSSL?

The reason I ask is that up until v0.9.7c came out, it compiled out of the
box. Since then it doesn't. The problem seems to have arisen since the
introduction (or change) of ./crypto/o_str.c and results in these errors:-

tmp_dll\o_str.obj(o_str.obj) :  error L2029: 'strncasecmp' : unresolved external
tmp_dll\o_str.obj(o_str.obj) :  error L2029: 'strcasecmp' : unresolved external



--
John





RE: [openssl.org #502] TXT_DB error number 2

2004-10-18 Thread David C. Partridge
The renaming of the serial file is a known bug. See my recent post to
openssl-dev.

Dave

