Re: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Richard Levitte - VMS Whacker
In message [EMAIL PROTECTED] on Thu, 25 May 2006 22:50:15 -0700 (PDT), Alex 
Dubov [EMAIL PROTECTED] said:

oakad> I thought all major compilers have sort of long long,
oakad> didn't them? After all, emulated long long is still
oakad> only two integer xors as opposed to 8 with char.

If you look in the script Configure, you'll see what kinds of
platforms we claim to support.  That means that we have to be careful
with the kind of assumptions we make.  For example, your patch would
fail miserably on VMS for VAX (which I know is still used out there).

However, nothing stops you from making variants with different types
of integers, maybe with some help from the macros used and defined in
crypto/bn/bn.h, which are correctly defined for each platform, as far
as we know.
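
A minimal sketch of what such a variant could look like, assuming the word type is chosen by the preprocessor so that a build without any 64-bit integer still compiles. The names `xor_word`, `aes_block`, `block_xor` and the `USE_LONG_LONG` switch are hypothetical; they are not taken from OpenSSL or from crypto/bn/bn.h, which would supply its own per-platform definitions.

```c
#include <stddef.h>

/* Hypothetical configuration switch: define USE_LONG_LONG only on
 * platforms whose compiler really has a 64-bit long long. */
#ifdef USE_LONG_LONG
typedef unsigned long long xor_word;
#else
typedef unsigned long xor_word;   /* always available, even in C89 */
#endif

#define BLOCK_BYTES 16
#define BLOCK_WORDS (BLOCK_BYTES / sizeof(xor_word))

/* A 16-byte block viewed either as bytes or as words.  The union keeps
 * the bytes suitably aligned for word access. */
typedef union {
    unsigned char b[BLOCK_BYTES];
    xor_word      w[BLOCK_WORDS];
} aes_block;

/* dst ^= src, one word at a time. */
static void block_xor(aes_block *dst, const aes_block *src)
{
    size_t i;

    for (i = 0; i < BLOCK_WORDS; i++)
        dst->w[i] ^= src->w[i];
}
```

Because XOR works on each bit independently, the result is the same for every word size and byte order; only the number of loop iterations changes.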

Cheers,
Richard

-
Please consider sponsoring my work on free software.
See http://www.free.lp.se/sponsoring.html for details.

-- 
Richard Levitte [EMAIL PROTECTED]
http://richard.levitte.org/

When I became a man I put away childish things, including
 the fear of childishness and the desire to be very grown up.
-- C.S. Lewis
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Alex Dubov
Ok. How about now?



aes_cfb.c.diff
Description: 441793709-aes_cfb.c.diff


Re[2]: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread rz1a
Hello Alex,

Friday, May 26, 2006, 9:50:15 AM, you wrote:

AD> I thought all major compilers have sort of long long,
AD> didn't them?
I'm on QNX4 with Watcom C v10.6B which has neither int_64 nor long
long.
So, I'm very anxious about not being able to keep my port current
after such improvements...

AD> After all, emulated long long is still only two integer xors as
AD> opposed to 8 with char.
Please come up with something a bit more portable that is still faster!


-- 
Best regards,
 Tony.  mailto:[EMAIL PROTECTED]



Re: Re[2]: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Brian Havard
On Fri, 26 May 2006 14:32:36 +0400, [EMAIL PROTECTED] wrote:

> Hello Alex,
>
> Friday, May 26, 2006, 9:50:15 AM, you wrote:
>
> AD> I thought all major compilers have sort of long long,
> AD> didn't them?
> I'm on QNX4 with Watcom C v10.6B which has neither int_64 nor long
> long.
> So, I'm very anxious about not being able to keep my port current
> after such improvements...

Can't you use OpenWatcom? It's had long long for some time and appears to
still support QNX.

-- 
 __
 |  Brian Havard |  He is not the messiah!   |
 |  [EMAIL PROTECTED]  |  He's a very naughty boy! - Life of Brian |
 --



Re[4]: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread rz1a
Hello Brian,

Friday, May 26, 2006, 5:55:34 PM, you wrote:
BH> Can't you use OpenWatcom? It's had long long for some time and
BH> appears to still support QNX.
Indeed, it still knows QNX4. The problem is that OW has not been ported
to QNX4 yet (and never will be, I'm afraid), so one has to cross-compile
on Windows. I use this approach for simpler things, but serious
projects do not work as expected when cross-compiled... I do not know
why...

Thank you for the suggestion.

Still, I'd like OpenSSL to be a bit more portable...
Currently I cannot get sha512 to compile here. It'd be very sad
if more and more code had to be configured out to get the package
built...

-- 
Best regards,
 Tony.  mailto:[EMAIL PROTECTED]



[openssl.org #1333] ssl 64 bit compilation on windows 2003 XP and Visual Studio 2005

2006-05-26 Thread via RT

Please delete this case. I was able to resolve this error.


[EMAIL PROTECTED] - Sat May 20 19:05:06 2006]:

> Hi
>
> The 32 bit openssl compiled fine
>
> I am trying to compile the ssl 0.9.8b on Windows 2003 XP 64 system with MS
> Visual Studio 2005. I am getting the following compilation error.
>
> cl /Fotmp32dll\e_ubsec.obj  -Iinc32 -Itmp32dll /MD /Ox /W3 /Gs0 /GF /Gy
>   /nologo -DWIN32_LEAN_AND_MEAN -DL_ENDIAN -DDSO_WIN32
>   -DOPENSSL_SYSNAME_WIN32 -DOPENSSL_SYSNAME_WINNT -DUNICODE -D_UNICODE
>   -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE
>   -DOPENSSL_USE_APPLINK -I. /Fdout32dll -DOPENSSL_NO_RC5 -DOPENSSL_NO_MDC2
>   -DOPENSSL_NO_KRB5 -DOPENSSL_NO_DYNAMIC_ENGINE -D_WINDLL
>   -DOPENSSL_BUILD_SHLIBCRYPTO -c .\engines\e_ubsec.c
> e_ubsec.c
> ml  /c ms\uptable.asm
> Microsoft (R) Macro Assembler Version 8.00.50727.42
> Copyright (C) Microsoft Corporation.  All rights reserved.
>
>  Assembling: ms\uptable.asm
> ms\uptable.asm(45) : error A2024: invalid operand size for instruction
> ms\uptable.asm(62) : error A2024: invalid operand size for instruction
> ms\uptable.asm(79) : error A2024: invalid operand size for instruction
> ms\uptable.asm(96) : error A2024: invalid operand size for instruction
> ms\uptable.asm(113) : error A2024: invalid operand size for instruction
> ms\uptable.asm(130) : error A2024: invalid operand size for instruction
> ms\uptable.asm(147) : error A2024: invalid operand size for instruction
> ms\uptable.asm(164) : error A2024: invalid operand size for instruction
> ms\uptable.asm(181) : error A2024: invalid operand size for instruction
> ms\uptable.asm(198) : error A2024: invalid operand size for instruction
> ms\uptable.asm(215) : error A2024: invalid operand size for instruction
> ms\uptable.asm(232) : error A2024: invalid operand size for instruction
> ms\uptable.asm(249) : error A2024: invalid operand size for instruction
> ms\uptable.asm(266) : error A2024: invalid operand size for instruction
> ms\uptable.asm(283) : error A2024: invalid operand size for instruction
> ms\uptable.asm(300) : error A2024: invalid operand size for instruction
> ms\uptable.asm(317) : error A2024: invalid operand size for instruction
> ms\uptable.asm(334) : error A2024: invalid operand size for instruction
> ms\uptable.asm(351) : error A2024: invalid operand size for instruction
> ms\uptable.asm(368) : error A2024: invalid operand size for instruction
> ms\uptable.asm(385) : error A2024: invalid operand size for instruction
> ms\uptable.asm(402) : error A2024: invalid operand size for instruction
> ms\uptable.asm(32) : error A2006: undefined symbol : r9
> ms\uptable.asm(33) : error A2006: undefined symbol : r8
> ms\uptable.asm(34) : error A2006: undefined symbol : rdx
> ms\uptable.asm(35) : error A2006: undefined symbol : rcx
> ms\uptable.asm(36) : error A2006: undefined symbol : rsp
> ms\uptable.asm(37) : error A2006: undefined symbol : rcx
> ..
> ms\uptable.asm(135) : error A2006: undefined symbol : r8
> ms\uptable.asm(136) : error A2006: undefined symbol : rdx
> ms\uptable.asm(137) : error A2006: undefined symbol : rcx
> ms\uptable.asm(138) : error A2006: undefined symbol : rsp
> ms\uptable.asm(139) : error A2006: undefined symbol : rcx
> ms\uptable.asm(140) : fatal error A1012: error count exceeds 100; stopping
>   assembly
> NMAKE : fatal error U1077: 'C:\Program Files (x86)\Microsoft Visual Studio
>   8\VC\BIN\ml.EXE' : return code '0x1'
> Stop.
> gmake: *** [compile64] Error 1
 




Re: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Stephen Sprunk

Thus spake Alex Dubov [EMAIL PROTECTED]

> Ok. How about now?


I'm curious if there's a significant performance difference between using 
u32 and u64; the former should be portable to all supported platforms, and 
may make the latter unnecessary.


Plus, if we're going to go that route, we should consider that some 
platforms have 128-bit XOR support in hardware; is it worth implementing 
that too?


How much of this should be extended to other ciphers?  Should xorN() and 
moveN() be part of the bignum code for reuse in other modules?  IIRC, I 
copied the CFB code from another module (DES?  IDEA?  I forget) with only 
slight changes; I didn't grok enough of it at the time to worry about 
performance, just maintaining correctness.


S

Stephen Sprunk   Stupid people surround themselves with smart
CCIE #3723       people.  Smart people surround themselves with
K5SSS            smart people who disagree with them.  --Aaron Sorkin




Re: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Andy Polyakov

>> Ok. How about now?

Subject to SIGBUS on most platforms. It's easy to get carried away, score
on x86 and render support for other platforms void, isn't it? I mean, do
mind unaligned access!
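
To make the hazard concrete, here is a small sketch; the helper names `is_word_aligned` and `load_word_any` are made up for illustration and do not appear in the patch. Dereferencing an `unsigned long *` built from an arbitrary byte pointer can trap on strict-alignment CPUs, whereas copying through `memcpy` is defined for any address.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if p can be dereferenced as an unsigned long on this platform.
 * Converting a pointer to size_t is implementation-defined, but it is a
 * common way to test alignment. */
static int is_word_aligned(const void *p)
{
    return ((size_t)p % sizeof(unsigned long)) == 0;
}

/* Reads one word from any address without an unaligned access:
 * as far as the language is concerned, memcpy copies byte by byte. */
static unsigned long load_word_any(const unsigned char *p)
{
    unsigned long w;

    memcpy(&w, p, sizeof(w));
    return w;
}
```

On x86 the direct cast usually appears to work, which is why this kind of bug tends to show up only after the code reaches other platforms.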


> I'm curious if there's a significant performance difference between
> using u32 and u64; the former should be portable to all supported
> platforms, and may make the latter unnecessary.

I'd recommend [or even insist on] for (i=0;i<16/sizeof(long);i++) loops
and letting the compiler unroll them: 4x4-byte chunks on 32-bit platforms
and 2x8-byte chunks on 64-bit ones, without a single shred of #if
that-or-that spaghetti and with no unnecessary dependency on the totally
unrelated bn.h. And once again, unaligned input/output is to be treated
byte by byte.
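
A minimal sketch of that recommendation, not the code from the patch: `xor_block16` is a hypothetical name. The word-wise path casts byte pointers to `unsigned long` pointers, as code of this era commonly did; a strict-aliasing-conscious version would copy through `memcpy` instead.

```c
#include <stddef.h>

#define BLOCK_BYTES 16

/* out = in XOR iv for one 16-byte block. */
static void xor_block16(unsigned char *out, const unsigned char *in,
                        const unsigned char *iv)
{
    size_t i;

    /* OR the addresses together: the low bits are all zero only if
     * every one of the three pointers is word-aligned. */
    if ((((size_t)out | (size_t)in | (size_t)iv) % sizeof(long)) == 0) {
        /* 4 iterations of 4 bytes on 32-bit platforms, 2 iterations of
         * 8 bytes on 64-bit ones; the compiler is free to unroll. */
        unsigned long *o = (unsigned long *)out;
        const unsigned long *a = (const unsigned long *)in;
        const unsigned long *b = (const unsigned long *)iv;

        for (i = 0; i < BLOCK_BYTES / sizeof(long); i++)
            o[i] = a[i] ^ b[i];
    } else {
        /* Unaligned input or output: fall back to byte by byte. */
        for (i = 0; i < BLOCK_BYTES; i++)
            out[i] = (unsigned char)(in[i] ^ iv[i]);
    }
}
```

The loop bound is a compile-time constant, so no #if is needed to pick between the 32-bit and 64-bit cases.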


> Plus, if we're going to go that route, we should consider that some
> platforms have 128-bit XOR support in hardware; is it worth implementing
> that too?

Is it really that widely used/important a mode, to justify that much
extra complexity for little gain?

> How much of this should be extended to other ciphers?  Should xorN() and
> moveN() be part of the bignum code for reuse in other modules?

I'd be opposed to this. If performance gets that important, a function
call will hardly beat inline code anyway, even if the function is, say,
128-bit SSE2 and the inline code is just 4x32-bit.

A.




Re: [patch] make AES-cfb128-encrypt faster by uglifying it

2006-05-26 Thread Stephen Sprunk

Thus spake Andy Polyakov [EMAIL PROTECTED]

>>> Ok. How about now?
>
> Subject to SIGBUS on most platforms. It's easy to get carried away, score
> on x86 and render support for other platforms void, isn't it? I mean, do
> mind unaligned access!


Ah, that may have been why I didn't fix that code to use u32.  More 
likely, it was a happy accident that I inherited the portability of the code 
I copied.  I certainly introduced a few logic bugs of my own (which were 
quickly fixed by others)...


>> I'm curious if there's a significant performance difference between using
>> u32 and u64; the former should be portable to all supported platforms,
>> and may make the latter unnecessary.
>
> I'd recommend [or even insist on] for (i=0;i<16/sizeof(long);i++) loops
> and letting the compiler unroll them: 4x4-byte chunks on 32-bit platforms
> and 2x8-byte chunks on 64-bit ones, without a single shred of #if
> that-or-that spaghetti and with no unnecessary dependency on the totally
> unrelated bn.h. And once again, unaligned input/output is to be treated
> byte by byte.


My experience is that, for blocks as short as we're discussing here, the 
tests for unaligned blocks usually defeat the benefit you get in the aligned 
case.  Functions like memcpy() generally require a minimum size before they 
try any such trickery due to the cost of the test, and 16 bytes is probably 
on the edge for most platforms.


If you're using a platform that will transparently handle unaligned access 
(either in hardware or software), it's worth it, but IMHO not on code that 
has to work on platforms that don't.


>> Plus, if we're going to go that route, we should consider that some
>> platforms have 128-bit XOR support in hardware; is it worth implementing
>> that too?
>
> Is it really that widely used/important a mode, to justify that much
> extra complexity for little gain?


I hacked up a version of the AES code a while back that used SSE registers 
to pass the blocks around, do bitwise operations, etc.  It was faster than 
the current version, but (IMHO) not enough to justify adding so much 
unportable hackery to the project.  If one desperately needs speed, the 
existing approach is to use platform-specific asm, and that seems 
sufficient.
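
For readers unfamiliar with the idea, a 128-bit XOR with SSE2 intrinsics might look like the sketch below. This is not the hacked-up AES version mentioned above, which is not shown in this thread; `xor16` is a hypothetical name, and the `__SSE2__` test assumes a GCC- or Clang-style compiler (other compilers signal SSE2 differently), with a byte loop as the fallback.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* dst ^= src for one 16-byte block. */
static void xor16(unsigned char *dst, const unsigned char *src)
{
#if defined(__SSE2__)
    /* The loadu/storeu forms tolerate unaligned addresses. */
    __m128i a = _mm_loadu_si128((const __m128i *)dst);
    __m128i b = _mm_loadu_si128((const __m128i *)src);

    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(a, b));
#else
    size_t i;

    for (i = 0; i < 16; i++)
        dst[i] = (unsigned char)(dst[i] ^ src[i]);
#endif
}
```

Even this small function needs a preprocessor branch per instruction set, which illustrates the portability cost weighed above.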



>> How much of this should be extended to other ciphers?  Should
>> xorN() and moveN() be part of the bignum code for reuse in other
>> modules?
>
> I'd be opposed to this. If performance gets that important, a function
> call will hardly beat inline code anyway, even if the function is, say,
> 128-bit SSE2 and the inline code is just 4x32-bit.
>
> A.


When I find such things useful, I tend to put them in a module's headers as 
a static inline function; that gets the speed of a macro with the semantics 
and safety of a real function.  Unfortunately, that approach probably 
won't work on all of the platforms OpenSSL supports due to all the ancient 
compilers floating around.
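
A sketch of that header pattern, with a fallback for compilers that predate C99 `inline`. The header name, the `XOR_INLINE` macro and `xor_bytes` are all hypothetical and not part of OpenSSL.

```c
/* xor_inline.h -- hypothetical header name */
#ifndef XOR_INLINE_H
#define XOR_INLINE_H

#include <stddef.h>

/* `inline` is standard only from C99 on; older compilers get a plain
 * static function, which is still correct, just possibly not inlined. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
# define XOR_INLINE static inline
#else
# define XOR_INLINE static
#endif

/* dst ^= src over n bytes.  Unlike a macro, the arguments are
 * type-checked and evaluated exactly once. */
XOR_INLINE void xor_bytes(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = (unsigned char)(dst[i] ^ src[i]);
}

#endif /* XOR_INLINE_H */
```

On a pre-C99 compiler each translation unit that includes the header simply gets its own static copy of the function.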


S

Stephen Sprunk   Stupid people surround themselves with smart
CCIE #3723       people.  Smart people surround themselves with
K5SSS            smart people who disagree with them.  --Aaron Sorkin

