Re: PadLock engine SHA1 support

2006-04-24 Michal Ludvig

Andy Polyakov wrote:

> Hi,


>>> BTW, have you considered a synergetic implementation, which would work as
>>> follows. Arrange an intermediate buffer followed by a non-accessible
>>> page [commonly would be done with anonymous mmap of two pages followed
>>> by mprotect(PROT_NONE) for the second page]. Upon *_init we call
>>> software SHA*_Init. Then all short inputs go directly through software
>>> SHA*_Update, while everything that is larger than a certain value, say 256
>>> bytes, is treated as follows. The input stream is first purged/aligned
>>> by running a single pass of SHA*_Update till SHA*_CTX->data is full. Then
>>> the available 64-byte chunks are copied to the *bottom* of the first page
>>> mentioned above. Then we set up a SEGV signal handler, let the hardware suffer
>>> from a page fault and collect the intermediate hash values. The procedure
>>> is repeated if more than pagesize was available at a time.
>>> SHA*_CTX->Nl,Nh are adjusted accordingly and the remaining bytes [if any] are
>>> fed again to software SHA*_Update. Upon *_final we just call *software*
>>> SHA*_Final.



>> Are you sure it flushes the intermediate
>> results on exception? Well we can try ;-)


Yep it works. Proof of concept at 
http://www.logix.cz/michal/devel/padlock/phe_sum.c
It isn't optimized at all, does finalizing in HW so it can be compiled 
without OpenSSL and only works for files smaller than 512MB. But it actually works, 
which is a good start ;-)


Thanks for the idea, Andy!

Michal

__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   [EMAIL PROTECTED]


Re: PadLock engine SHA1 support

2006-04-21 Andy Polyakov

Hi,


>> BTW, have you considered a synergetic implementation, which would work as
>> follows. Arrange an intermediate buffer followed by a non-accessible
>> page [commonly would be done with anonymous mmap of two pages followed
>> by mprotect(PROT_NONE) for the second page]. Upon *_init we call
>> software SHA*_Init. Then all short inputs go directly through software
>> SHA*_Update, while everything that is larger than a certain value, say 256
>> bytes, is treated as follows. The input stream is first purged/aligned
>> by running a single pass of SHA*_Update till SHA*_CTX->data is full. Then
>> the available 64-byte chunks are copied to the *bottom* of the first page
>> mentioned above. Then we set up a SEGV signal handler, let the hardware suffer
>> from a page fault and collect the intermediate hash values. The procedure
>> is repeated if more than pagesize was available at a time.
>> SHA*_CTX->Nl,Nh are adjusted accordingly and the remaining bytes [if any] are
>> fed again to software SHA*_Update. Upon *_final we just call *software*
>> SHA*_Final.


> Man that's a wicked idea ;-) Though I'm not sure how xsha would survive
> restarting after its segfault.


Well, the idea is rather to *not* restart it, but collect intermediate 
results and terminate it. Then these results are fed to either software 
or back to hardware as if it's a whole lot of new data, but with init 
values from the previous step. The keyword is also to *never* let hardware 
do the final padding and final block calculation [which is why it always 
looks like a whole lot of data to hardware]. That's because hardware 
never knows correct Nl,Nh values used for final padding, only software does.
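To make this division of labour concrete, here is a minimal, self-contained C sketch. It assumes a plain software SHA-1 compression function can stand in for the hardware; the real xsha1 cannot be driven this way. The stand-in sees only the chaining values and whole 64-byte blocks. The 64-bit message bit count lives in software as two 32-bit words, named after the SHA_CTX Nl/Nh fields mentioned in the thread, and the final padding is built in software from that count. All names (sha1_sw_ctx, sha1_sw_blocks, sha1_sw_final, ...) are hypothetical and are not OpenSSL's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t h[5];   /* chaining values: the only state the "hardware" sees */
    uint32_t Nl, Nh; /* message length in bits, low/high word (software only) */
} sha1_sw_ctx;

static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

/* Stand-in for the hardware: chaining values plus whole 64-byte blocks. */
static void compress_blocks(uint32_t h[5], const unsigned char *p, size_t nblocks)
{
    for (; nblocks > 0; nblocks--, p += 64) {
        uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 16; i++)
            w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16
                 | (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
        for (int i = 16; i < 80; i++)
            w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
        for (int i = 0; i < 80; i++) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            uint32_t t = rol32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rol32(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
}

/* Software-side length bookkeeping, with carry from Nl into Nh. */
static void add_bytes(sha1_sw_ctx *c, uint64_t n)
{
    uint64_t bits = ((uint64_t)c->Nh << 32 | c->Nl) + n * 8;
    c->Nl = (uint32_t)bits;
    c->Nh = (uint32_t)(bits >> 32);
}

void sha1_sw_init(sha1_sw_ctx *c)
{
    static const uint32_t iv[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                                   0x10325476u, 0xC3D2E1F0u};
    memcpy(c->h, iv, sizeof iv);
    c->Nl = c->Nh = 0;
}

/* Whole blocks go to the "hardware"; only software adjusts Nl/Nh. */
void sha1_sw_blocks(sha1_sw_ctx *c, const unsigned char *p, size_t nblocks)
{
    compress_blocks(c->h, p, nblocks);
    add_bytes(c, (uint64_t)nblocks * 64);
}

/* Final padding in software: 0x80, zeros, then the 64-bit big-endian bit
   count taken from Nh/Nl. len is the leftover tail (< 64 bytes). */
void sha1_sw_final(sha1_sw_ctx *c, const unsigned char *tail, size_t len,
                   unsigned char md[20])
{
    unsigned char buf[128] = {0};
    assert(len < 64);
    add_bytes(c, len);
    memcpy(buf, tail, len);
    buf[len] = 0x80;
    size_t total = (len < 56) ? 64 : 128;   /* 1 + 8 extra bytes must fit */
    for (int i = 0; i < 4; i++) {
        buf[total - 8 + i] = (unsigned char)(c->Nh >> (24 - 8 * i));
        buf[total - 4 + i] = (unsigned char)(c->Nl >> (24 - 8 * i));
    }
    compress_blocks(c->h, buf, total / 64);
    for (int i = 0; i < 20; i++)
        md[i] = (unsigned char)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}
```

One way to exercise it: feed 15625 blocks of 'a' through sha1_sw_blocks() one at a time, then finalize with an empty tail. This reproduces the standard one-million-'a' SHA-1 vector. Chaining values can be handed back and forth freely as long as software owns the length and the padding.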



> Are you sure it flushes the intermediate
> results on exception? Well we can try ;-)


The manual says it does. Well, it doesn't say it flushes on SEGV in 
particular, but at the low level processors don't normally distinguish SEGV, 
page fault or other exceptions. They just go like: oh! it's *an* 
exception, I flush, go to the kernel, call the handler. The manual essentially 
says "I flush upon *an* exception".



> Would such an approach work on all architectures (anonymous and
> protected pages, sighandlers, ...)?


I don't know, but we can always make it conditionally available on 
explicitly tested architectures:-) You also have to realize that it 
takes extra effort to make such an implementation thread-safe. There are 
basically two options. 1. Allocate pages on a per-thread basis [which 
would require a unified API for per-thread storage, something we don't 
have]. 2. Serialize access to hardware [which we have a unified API for]. 
As hardware is faster than the network, the second one is a perfectly viable option.
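Option 2 can be sketched in C as follows, assuming POSIX threads. A pthread mutex stands in for OpenSSL's own locking API. hw_hash_fn, hw_hash_serialized, shared_page_buf and the demo hook are hypothetical names. The whole process shares one intermediate buffer, and in the guard-page scheme one signal disposition, since sigaction() is process-wide. Every use therefore goes through the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical hook for the "copy to page, let the hardware fault, collect
   the state" sequence; it receives the shared buffer it may use. */
typedef void (*hw_hash_fn)(unsigned char *page_buf, const unsigned char *in, size_t len);

static unsigned char shared_page_buf[4096];   /* assumption: one page-sized buffer */
static pthread_mutex_t hw_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serializes all use of the shared buffer and the hardware path. */
int hw_hash_serialized(hw_hash_fn fn, const unsigned char *in, size_t len)
{
    assert(fn != NULL);
    if (pthread_mutex_lock(&hw_lock) != 0)
        return -1;
    fn(shared_page_buf, in, len);
    pthread_mutex_unlock(&hw_lock);
    return 0;
}

/* Demo hook for illustration only: counts the bytes it is handed. */
static size_t demo_bytes;
static void demo_hook(unsigned char *page_buf, const unsigned char *in, size_t len)
{
    (void)page_buf;
    (void)in;
    demo_bytes += len;   /* plain increment is safe: only called under hw_lock */
}

struct worker_arg { const char *msg; size_t len; };

/* Each worker pushes the same message through the serialized path. */
static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < 1000; i++)
        hw_hash_serialized(demo_hook, (const unsigned char *)a->msg, a->len);
    return NULL;
}
```

With several threads running worker() at once, the unsynchronized counter inside demo_hook still ends up exact, because the mutex admits only one caller into the hardware path at a time.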



> In the meantime could we go with the old fashioned patches that I sent
> some time ago? I'll realign them with current CVS head (or 0.9.8 branch).


There were unanswered questions like support for SHA-224, a test suite 
with a public record that it passes, EVP_MD_FLAG_ONESHOT... But I don't 
have time to look into it right now, we'll have to do it in May or something... A.



Re: PadLock engine SHA1 support

2006-04-20 Michal Ludvig
Hi Andy,

I'm sorry for such a late reply ;-) I didn't have the hardware available
during the past few months and only got it up and running again recently.

> BTW, have you considered a synergetic implementation, which would work as
> follows. Arrange an intermediate buffer followed by a non-accessible
> page [commonly would be done with anonymous mmap of two pages followed
> by mprotect(PROT_NONE) for the second page]. Upon *_init we call
> software SHA*_Init. Then all short inputs go directly through software
> SHA*_Update, while everything that is larger than a certain value, say 256
> bytes, is treated as follows. The input stream is first purged/aligned
> by running a single pass of SHA*_Update till SHA*_CTX->data is full. Then
> the available 64-byte chunks are copied to the *bottom* of the first page
> mentioned above. Then we set up a SEGV signal handler, let the hardware suffer
> from a page fault and collect the intermediate hash values. The procedure
> is repeated if more than pagesize was available at a time.
> SHA*_CTX->Nl,Nh are adjusted accordingly and the remaining bytes [if any] are
> fed again to software SHA*_Update. Upon *_final we just call *software*
> SHA*_Final. A.

Man that's a wicked idea ;-) Though I'm not sure how xsha would survive
restarting after its segfault. Are you sure it flushes the intermediate
results on exception? Well we can try ;-)

Would such an approach work on all architectures (anonymous and
protected pages, sighandlers, ...)?

In the meantime could we go with the old fashioned patches that I sent
some time ago? I'll realign them with current CVS head (or 0.9.8 branch).

Michal


Re: PadLock engine SHA1 support

2005-12-28 Andy Polyakov

> Should I add/fix something?


BTW, have you considered a synergetic implementation, which would work as 
follows. Arrange an intermediate buffer followed by a non-accessible 
page [commonly would be done with anonymous mmap of two pages followed 
by mprotect(PROT_NONE) for the second page]. Upon *_init we call 
software SHA*_Init. Then all short inputs go directly through software 
SHA*_Update, while everything that is larger than a certain value, say 256 
bytes, is treated as follows. The input stream is first purged/aligned 
by running a single pass of SHA*_Update till SHA*_CTX->data is full. Then 
the available 64-byte chunks are copied to the *bottom* of the first page 
mentioned above. Then we set up a SEGV signal handler, let the hardware suffer 
from a page fault and collect the intermediate hash values. The procedure 
is repeated if more than pagesize was available at a time. 
SHA*_CTX->Nl,Nh are adjusted accordingly and the remaining bytes [if any] are 
fed again to software SHA*_Update. Upon *_final we just call *software* 
SHA*_Final. A.
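The guard-page part of this proposal can be sketched in C as below, assuming a POSIX system with anonymous mmap(), mprotect(), sigaction() and sigsetjmp(). A plain software loop stands in for the hardware. It keeps reading 64-byte blocks past the data until it touches the PROT_NONE page. The handler then siglongjmp()s out instead of restarting the faulting access. The sketch does not show whether the real instruction flushes its intermediate state on such a fault; that is exactly the question the thread raises. run_into_guard_page is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

static sigjmp_buf fault_env;

/* Do not resume the faulting access: jump back out of it instead. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Places len bytes at the *end* of the first page, so the next byte lies in
   the PROT_NONE second page, then lets a reader run off the end. Returns the
   number of 64-byte blocks read before the fault, or -1 on setup failure. */
long run_into_guard_page(const unsigned char *data, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len % 64 == 0 && (long)len <= page);

    unsigned char *buf = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    if (mprotect(buf + page, (size_t)page, PROT_NONE) != 0) {
        munmap(buf, 2 * (size_t)page);
        return -1;
    }
    unsigned char *start = buf + page - len;
    memcpy(start, data, len);

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems raise SIGBUS here */

    volatile long blocks = 0;           /* volatile: must survive siglongjmp */
    volatile unsigned sink = 0;
    if (sigsetjmp(fault_env, 1) == 0) {
        /* Stand-in for the hardware: it believes there is "a whole lot of
           data" and keeps reading 64-byte blocks until it faults. */
        for (const volatile unsigned char *p = start;; p += 64) {
            sink += *p;
            blocks++;
        }
    }
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(buf, 2 * (size_t)page);
    return blocks;
}
```

Because the data ends exactly at the page boundary, the number of blocks consumed before the fault equals len / 64. The real code would collect the intermediate hash values at that point and adjust Nl/Nh in software.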



Re: PadLock engine SHA1 support

2005-11-01 Michal Ludvig
Andy Polyakov wrote:

>> Could be. But should it be run automatically
>> during make? I guess no...
>
> No, but I'd like to *see* some test program and I'd like to hear an
> explicit statement that the implementation passes this test. As you
> might recall we have tested AES by encrypting with software and
> decrypting with the engine and then the other way around. We need something
> similar this time:-) A.

FWIW I'm testing it with OpenVPN having OpenSSL-0.9.8+padlock on one end
and OpenSSL-0.9.7 without padlock on the other. And of course I'm
regularly running those openssl dgst tests with and without engine on
some input files. I don't often send untested patches ;-)

Michal Ludvig
-- 
* Personal homepage: http://www.logix.cz/michal


Re: PadLock engine SHA1 support

2005-11-01 Andy Polyakov

>>> Could be. But should it be run automatically
>>> during make? I guess no...


>> No, but I'd like to *see* some test program and I'd like to hear an
>> explicit statement that the implementation passes this test. As you
>> might recall we have tested AES by encrypting with software and
>> decrypting with the engine and then the other way around. We need something
>> similar this time:-)



> FWIW I'm testing it with OpenVPN having OpenSSL-0.9.8+padlock on one end
> and OpenSSL-0.9.7 without padlock on the other. And of course I'm
> regularly running those openssl dgst tests with and without engine on
> some input files. I don't often send untested patches ;-)


I'm not questioning whether or not the patch is tested! I simply would 
like to see a record, preferably a public one, of a *simple* verification 
procedure, which *anybody* [with appropriate hardware] could execute at 
any time and in no time, without having to set up a VPN or similar. That's 
all. A.



Re: PadLock engine SHA1 support

2005-10-31 Michal Ludvig
Andy,

Ping :-) Did you have time to look at this patch? Should I add/fix
something?

Thanks!

Michal Ludvig
-- 
* Personal homepage: http://www.logix.cz/michal


Re: PadLock engine SHA1 support

2005-10-31 Andy Polyakov

> Ping :-)


(-: Pong


> Did you have time to look at this patch?


No, unfortunately. Are you in a hurry? If yes, what's the hurry?


> Should I add/fix something?


Windows support:-) SHA-224 [which differs from SHA-256 only by initial 
constants and truncated output]. Test programs [extra -e argument 
perhaps]. BTW, what's the deal with padlock engine in ./config -shared 
configuration? It doesn't seem to be there... A.
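Andy's point that SHA-224 is SHA-256 with different initial constants and a truncated output can be shown with a compact one-shot software sketch in C. This is not the PadLock code path, and names such as sha2_oneshot are hypothetical. The same compression function serves both digests. Only the IV table and the number of output words change. The round constants and IVs are those of the SHA-2 specification (FIPS 180-2 and the change notice that added SHA-224).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint32_t K256[64] = {
    0x428a2f98u,0x71374491u,0xb5c0fbcfu,0xe9b5dba5u,0x3956c25bu,0x59f111f1u,0x923f82a4u,0xab1c5ed5u,
    0xd807aa98u,0x12835b01u,0x243185beu,0x550c7dc3u,0x72be5d74u,0x80deb1feu,0x9bdc06a7u,0xc19bf174u,
    0xe49b69c1u,0xefbe4786u,0x0fc19dc6u,0x240ca1ccu,0x2de92c6fu,0x4a7484aau,0x5cb0a9dcu,0x76f988dau,
    0x983e5152u,0xa831c66du,0xb00327c8u,0xbf597fc7u,0xc6e00bf3u,0xd5a79147u,0x06ca6351u,0x14292967u,
    0x27b70a85u,0x2e1b2138u,0x4d2c6dfcu,0x53380d13u,0x650a7354u,0x766a0abbu,0x81c2c92eu,0x92722c85u,
    0xa2bfe8a1u,0xa81a664bu,0xc24b8b70u,0xc76c51a3u,0xd192e819u,0xd6990624u,0xf40e3585u,0x106aa070u,
    0x19a4c116u,0x1e376c08u,0x2748774cu,0x34b0bcb5u,0x391c0cb3u,0x4ed8aa4au,0x5b9cca4fu,0x682e6ff3u,
    0x748f82eeu,0x78a5636fu,0x84c87814u,0x8cc70208u,0x90befffau,0xa4506cebu,0xbef9a3f7u,0xc67178f2u,
};
/* The only differences between the two digests: initial values... */
static const uint32_t IV256[8] = {0x6a09e667u,0xbb67ae85u,0x3c6ef372u,0xa54ff53au,
                                  0x510e527fu,0x9b05688cu,0x1f83d9abu,0x5be0cd19u};
static const uint32_t IV224[8] = {0xc1059ed8u,0x367cd507u,0x3070dd17u,0xf70e5939u,
                                  0xffc00b31u,0x68581511u,0x64f98fa7u,0xbefa4fa4u};

static uint32_t ror(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Shared SHA-256 compression function for one 64-byte block. */
static void block256(uint32_t h[8], const unsigned char *p)
{
    uint32_t w[64], v[8];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16
             | (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ror(w[i-15], 7) ^ ror(w[i-15], 18) ^ (w[i-15] >> 3);
        uint32_t s1 = ror(w[i-2], 17) ^ ror(w[i-2], 19) ^ (w[i-2] >> 10);
        w[i] = w[i-16] + s0 + w[i-7] + s1;
    }
    memcpy(v, h, sizeof v);
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = v[7] + (ror(v[4], 6) ^ ror(v[4], 11) ^ ror(v[4], 25))
                    + ((v[4] & v[5]) ^ (~v[4] & v[6])) + K256[i] + w[i];
        uint32_t t2 = (ror(v[0], 2) ^ ror(v[0], 13) ^ ror(v[0], 22))
                    + ((v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);   /* h=g, g=f, ..., b=a */
        v[4] += t1;                            /* e = old d + t1 */
        v[0] = t1 + t2;                        /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++) h[i] += v[i];
}

/* ...and how many 32-bit output words are kept (8 or 7). */
static void sha2_oneshot(const uint32_t iv[8], int out_words,
                         const unsigned char *msg, size_t len, unsigned char *out)
{
    uint32_t h[8];
    unsigned char tail[128] = {0};
    size_t full = len / 64, rem = len % 64;
    uint64_t bits = (uint64_t)len * 8;
    memcpy(h, iv, sizeof h);
    for (size_t i = 0; i < full; i++)
        block256(h, msg + 64 * i);
    memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    size_t tlen = rem < 56 ? 64 : 128;
    for (int i = 0; i < 8; i++)
        tail[tlen - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (size_t off = 0; off < tlen; off += 64)
        block256(h, tail + off);
    for (int i = 0; i < out_words * 4; i++)
        out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

void sha256(const unsigned char *m, size_t n, unsigned char out[32]) { sha2_oneshot(IV256, 8, m, n, out); }
void sha224(const unsigned char *m, size_t n, unsigned char out[28]) { sha2_oneshot(IV224, 7, m, n, out); }
```

An engine that already drives SHA-256 in hardware would, on this reading, need only a way to load different initial values and to copy out 28 bytes instead of 32. Whether the PadLock instruction allows custom initial values is not established in this thread.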



Re: PadLock engine SHA1 support

2005-10-31 Michal Ludvig
Andy Polyakov wrote:

>> Did you have time to look at this patch?
>
> No, unfortunately. Are you in a hurry? If yes, what's the hurry?

No, I'm not. I just wanted to move forward...

>> Should I add/fix something?
>
> Windows support:-)

Uh, eh, ... after all I don't have a machine to test it on.

> SHA-224 [which differs from SHA-256 only by initial
> constants and truncated output].

I see. I'll look at it and send you the patch.

> Test programs [extra -e argument perhaps].

... to enable engines? Could be. But should it be run automatically
during make? I guess no...

> BTW, what's the deal with padlock engine in ./config -shared
> configuration? It doesn't seem to be there... A.

I don't know. I don't know much about the support infrastructure for
engines. Any pointers where to look?

Michal Ludvig
-- 
* Personal homepage: http://www.logix.cz/michal


Re: PadLock engine SHA1 support

2005-10-31 Andy Polyakov
>> Test programs [extra -e argument perhaps].


> ... to enable engines?


Yes. On the other hand I suppose one can write a script, which would 
simply call 'openssl dgst -sha[1|256] -engine padlock' with a set of 
known input vectors...



> Could be. But should it be run automatically
> during make? I guess no...


No, but I'd like to *see* some test program and I'd like to hear an 
explicit statement that the implementation passes this test. As you 
might recall we have tested AES by encrypting with software and 
decrypting with the engine and then the other way around. We need something similar 
this time:-) A.
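In the spirit of the AES exercise (software on one side, engine on the other), a simple verification program could hash every input length in a range with two implementations and report the first mismatch. Lengths around 55/56 and 63/64/65 bytes are where padding mistakes tend to show up. The minimal C sketch below assumes both implementations are wrapped in a hypothetical one-shot digest_fn signature. The toy FNV-1a functions are stand-ins that let the harness run here; they are not SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MD_LEN 20

/* Hypothetical one-shot wrapper, e.g. around software SHA-1 on one side
   and the engine-backed SHA-1 on the other. */
typedef void (*digest_fn)(const unsigned char *in, size_t len, unsigned char md[MD_LEN]);

/* Hashes a deterministic pattern of every length 0..max_len with both
   implementations; returns the first length whose digests differ, or -1. */
long cross_check(digest_fn reference, digest_fn candidate, size_t max_len)
{
    unsigned char in[1024], a[MD_LEN], b[MD_LEN];
    assert(max_len <= sizeof in);
    for (size_t i = 0; i < max_len; i++)
        in[i] = (unsigned char)(i * 131 + 7);
    for (size_t len = 0; len <= max_len; len++) {
        reference(in, len, a);
        candidate(in, len, b);
        if (memcmp(a, b, MD_LEN) != 0)
            return (long)len;
    }
    return -1;
}

/* Toy stand-in for demonstration only: FNV-1a spread over 20 bytes, NOT SHA-1. */
static void toy_digest(const unsigned char *in, size_t len, unsigned char md[MD_LEN])
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
        h = (h ^ in[i]) * 16777619u;
    for (int i = 0; i < MD_LEN; i++)
        md[i] = (unsigned char)(h >> (8 * (i % 4)));
}

/* Deliberately broken variant: simulates a bug at the 56-byte padding boundary. */
static void toy_digest_buggy(const unsigned char *in, size_t len, unsigned char md[MD_LEN])
{
    toy_digest(in, len == 56 ? 55 : len, md);
}
```

Run against a real reference and the engine, the harness would print nothing but "all lengths agree" on success. That gives the kind of short, repeatable record anybody with the hardware could produce.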



Re: PadLock engine SHA1 support

2005-10-13 Michal Ludvig
Andy Polyakov wrote:

>>> What happens when you issue the instruction without rep prefix?
>>
>> That's an invalid instruction, I believe.
>
> Dare to actually try?

Just tried = Invalid instruction ;-)

>>>> Instead it's necessary to accumulate all data from
>>>> update()s in some buffer and hash them only in final().
>>>
>>> Note that there is EVP_MD_FLAG_ONESHOT, which can/should be used to
>>> avoid fallback to software at least for such cases.
>>
>> I have found this flag but didn't realise how to use it.
>
> If flag is set, just hash directly in update procedure and do nothing
> [but byte swapping?] in final. Instead of doing nothing but copying in
> update procedure and do hashing in final.

What if you need to do several updates before finally hashing it? E.g.
like in HMAC? You need to store the data somewhere before actually
hashing them with padlock...

>> And IIRC it's only used in one engine. After all I decided it's useless
>> and wrote the software fallback path for SHA.
>
> Note that I didn't suggest to scrap software fallback [yet?], just to
> *complement* with a way to hash larger data chunk if it's readily
> available in one stroke.

How do you know that there won't be more data to come to update()? Maybe
I'm missing something w.r.t. this ONESHOT option...?

You may want to generalise the software fallback path somewhere into
openssl core, but the question is if it's worth the overhead now, when
only one engine needs it.

> BTW, as for copying. As more than likely
> sensitive data gets copied into the intermediate buffer, it's more than
> appropriate to zero it prior to free. I only see memset on padlock
> intermediate state. A.

Yeah, right. Attached is an incremental diff addressing this issue.

Michal Ludvig
-- 
* Personal homepage: http://www.logix.cz/michal
Index: openssl-0.9.8/crypto/engine/eng_padlock.c
===
--- openssl-0.9.8.orig/crypto/engine/eng_padlock.c
+++ openssl-0.9.8/crypto/engine/eng_padlock.c
@@ -1153,6 +1153,7 @@ padlock_sha_bypass(struct padlock_digest
 	if (ddata->buf_start && ddata->used > 0) {
 		SHA1_Update(ddata->fallback_ctx, ddata->buf_start, ddata->used);
 		if (ddata->buf_alloc) {
+			memset(ddata->buf_start, 0, ddata->used);
 			free(ddata->buf_alloc);
 			ddata->buf_alloc = 0;
 		}
@@ -1266,6 +1267,7 @@ padlock_sha_final(EVP_MD_CTX *ctx, unsig
 
 	/* Pass the input buffer to PadLock microcode... */
 	padlock_do_sha1(ddata->buf_start, md, ddata->used);
+	memset(ddata->buf_start, 0, ddata->used);
 	free(ddata->buf_alloc);
 	ddata->buf_start = 0;
 	ddata->buf_alloc = 0;
@@ -1298,8 +1300,10 @@ padlock_sha_cleanup(EVP_MD_CTX *ctx)
 {
 	struct padlock_digest_data *ddata = DIGEST_DATA(ctx);
 
-	if (ddata->buf_alloc)
+	if (ddata->buf_alloc) {
+		memset(ddata->buf_start, 0, ddata->used);
 		free(ddata->buf_alloc);
+	}
 
 	memset(ddata, 0, sizeof(struct padlock_digest_data));
 


Re: PadLock engine SHA1 support

2005-09-30 Andy Polyakov
> The intermediate status (and finally the result) is stored in the 
> 128-byte memory array in padlock_do_sha1(). I.e. it's context switch safe.



>> What happens when you issue the instruction without rep prefix?


> That's an invalid instruction, I believe.


Dare to actually try?


>>> Instead it's necessary to accumulate all data from
>>> update()s in some buffer and hash them only in final().


>> Note that there is EVP_MD_FLAG_ONESHOT, which can/should be used to 
>> avoid fallback to software at least for such cases.


> I have found this flag but didn't realise how to use it.


If flag is set, just hash directly in update procedure and do nothing 
[but byte swapping?] in final. Instead of doing nothing but copying in 
update procedure and do hashing in final.
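The control flow described here can be sketched in C, assuming the engine can tell that the caller has promised a single update. The demo_ctx.oneshot field is a hypothetical stand-in for that promise, not the way OpenSSL actually exposes the flag. hw_hash_whole_message is a hypothetical placeholder for the always-finalizing instruction; here it only records what it was given. Without the promise, update() can do nothing but copy, and the hashing moves to final().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the finalizing instruction: it can only digest a
   complete message in one go. Here it merely records the call. */
static size_t hw_calls, hw_bytes;
static void hw_hash_whole_message(const unsigned char *msg, size_t len, unsigned char md[20])
{
    (void)msg;
    hw_calls++;
    hw_bytes += len;
    memset(md, 0, 20);   /* placeholder digest */
}

typedef struct {
    int oneshot;           /* hypothetical: caller promised exactly one update() */
    unsigned char *buf;    /* accumulated input when oneshot is not set */
    size_t used;
    unsigned char md[20];
} demo_ctx;

int demo_update(demo_ctx *c, const unsigned char *in, size_t len)
{
    if (c->oneshot) {                        /* whole message in hand: hash now */
        hw_hash_whole_message(in, len, c->md);
        return 1;
    }
    if (len == 0)
        return 1;
    unsigned char *nb = realloc(c->buf, c->used + len);   /* otherwise only copy */
    if (nb == NULL)
        return 0;
    c->buf = nb;
    memcpy(c->buf + c->used, in, len);
    c->used += len;
    return 1;
}

void demo_final(demo_ctx *c, unsigned char md[20])
{
    if (!c->oneshot) {                       /* deferred: hash everything copied */
        hw_hash_whole_message(c->buf, c->used, c->md);
        free(c->buf);
        c->buf = NULL;
        c->used = 0;
    }
    memcpy(md, c->md, 20);                   /* oneshot: final() only copies out */
}
```

In the one-shot case the instruction runs inside update() and final() merely returns the stored digest. In the multi-update case, for example HMAC, nothing is hashed until final(), which is the buffering the patch already does.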


> And IIRC it's 
> only used in one engine. After all I decided it's useless and wrote the 
> software fallback path for SHA.


Note that I didn't suggest to scrap software fallback [yet?], just to 
*complement* with a way to hash larger data chunk if it's readily 
available in one stroke. BTW, as for copying. As more than likely 
sensitive data gets copied into the intermediate buffer, it's more than 
appropriate to zero it prior to free. I only see memset on padlock 
intermediate state. A.
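A plain memset() whose result is never read again can legally be removed by an optimizing compiler as a dead store, and that includes a memset() immediately before free(). A common workaround is to write through a volatile pointer, as in the minimal C sketch below. OpenSSL provides OPENSSL_cleanse() for the same purpose. wipe_and_free() is a hypothetical helper that mirrors the buf_alloc/buf_start/used fields from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zeroes n bytes through a volatile pointer: each store is a volatile
   access, so the compiler may not drop it as dead. */
static void cleanse(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Hypothetical helper mirroring the patch: wipe the 'used' bytes copied to
   buf_start (the aligned area), then release the original allocation. */
void wipe_and_free(void *buf_alloc, void *buf_start, size_t used)
{
    if (buf_alloc == NULL)
        return;
    assert(buf_start != NULL || used == 0);
    cleanse(buf_start, used);
    free(buf_alloc);
}
```

Note that the wipe uses buf_start and used, which cover the data actually copied, while free() must receive buf_alloc, the pointer malloc() returned.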




Re: PadLock engine SHA1 support

2005-09-29 Andy Polyakov

> the attached patch adds SHA1 support for VIA PadLock engine.


Did VIA publish documentation for new instructions on their web-site? If 
not and you have it, can you send a copy to me?



> There are several design decisions that I may need to explain:
>
> The xsha1 instruction always finalizes the MD computation,


That kind of sucks...


> i.e. it is
> not possible to call the hardware in sha1_update() with the provided
> input buffer.


But the instruction with rep prefix is interruptible, i.e. can be 
exposed to context switch, right? That would mean that all the 
intermediate status has to be kept somewhere, either in visible 
registers or off-loaded to memory. What happens when you issue the 
instruction without rep prefix?



> Instead it's necessary to accumulate all data from
> update()s in some buffer and hash them only in final().


Note that there is EVP_MD_FLAG_ONESHOT, which can/should be used to 
avoid fallback to software at least for such cases.



> In padlock_init() I allocate a buffer of a given size (8k as well) whose
> first 16B-aligned address goes to buf_start. Having the input data
> aligned allows PadLock to crunch them faster.


Is 16B-alignment for input a requirement even for SHA? "Even" refers to 
AES... A.



Re: PadLock engine SHA1 support

2005-09-29 Michal Ludvig

Andy Polyakov wrote:


>> The xsha1 instruction always finalizes the MD computation,
>
> That kind of sucks...


Hopefully the next version of the CPU will have a new hashing 
instruction that will finalize only on request. I was already in touch 
with the CPU architects, explained to them what problems the current design 
brings to us and they agreed to improve it.



>> i.e. it is
>> not possible to call the hardware in sha1_update() with the provided
>> input buffer.


> But the instruction with rep prefix is interruptible, i.e. can be 
> exposed to context switch, right? That would mean that all the 
> intermediate status has to be kept somewhere, either in visible 
> registers or off-loaded to memory.


The intermediate status (and finally the result) is stored in the 
128-byte memory array in padlock_do_sha1(). I.e. it's context switch safe.



> What happens when you issue the instruction without rep prefix?


That's an invalid instruction, I believe.


>> Instead it's necessary to accumulate all data from
>> update()s in some buffer and hash them only in final().


> Note that there is EVP_MD_FLAG_ONESHOT, which can/should be used to 
> avoid fallback to software at least for such cases.


I have found this flag but didn't realise how to use it. And IIRC it's 
only used in one engine. After all I decided it's useless and wrote the 
software fallback path for SHA.



>> In padlock_init() I allocate a buffer of a given size (8k as well) whose
>> first 16B-aligned address goes to buf_start. Having the input data
>> aligned allows PadLock to crunch them faster.


> Is 16B-alignment for input a requirement even for SHA? "Even" refers to
> AES... A.


No, alignment is required only for output buffer (128 Bytes in 
padlock_do_sha1()), but having the input aligned improves performance a 
lot. Because we're copying the data anyway, we can copy it to an aligned 
address.
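The buffer layout described for padlock_init() can be sketched in C: a 16-byte-aligned buf_start is derived from an unaligned allocation. aligned_buf, aligned_buf_init and aligned_buf_free are hypothetical names. Over-allocating by 15 bytes guarantees that size usable bytes remain after sliding forward to the aligned start. buf_alloc is kept separately because it is the pointer free() needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *buf_alloc;           /* what malloc() returned; pass this to free() */
    unsigned char *buf_start;  /* first 16-byte-aligned address inside it */
    size_t size;               /* usable bytes starting at buf_start */
} aligned_buf;

int aligned_buf_init(aligned_buf *b, size_t size)
{
    b->buf_alloc = malloc(size + 15);   /* room to slide forward up to 15 bytes */
    if (b->buf_alloc == NULL)
        return 0;
    size_t misalign = (size_t)((uintptr_t)b->buf_alloc & 15);
    b->buf_start = (unsigned char *)b->buf_alloc + ((16 - misalign) & 15);
    b->size = size;
    return 1;
}

void aligned_buf_free(aligned_buf *b)
{
    free(b->buf_alloc);                 /* never free(buf_start) */
    b->buf_alloc = NULL;
    b->buf_start = NULL;
    b->size = 0;
}
```

Where it is available, posix_memalign() can produce an aligned block directly and avoids keeping two pointers. The manual offset works with any malloc().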


BTW, in VIA Esther the buffers for AES can be unaligned in some cases as 
well. I'll come up with a patch.


Michal Ludvig
--
* Personal homepage: http://www.logix.cz/michal
