Re: [CRYPTO] is it really optimized ?

2007-04-23 Thread Francis Moreau

Hi

[Sorry for the late answer]

On 4/19/07, Francis Moreau [EMAIL PROTECTED] wrote:

On 4/17/07, Roland Dreier [EMAIL PROTECTED] wrote:
It seems trivial to keep the last key you were given and do a quick
memcmp in your setkey method to see if it's different from the last
key you pushed to hardware, and set a flag if it is.  Then only do
your set_key() if you have a new key to pass to hardware.
   
I'm assuming the expense is in the aes_write() calls, and you could
avoid them if you know you're not writing something new.

   that's a wrong assumption. aes_write()/aes_read() are both used to
   access to the controller and are slow (no cache involved).

 Sorry, I wasn't clear.  I meant that the hardware access is what is
 slow, and that anything you do on the CPU is relatively cheap compared
 to that.

 So my suggestion is just to keep a cache (in CPU memory) of what you
 have already loaded into the HW, and before reloading the HW just
 check the cache and don't do the actual HW access if you're not going
 to change the HW contents.  So you avoid any extra aes_write and
 aes_read calls in the cache hit case.

 This would have the advantage of making anything that does lots of
 bulk encryption fast without special casing ecryptfs.


I'm not sure how memcmp(key, cache, KEY_SIZE) would impact AES
performance. I need to give it a test but can't today. I'll do
tomorrow and give you back the result.



OK, I gave it a test and it appears that the cache hit case is
slightly worse than unconditionnal key loading. So it means that
testing that hte key is cached is as long as loading the key into the
controller. Here is what I did in set_key() function:

static void set_key(const char *key)
{
static u32 my_key[4] __cacheline_aligned;

u32 key0 = *(const u32 *)(key + 12);
u32 key1 = *(const u32 *)(key + 8);
u32 key2 = *(const u32 *)(key + 4);
u32 key3 = *(const u32 *)(key);
int timeout = 100;
u32 miss = 0;

miss |= key0 ^ my_key[0];
miss |= key1 ^ my_key[1];
miss |= key2 ^ my_key[2];
miss |= key3 ^ my_key[3];
if (miss == 0)
return;

my_key[0] = key0;
my_key[1] = key1;
my_key[2] = key2;
my_key[3] = key3;

aes_write(be32_to_cpu(key0), AES_KEY0);
aes_write(be32_to_cpu(key1), AES_KEY1);
aes_write(be32_to_cpu(key2), AES_KEY2);
aes_write(be32_to_cpu(key3), AES_KEY3);

/* generate dkey: should take 11 cycles */
aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR);

while (aes_read(AES_CR)  CR_DKEYGEN) {
if (--timeout == 0)
break;
}
}

So I was wrong, hardware access is not so expensive as I thought. But
it also means that all instructions executed in the drivers'
encrypt()/decrypt() methods have a real cost and skipping key loadings
is a win.

Using the driver exclusively doesn't seem to be the right solution,
but I don't see another way to do that...
--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Francis Moreau

On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote:

On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote:

 BTW, here are figures I got with 2 different versions of the driver
 when using tcrypt module. The second being the result with the
 optimized driver (no key reloading on each block):

 normal version:
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192
 bytes)

 optimized version:
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192
 bytes)

 So the gain is 16000 cycles which seems to worth the change, isn't it ?

Sounds like it would.  It would help of course if you posted the patch :)



OK, I tried to cook up something very simple. Since I don't know this
code, please be indulgent when reading the following patch ;)

diff --git a/crypto/api.c b/crypto/api.c
index 55af8bb..f067de8 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -71,6 +71,10 @@ struct crypto_alg *__crypto_alg_lookup(const char
*name, u32 type, u32 mask)
((struct crypto_larval *)q)-mask != mask)
continue;

+   if (alg-cra_flags  CRYPTO_ALG_EXCLUSIVE 
+   atomic_read(alg-cra_refcnt)  0)
+   continue;
+
exact = !strcmp(q-cra_driver_name, name);
fuzzy = !strcmp(q-cra_name, name);
if (!exact  !(fuzzy  q-cra_priority  best))
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 779aa78..278d386 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -41,6 +41,7 @@
#define CRYPTO_ALG_DEAD 0x0020
#define CRYPTO_ALG_DYING0x0040
#define CRYPTO_ALG_ASYNC0x0080
+#define CRYPTO_ALG_EXCLUSIVE   0x0100

/*
 * Set this bit if and only if the algorithm requires another algorithm of

--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Evgeniy Polyakov
On Tue, Apr 17, 2007 at 04:01:51PM +0200, Francis Moreau ([EMAIL PROTECTED]) 
wrote:
 On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote:
 
 Yep.  We don't need such a flag anyway.  All we need is a way to tweak
 the priority and Bob's your uncle.
 
 
 Could you elaborate please, I don't see how you prevent others users
 to use this module with priority.
 
 Priority is a stuff that tells you which aes implementation to use but
 it does not prevent an implementation to be used several times...

Preventing anyone from using the module is incorrect.
How will you handle the case when you have only one algo registered and
it will be exclusively used by ecryptfs?

Herbert proposes to register _second_ algo (say aes-generic(prio_100)
and aes_for_ecryptfs(prio_1)) with lower prio, so generic access will
never try to catch aes_for_ecryptfs, but your code still can access it
using full name.

 Thanks
 -- 
 Francis

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Francis Moreau

On 4/17/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 OK, I tried to cook up something very simple. Since I don't know this
 code, please be indulgent when reading the following patch ;)

Which means that after one has loaded ecryptfs module it can not use
ipsec and dm-crypt if there is only one crypto algo registered...



That's actually the goal, but I agree we would need a flag to pass
when loading AES module to say I want an exclusive usage of it and
therefore it can be run faster.

If you have several users of AES module, you can choose (a) use the
no-optimized version for all users or (b) choose which user needs to
be run quickly and make it exclusively use the AES hw module; the
others users would use the generic AES (the slower one).

--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Roland Dreier
  Again, my code is faster only because I skip the key loading in 
  cia_encrypt method. Actually the gain is bigger in decryption mode 
  than in encryption one because I also generate the decryption key for 
  each block.

I wonder if there's some way you can cache the last caller and reload
the key lazily (only when it changes).  Of course without your code
it's hard to say...
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Roland Dreier
   I wonder if there's some way you can cache the last caller and reload
   the key lazily (only when it changes).
  
  yes something that allows crypto drivers to detect if the key has
  changed would be good.

It seems trivial to keep the last key you were given and do a quick
memcmp in your setkey method to see if it's different from the last
key you pushed to hardware, and set a flag if it is.  Then only do
your set_key() if you have a new key to pass to hardware.

I'm assuming the expense is in the aes_write() calls, and you could
avoid them if you know you're not writing something new.

 - R.
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Herbert Xu
Francis Moreau [EMAIL PROTECTED] wrote:
 On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote:

 Actually, I was referring to your AES module :)
 
 Well I don't if I can do that unfortunately.

What's the problem?

 Actually there's nothing really interesting in this code, only key or
 acc loadings and that's it.
 
 What do you want to see exactly ?

Well if your code's faster than what we have in the kernel then we
should use yours instead.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Francis Moreau

On 4/17/07, Roland Dreier [EMAIL PROTECTED] wrote:

  Again, my code is faster only because I skip the key loading in
  cia_encrypt method. Actually the gain is bigger in decryption mode
  than in encryption one because I also generate the decryption key for
  each block.

I wonder if there's some way you can cache the last caller and reload
the key lazily (only when it changes).


yes something that allows crypto drivers to detect if the key has
changed would be good.


Of course without your code it's hard to say...



Alright you can find the main part of it below...



struct foo_aes_ctx {
u8 key[AES_KEY_LENGTH];
};

/*
*
*/
static inline void set_dir(int dir)
{
u32 cr = aes_read(AES_CR);

switch (dir) {
case AES_DIR_ENCRYPT:
cr |= CR_DIR;
break;
case AES_DIR_DECRYPT:
cr = ~CR_DIR;
break;
default:
BUG();
}
aes_write(cr, AES_CR);
}

static inline void set_key(const char *key)
{
u32 key0 = be32_to_cpup((const u32 *)(key + 12));
u32 key1 = be32_to_cpup((const u32 *)(key + 8));
u32 key2 = be32_to_cpup((const u32 *)(key + 4));
u32 key3 = be32_to_cpup((const u32 *)(key));

aes_write(key0, AES_KEY0);
aes_write(key1, AES_KEY1);
aes_write(key2, AES_KEY2);
aes_write(key3, AES_KEY3);
}

/* should take only 11 cycles */
static int gen_dkey(void)
{
int timeout = 100;

aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR);

while (aes_read(AES_CR)  CR_DKEYGEN) {
if (--timeout == 0)
return -EIO;
}
return 0;
}

static int crypt_block(int dir, u8 *dst, const u8 *src, const char *key)
{
register u32 acc0 = be32_to_cpup((const u32 *)(src + 12));
register u32 acc1 = be32_to_cpup((const u32 *)(src + 8));
register u32 acc2 = be32_to_cpup((const u32 *)(src + 4));
register u32 acc3 = be32_to_cpup((const u32 *)(src));
unsigned long flags;

spin_lock_irqsave(foo_aes_lock, flags);

set_key(key);
set_dir(dir);

if (dir == AES_DIR_DECRYPT)
gen_dkey();

aes_write(acc0, AES_ACC0);
aes_write(acc1, AES_ACC1);
aes_write(acc2, AES_ACC2);
aes_write(acc3, AES_ACC3);

{
/* Again, should take only 11 cycles */
int timeout = 100;

while (aes_read(AES_CR)  0x70) {
if (--timeout == 0)
return -EIO;
}
}

/* order is important, guess why ? */
*(u32 *)(dst + 12) = cpu_to_be32(aes_read(AES_ACC0));
*(u32 *)(dst + 8)  = cpu_to_be32(aes_read(AES_ACC1));
*(u32 *)(dst + 4)  = cpu_to_be32(aes_read(AES_ACC2));
*(u32 *)(dst)  = cpu_to_be32(aes_read(AES_ACC3));

spin_unlock_irqrestore(foo_aes_lock, flags);

return 0;
}

/*
*
*/
static int foo_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int len)
{
struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm);
int rv;

if (len == AES_KEY_LENGTH) {
memcpy(ctx-key, key, AES_KEY_LENGTH);
rv = 0;
} else {
tfm-crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
rv = -EINVAL;
}
return rv;
}

static void foo_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm);

BUG_ON(!in);
BUG_ON(!out);

crypt_block(AES_DIR_DECRYPT, out, in, ctx-key);
}

static void foo_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
{
struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm);

BUG_ON(!in);
BUG_ON(!out);

crypt_block(AES_DIR_ENCRYPT, out, in, ctx-key);
}

static struct crypto_alg foo_alg = {
.cra_name   = aes,
.cra_driver_name= aes-128-foo,
.cra_priority   = 300,
.cra_flags  = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize  = AES_MIN_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct foo_aes_ctx),
.cra_module = THIS_MODULE,
.cra_list   = LIST_HEAD_INIT(foo_alg.cra_list),
.cra_u  = {
.cipher = {
.cia_min_keysize=  AES_KEY_LENGTH,
.cia_max_keysize=  AES_KEY_LENGTH,
.cia_setkey =  foo_setkey,
.cia_encrypt=  foo_encrypt,
.cia_decrypt=  foo_decrypt
}
}
};



--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-17 Thread Francis Moreau

On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote:

Francis Moreau [EMAIL PROTECTED] wrote:
 On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote:

 Actually, I was referring to your AES module :)

 Well I don't if I can do that unfortunately.

What's the problem?



Always the same problem. Some stupid people here think that they have
designed the most wonderful AES hardware module. If I give you the
code of the driver I'll show you a lot of confidential stuff.


 Actually there's nothing really interesting in this code, only key or
 acc loadings and that's it.

 What do you want to see exactly ?

Well if your code's faster than what we have in the kernel then we
should use yours instead.



Again, my code is faster only because I skip the key loading in
cia_encrypt method. Actually the gain is bigger in decryption mode
than in encryption one because I also generate the decryption key for
each block.

You see there's absolutely no clever trick here...
--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-16 Thread Herbert Xu
On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote:
 
 BTW, here are figures I got with 2 different versions of the driver
 when using tcrypt module. The second being the result with the
 optimized driver (no key reloading on each block):
 
 normal version:
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192 
 bytes)
 
 optimized version:
 test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192 
 bytes)
 
 So the gain is 16000 cycles which seems to worth the change, isn't it ?

Sounds like it would.  It would help of course if you posted the patch :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-14 Thread Herbert Xu
Francis Moreau [EMAIL PROTECTED] wrote:
 
 hmm yes indeed it should do the job, but I don't see how you do that.
 For example, let say I want to use aes-foo with eCryptfs. I can give
 a higher priority to aes-foo than aes one. When eCryptfs asks for
 a aes cipher it will pass aes name and since aes-foo has a higher
 priority then the cypto core will return aes-foo cipher, right ? But
 in this scheme, eCryptfs has not a higher priority than other kernel
 users. How can I prevent others to use aes-foo ?

You would assign aes-foo a lower priority and then tell eCryptfs to
use aes-foo instead of aes.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-14 Thread Francis Moreau

On 4/14/07, Herbert Xu [EMAIL PROTECTED] wrote:

Francis Moreau [EMAIL PROTECTED] wrote:

 hmm yes indeed it should do the job, but I don't see how you do that.
 For example, let say I want to use aes-foo with eCryptfs. I can give
 a higher priority to aes-foo than aes one. When eCryptfs asks for
 a aes cipher it will pass aes name and since aes-foo has a higher
 priority then the cypto core will return aes-foo cipher, right ? But
 in this scheme, eCryptfs has not a higher priority than other kernel
 users. How can I prevent others to use aes-foo ?

You would assign aes-foo a lower priority and then tell eCryptfs to
use aes-foo instead of aes.



ok but do you think it's safe to assume that no others parts of the
kernel will request aes-foo ? Remember that the main point is to
optimize aes-foo ?

I would say that it would be better if aes-foo could raise a flag
for example indicating to the crypto core that this algo can be
instatiate only one time...

thanks
--
Francis
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [CRYPTO] is it really optimized ?

2007-04-14 Thread Michael Halcrow
On Sun, Apr 15, 2007 at 05:34:19AM +1000, Herbert Xu wrote:
 Francis Moreau [EMAIL PROTECTED] wrote:
  
  hmm yes indeed it should do the job, but I don't see how you do that.
  For example, let say I want to use aes-foo with eCryptfs. I can give
  a higher priority to aes-foo than aes one. When eCryptfs asks for
  a aes cipher it will pass aes name and since aes-foo has a higher
  priority then the cypto core will return aes-foo cipher, right ? But
  in this scheme, eCryptfs has not a higher priority than other kernel
  users. How can I prevent others to use aes-foo ?
 
 You would assign aes-foo a lower priority and then tell eCryptfs to
 use aes-foo instead of aes.

Note that eCryptfs whitelists the cipher name (see
fs/ecryptfs/crypto.c::ecryptfs_cipher_code_str_map[] and associated
functions). This is because eCryptfs needs to pick a cipher code
(RFC2440-ish) to identify the cipher in the encrypted file
metadata. Shall I go ahead with a patch to add support for the '-'
qualifier?

Mike
-
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html