Re: [CRYPTO] is it really optimized ?
Hi [Sorry for the late answer] On 4/19/07, Francis Moreau [EMAIL PROTECTED] wrote: On 4/17/07, Roland Dreier [EMAIL PROTECTED] wrote: It seems trivial to keep the last key you were given and do a quick memcmp in your setkey method to see if it's different from the last key you pushed to hardware, and set a flag if it is. Then only do your set_key() if you have a new key to pass to hardware. I'm assuming the expense is in the aes_write() calls, and you could avoid them if you know you're not writing something new. that's a wrong assumption. aes_write()/aes_read() are both used to access to the controller and are slow (no cache involved). Sorry, I wasn't clear. I meant that the hardware access is what is slow, and that anything you do on the CPU is relatively cheap compared to that. So my suggestion is just to keep a cache (in CPU memory) of what you have already loaded into the HW, and before reloading the HW just check the cache and don't do the actual HW access if you're not going to change the HW contents. So you avoid any extra aes_write and aes_read calls in the cache hit case. This would have the advantage of making anything that does lots of bulk encryption fast without special casing ecryptfs. I'm not sure how memcmp(key, cache, KEY_SIZE) would impact AES performance. I need to give it a test but can't today. I'll do tomorrow and give you back the result. OK, I gave it a test and it appears that the cache hit case is slightly worse than unconditionnal key loading. So it means that testing that hte key is cached is as long as loading the key into the controller. Here is what I did in set_key() function: static void set_key(const char *key) { static u32 my_key[4] __cacheline_aligned; u32 key0 = *(const u32 *)(key + 12); u32 key1 = *(const u32 *)(key + 8); u32 key2 = *(const u32 *)(key + 4); u32 key3 = *(const u32 *)(key); int timeout = 100; u32 miss = 0; miss |= key0 ^ my_key[0]; miss |= key1 ^ my_key[1]; miss |= key2 ^ my_key[2]; miss |= key3 ^ my_key[3]; if (miss == 0) return; my_key[0] = key0; my_key[1] = key1; my_key[2] = key2; my_key[3] = key3; aes_write(be32_to_cpu(key0), AES_KEY0); aes_write(be32_to_cpu(key1), AES_KEY1); aes_write(be32_to_cpu(key2), AES_KEY2); aes_write(be32_to_cpu(key3), AES_KEY3); /* generate dkey: should take 11 cycles */ aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR); while (aes_read(AES_CR) CR_DKEYGEN) { if (--timeout == 0) break; } } So I was wrong, hardware access is not so expensive as I thought. But it also means that all instructions executed in the drivers' encrypt()/decrypt() methods have a real cost and skipping key loadings is a win. Using the driver exclusively doesn't seem to be the right solution, but I don't see another way to do that... -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote: On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote: BTW, here are figures I got with 2 different versions of the driver when using tcrypt module. The second being the result with the optimized driver (no key reloading on each block): normal version: test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192 bytes) optimized version: test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192 bytes) So the gain is 16000 cycles which seems to worth the change, isn't it ? Sounds like it would. It would help of course if you posted the patch :) OK, I tried to cook up something very simple. Since I don't know this code, please be indulgent when reading the following patch ;) diff --git a/crypto/api.c b/crypto/api.c index 55af8bb..f067de8 100644 --- a/crypto/api.c +++ b/crypto/api.c @@ -71,6 +71,10 @@ struct crypto_alg *__crypto_alg_lookup(const char *name, u32 type, u32 mask) ((struct crypto_larval *)q)-mask != mask) continue; + if (alg-cra_flags CRYPTO_ALG_EXCLUSIVE + atomic_read(alg-cra_refcnt) 0) + continue; + exact = !strcmp(q-cra_driver_name, name); fuzzy = !strcmp(q-cra_name, name); if (!exact !(fuzzy q-cra_priority best)) diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 779aa78..278d386 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -41,6 +41,7 @@ #define CRYPTO_ALG_DEAD 0x0020 #define CRYPTO_ALG_DYING0x0040 #define CRYPTO_ALG_ASYNC0x0080 +#define CRYPTO_ALG_EXCLUSIVE 0x0100 /* * Set this bit if and only if the algorithm requires another algorithm of -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On Tue, Apr 17, 2007 at 04:01:51PM +0200, Francis Moreau ([EMAIL PROTECTED]) wrote: On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote: Yep. We don't need such a flag anyway. All we need is a way to tweak the priority and Bob's your uncle. Could you elaborate please, I don't see how you prevent others users to use this module with priority. Priority is a stuff that tells you which aes implementation to use but it does not prevent an implementation to be used several times... Preventing anyone from using the module is incorrect. How will you handle the case when you have only one algo registered and it will be exclusively used by ecryptfs? Herbert proposes to register _second_ algo (say aes-generic(prio_100) and aes_for_ecryptfs(prio_1)) with lower prio, so generic access will never try to catch aes_for_ecryptfs, but your code still can access it using full name. Thanks -- Francis -- Evgeniy Polyakov - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On 4/17/07, Evgeniy Polyakov [EMAIL PROTECTED] wrote: OK, I tried to cook up something very simple. Since I don't know this code, please be indulgent when reading the following patch ;) Which means that after one has loaded ecryptfs module it can not use ipsec and dm-crypt if there is only one crypto algo registered... That's actually the goal, but I agree we would need a flag to pass when loading AES module to say I want an exclusive usage of it and therefore it can be run faster. If you have several users of AES module, you can choose (a) use the no-optimized version for all users or (b) choose which user needs to be run quickly and make it exclusively use the AES hw module; the others users would use the generic AES (the slower one). -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
Again, my code is faster only because I skip the key loading in cia_encrypt method. Actually the gain is bigger in decryption mode than in encryption one because I also generate the decryption key for each block. I wonder if there's some way you can cache the last caller and reload the key lazily (only when it changes). Of course without your code it's hard to say... - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
I wonder if there's some way you can cache the last caller and reload the key lazily (only when it changes). yes something that allows crypto drivers to detect if the key has changed would be good. It seems trivial to keep the last key you were given and do a quick memcmp in your setkey method to see if it's different from the last key you pushed to hardware, and set a flag if it is. Then only do your set_key() if you have a new key to pass to hardware. I'm assuming the expense is in the aes_write() calls, and you could avoid them if you know you're not writing something new. - R. - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
Francis Moreau [EMAIL PROTECTED] wrote: On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote: Actually, I was referring to your AES module :) Well I don't if I can do that unfortunately. What's the problem? Actually there's nothing really interesting in this code, only key or acc loadings and that's it. What do you want to see exactly ? Well if your code's faster than what we have in the kernel then we should use yours instead. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On 4/17/07, Roland Dreier [EMAIL PROTECTED] wrote: Again, my code is faster only because I skip the key loading in cia_encrypt method. Actually the gain is bigger in decryption mode than in encryption one because I also generate the decryption key for each block. I wonder if there's some way you can cache the last caller and reload the key lazily (only when it changes). yes something that allows crypto drivers to detect if the key has changed would be good. Of course without your code it's hard to say... Alright you can find the main part of it below... struct foo_aes_ctx { u8 key[AES_KEY_LENGTH]; }; /* * */ static inline void set_dir(int dir) { u32 cr = aes_read(AES_CR); switch (dir) { case AES_DIR_ENCRYPT: cr |= CR_DIR; break; case AES_DIR_DECRYPT: cr = ~CR_DIR; break; default: BUG(); } aes_write(cr, AES_CR); } static inline void set_key(const char *key) { u32 key0 = be32_to_cpup((const u32 *)(key + 12)); u32 key1 = be32_to_cpup((const u32 *)(key + 8)); u32 key2 = be32_to_cpup((const u32 *)(key + 4)); u32 key3 = be32_to_cpup((const u32 *)(key)); aes_write(key0, AES_KEY0); aes_write(key1, AES_KEY1); aes_write(key2, AES_KEY2); aes_write(key3, AES_KEY3); } /* should take only 11 cycles */ static int gen_dkey(void) { int timeout = 100; aes_write(aes_read(AES_CR) | CR_DKEYGEN, AES_CR); while (aes_read(AES_CR) CR_DKEYGEN) { if (--timeout == 0) return -EIO; } return 0; } static int crypt_block(int dir, u8 *dst, const u8 *src, const char *key) { register u32 acc0 = be32_to_cpup((const u32 *)(src + 12)); register u32 acc1 = be32_to_cpup((const u32 *)(src + 8)); register u32 acc2 = be32_to_cpup((const u32 *)(src + 4)); register u32 acc3 = be32_to_cpup((const u32 *)(src)); unsigned long flags; spin_lock_irqsave(foo_aes_lock, flags); set_key(key); set_dir(dir); if (dir == AES_DIR_DECRYPT) gen_dkey(); aes_write(acc0, AES_ACC0); aes_write(acc1, AES_ACC1); aes_write(acc2, AES_ACC2); aes_write(acc3, AES_ACC3); { /* Again, should take only 11 cycles */ int timeout = 100; while (aes_read(AES_CR) 0x70) { if (--timeout == 0) return -EIO; } } /* order is important, guess why ? */ *(u32 *)(dst + 12) = cpu_to_be32(aes_read(AES_ACC0)); *(u32 *)(dst + 8) = cpu_to_be32(aes_read(AES_ACC1)); *(u32 *)(dst + 4) = cpu_to_be32(aes_read(AES_ACC2)); *(u32 *)(dst) = cpu_to_be32(aes_read(AES_ACC3)); spin_unlock_irqrestore(foo_aes_lock, flags); return 0; } /* * */ static int foo_setkey(struct crypto_tfm *tfm, const u8 *key, unsigned int len) { struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rv; if (len == AES_KEY_LENGTH) { memcpy(ctx-key, key, AES_KEY_LENGTH); rv = 0; } else { tfm-crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN; rv = -EINVAL; } return rv; } static void foo_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm); BUG_ON(!in); BUG_ON(!out); crypt_block(AES_DIR_DECRYPT, out, in, ctx-key); } static void foo_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct foo_aes_ctx *ctx = crypto_tfm_ctx(tfm); BUG_ON(!in); BUG_ON(!out); crypt_block(AES_DIR_ENCRYPT, out, in, ctx-key); } static struct crypto_alg foo_alg = { .cra_name = aes, .cra_driver_name= aes-128-foo, .cra_priority = 300, .cra_flags = CRYPTO_ALG_TYPE_CIPHER, .cra_blocksize = AES_MIN_BLOCK_SIZE, .cra_ctxsize= sizeof(struct foo_aes_ctx), .cra_module = THIS_MODULE, .cra_list = LIST_HEAD_INIT(foo_alg.cra_list), .cra_u = { .cipher = { .cia_min_keysize= AES_KEY_LENGTH, .cia_max_keysize= AES_KEY_LENGTH, .cia_setkey = foo_setkey, .cia_encrypt= foo_encrypt, .cia_decrypt= foo_decrypt } } }; -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote: Francis Moreau [EMAIL PROTECTED] wrote: On 4/17/07, Herbert Xu [EMAIL PROTECTED] wrote: Actually, I was referring to your AES module :) Well I don't if I can do that unfortunately. What's the problem? Always the same problem. Some stupid people here think that they have designed the most wonderful AES hardware module. If I give you the code of the driver I'll show you a lot of confidential stuff. Actually there's nothing really interesting in this code, only key or acc loadings and that's it. What do you want to see exactly ? Well if your code's faster than what we have in the kernel then we should use yours instead. Again, my code is faster only because I skip the key loading in cia_encrypt method. Actually the gain is bigger in decryption mode than in encryption one because I also generate the decryption key for each block. You see there's absolutely no clever trick here... -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote: BTW, here are figures I got with 2 different versions of the driver when using tcrypt module. The second being the result with the optimized driver (no key reloading on each block): normal version: test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192 bytes) optimized version: test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192 bytes) So the gain is 16000 cycles which seems to worth the change, isn't it ? Sounds like it would. It would help of course if you posted the patch :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
Francis Moreau [EMAIL PROTECTED] wrote: hmm yes indeed it should do the job, but I don't see how you do that. For example, let say I want to use aes-foo with eCryptfs. I can give a higher priority to aes-foo than aes one. When eCryptfs asks for a aes cipher it will pass aes name and since aes-foo has a higher priority then the cypto core will return aes-foo cipher, right ? But in this scheme, eCryptfs has not a higher priority than other kernel users. How can I prevent others to use aes-foo ? You would assign aes-foo a lower priority and then tell eCryptfs to use aes-foo instead of aes. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On 4/14/07, Herbert Xu [EMAIL PROTECTED] wrote: Francis Moreau [EMAIL PROTECTED] wrote: hmm yes indeed it should do the job, but I don't see how you do that. For example, let say I want to use aes-foo with eCryptfs. I can give a higher priority to aes-foo than aes one. When eCryptfs asks for a aes cipher it will pass aes name and since aes-foo has a higher priority then the cypto core will return aes-foo cipher, right ? But in this scheme, eCryptfs has not a higher priority than other kernel users. How can I prevent others to use aes-foo ? You would assign aes-foo a lower priority and then tell eCryptfs to use aes-foo instead of aes. ok but do you think it's safe to assume that no others parts of the kernel will request aes-foo ? Remember that the main point is to optimize aes-foo ? I would say that it would be better if aes-foo could raise a flag for example indicating to the crypto core that this algo can be instatiate only one time... thanks -- Francis - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CRYPTO] is it really optimized ?
On Sun, Apr 15, 2007 at 05:34:19AM +1000, Herbert Xu wrote: Francis Moreau [EMAIL PROTECTED] wrote: hmm yes indeed it should do the job, but I don't see how you do that. For example, let say I want to use aes-foo with eCryptfs. I can give a higher priority to aes-foo than aes one. When eCryptfs asks for a aes cipher it will pass aes name and since aes-foo has a higher priority then the cypto core will return aes-foo cipher, right ? But in this scheme, eCryptfs has not a higher priority than other kernel users. How can I prevent others to use aes-foo ? You would assign aes-foo a lower priority and then tell eCryptfs to use aes-foo instead of aes. Note that eCryptfs whitelists the cipher name (see fs/ecryptfs/crypto.c::ecryptfs_cipher_code_str_map[] and associated functions). This is because eCryptfs needs to pick a cipher code (RFC2440-ish) to identify the cipher in the encrypted file metadata. Shall I go ahead with a patch to add support for the '-' qualifier? Mike - To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html