Re: [gentoo-user] hard drive encryption
On 03/11/2012 02:29 PM, Florian Philipp wrote:
> On 11.03.2012 at 16:38, Valmor de Almeida wrote:
>> Hello,
>>
>> I have not looked at encryption before and find myself in a situation where I have to encrypt my hard drive. I keep /, /boot, and swap outside LVM; everything else is under LVM. I think all I need to do is encrypt /home, which is under LVM. I use reiserfs. I would appreciate suggestions and pointers on what is practical and simple in order to accomplish this task with a minimum of downtime.
>>
>> Thanks,
>> -- Valmor
>
> Is it acceptable for you to have a command-line prompt for the password when booting?

I think so.

> In that case you can use LUKS with the /etc/init.d/dmcrypt init script. /etc/conf.d/dmcrypt should contain some examples. As you want to encrypt an LVM volume, the lvm init script needs to be started before this. As I see it, there is no strict dependency between those two scripts. You can add one by adding this line to /etc/rc.conf:
>
> rc_dmcrypt_after=lvm
>
> For creating a LUKS-encrypted volume, look at http://en.gentoo-wiki.com/wiki/DM-Crypt

Currently looking at this.

> You won't need most of what is written there; just section 9, "Administering LUKS", and the kernel config in section 2, "Assumptions".
>
> Concerning downtime, I'm not aware of any solution that avoids copying the data over to the new volume. If downtime is absolutely critical, ask and we can work something out that minimizes the time.
>
> Regards,
> Florian Philipp

Since I am planning to encrypt only /home under LVM control, what kind of overhead should I expect?

Thanks,
-- Valmor
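A rough sketch of the copy-over migration Florian describes, for anyone following along. It assumes a volume group called vg0, a new 100G logical volume called home_crypt and a mount point /mnt/newhome; all of these names and the size are hypothetical and do not come from the thread. The script only prints the commands, so they can be reviewed before anything destructive is run:

```shell
#!/bin/sh
# Sketch only: prints the migration steps instead of running them.
# VG, NEW_LV, SIZE and MNT are hypothetical values, not taken from the thread.
VG=vg0
NEW_LV=home_crypt
SIZE=100G
MNT=/mnt/newhome

plan_migration() {
    cat <<EOF
lvcreate -L $SIZE -n $NEW_LV $VG
cryptsetup luksFormat /dev/$VG/$NEW_LV
cryptsetup luksOpen /dev/$VG/$NEW_LV $NEW_LV
mkreiserfs /dev/mapper/$NEW_LV
mkdir -p $MNT
mount /dev/mapper/$NEW_LV $MNT
rsync -aHAX /home/ $MNT/
EOF
}

plan_migration
```

After the copy, /etc/conf.d/dmcrypt and /etc/fstab would still need entries for the new volume, and the rc_dmcrypt_after=lvm line quoted above makes sure LVM is up before dmcrypt runs.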
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 12:55, Valmor de Almeida wrote:
> Since I am planning to encrypt only /home under LVM control, what kind of overhead should I expect?

What do you mean by overhead? CPU utilization? In that case the overhead is minimal, especially when you run a 64-bit kernel with the optimized AES kernel module.

Measured on a Core i5:

time cat Video/*.* > /dev/null
real    0m42.918s
user    0m0.023s
sys     0m2.027s

That was a sequential read of roughly 3.5 GB with empty caches. This corresponds to the normal disk speed.

Regards,
Florian Philipp
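To put the timing above into numbers, here is a small sketch that turns a byte count and the "real" time into throughput, and the user plus sys time into a CPU share. The function names are made up for illustration, and "roughly 3.5 GB" is taken as 3,500,000,000 bytes, which is an assumption:

```shell
#!/bin/sh
# throughput_mb BYTES SECONDS -> MB/s (decimal megabytes), one decimal place
throughput_mb() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# cpu_share USER SYS REAL -> CPU time of the process as a percentage of wall-clock time
cpu_share() {
    awk -v u="$1" -v k="$2" -v r="$3" 'BEGIN { printf "%.1f\n", (u + k) / r * 100 }'
}

# Figures from the measurement quoted above; 3.5 GB is an approximation.
throughput_mb 3500000000 42.918
cpu_share 0.023 2.027 42.918
```

With those inputs this prints 81.6 (MB/s) and 4.8 (percent). Note that the CPU figure only covers what `time` accounts to the cat process itself.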
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 12:11 PM, Florian Philipp li...@binarywings.net wrote:
> What do you mean by overhead? CPU utilization? In that case the overhead is minimal, especially when you run a 64-bit kernel with the optimized AES kernel module.

Rough guess: latency. With encryption, you can't DMA disk data directly into a process's address space, because you need the decrypt hop.

Try running bonnie++ on encrypted vs. non-encrypted volumes. (Or not; I doubt you have the time and materials to do a good, meaningful set of time trials.)

--
:wq
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 17:26, Michael Mol wrote:
> Rough guess: latency. With encryption, you can't DMA disk data directly into a process's address space, because you need the decrypt hop.

Good call. Wouldn't have thought of that.

> Try running bonnie++ on encrypted vs. non-encrypted volumes. (Or not; I doubt you have the time and materials to do a good, meaningful set of time trials.)

Yeah, that sounds like something for which you need a very dull winter day. Besides, I've already lost a poorly cooled HDD on a benchmark.

Regards,
Florian Philipp
Re: [gentoo-user] hard drive encryption
On Tue, 13 Mar 2012 17:49:40 +0100, Florian Philipp wrote:
> Besides, I've already lost a poorly cooled HDD on a benchmark.

Better than losing it on real data.

--
Neil Bothwick

Why do they call it a TV set when you only get one?
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 12:49 PM, Florian Philipp li...@binarywings.net wrote:
> Yeah, that sounds like something for which you need a very dull winter day. Besides, I've already lost a poorly cooled HDD on a benchmark.

Sounds like something we can do at my LUG at one of our weekly socials. The part I don't know is how to set this kind of thing up and how to tune it; I don't want it to be like Microsoft's comparison of SQL Server against MySQL from a decade or so ago, where they didn't tune MySQL for their bench workload.

--
:wq
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:
>> Since I am planning to encrypt only /home under LVM control, what kind of overhead should I expect?
>
> What do you mean by overhead? CPU utilization? In that case the overhead is minimal, especially when you run a 64-bit kernel with the optimized AES kernel module.

Speaking of that... I always wondered what the exact difference is between AES and AES i586. I gather that it's about optimisation for a specific architecture. But which one would be best for my i686 Core 2 Duo?

--
Gruß | Greetings | Qapla'
I forbid any use of my email addresses with Facebook services.

A pessimist is an optimist who's given it some thought.
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 18:45, Frank Steinmetzger wrote:
> Speaking of that... I always wondered what the exact difference is between AES and AES i586. I gather that it's about optimisation for a specific architecture. But which one would be best for my i686 Core 2 Duo?

From what I can see in the kernel sources, there is a generic AES implementation using nothing but portable C code, and then there is the aes-i586 assembler code with aes_glue C code. So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code.
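Which AES implementations a running kernel has registered can be read from /proc/crypto. The following is a sketch that assumes the usual "key : value" layout of that file, with name, driver and priority listed in that order per entry; the function name is made up for illustration. As far as I know, the crypto API prefers the entry with the highest priority when several drivers provide the same algorithm:

```shell
#!/bin/sh
# Reads /proc/crypto-style text on stdin and prints "priority driver"
# for every entry whose name is exactly "aes", highest priority first.
list_aes_drivers() {
    awk -F': ' '
        /^name/     { name = $2 }
        /^driver/   { driver = $2 }
        /^priority/ { if (name == "aes") print $2, driver }
    ' | sort -rn
}

# On a live Linux system; guarded so the sketch also runs where the file is missing.
if [ -r /proc/crypto ]; then
    list_aes_drivers < /proc/crypto
fi
```

If both the generic C code and an assembler module are loaded, the first line of the output shows the one the kernel will normally pick.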
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net wrote:
> From what I can see in the kernel sources, there is a generic AES implementation using nothing but portable C code and then there is aes-i586 assembler code with aes_glue C code. So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code.

Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances.

--
:wq
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 19:18, Michael Mol wrote:
> Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances.

One sec. We are talking about a Core 2 Duo running in 32-bit mode, right? That's what the i686 reference in the question meant --- or at least, that's what I assumed.

If we talk about 32-bit mode, none of what you describe is available. Those additional registers and instructions are not accessible with i686 instructions. A Core 2 also has no AES instructions.

Of course, GCC could make use of what it knows about the CPU, like the number of parallel pipelines, pipeline depth, cache size, instructions added in i686 and so on. But even then I doubt it can outperform hand-tuned assembler, even if it is for a slightly older instruction set.

If instead we are talking about a Core 2 Duo running in x86_64 mode, we should be talking about the aes-x86_64 module instead of the aes-i586 module, and that makes use of the complete instruction set of the Core 2, including SSE2.

Regards,
Florian Philipp
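Whether a given machine has the instructions discussed here can be checked against the flags line in /proc/cpuinfo. A sketch, assuming an x86 Linux system; the function name is made up for illustration. The flag names themselves are the ones the kernel reports: aes for the AES instructions, sse2 for SSE2, and lm ("long mode") for 64-bit capability:

```shell
#!/bin/sh
# Succeeds if the given flag appears as a whole word in a "flags" line
# of cpuinfo-style text read from stdin.
has_cpu_flag() {
    grep '^flags' | grep -q -w -- "$1"
}

# On a live system; reports "no" if /proc/cpuinfo is missing or has no flags line.
for flag in aes sse2 lm; do
    if [ -r /proc/cpuinfo ] && has_cpu_flag "$flag" < /proc/cpuinfo; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```

On a Core 2 Duo one would expect sse2 and lm to be reported and aes to be absent, which matches the point made above.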
Re: [gentoo-user] hard drive encryption
On 13 March 2012, at 18:18, Michael Mol wrote:
>> ... So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code.
>
> Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances.

Do you have much experience of writing assembler?

I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years.

What I've read often has the programmer looking at the compiled gcc bytecode and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks.

Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate to be much more considerable.

The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines.

Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one.

Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c. 25-60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile.

Stroller.
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp li...@binarywings.net wrote:
> One sec. We are talking about a Core 2 Duo running in 32-bit mode, right? That's what the i686 reference in the question meant --- or at least, that's what I assumed.

I think you're right; I missed that part.

> If we talk about 32-bit mode, none of what you describe is available. Those additional registers and instructions are not accessible with i686 instructions. A Core 2 also has no AES instructions.
>
> Of course, GCC could make use of what it knows about the CPU, like number of parallel pipelines, pipeline depth, cache size, instructions added in i686 and so on. But even then I doubt it can outperform hand-tuned assembler, even if it is for a slightly older instruction set.

I'm still not sure why. I'll posit that some badly-written C could place constraints on the compiler's optimizer, but GCC should have little problem handling well-written C, separating semantics from syntax and finding good transforms of the original code to get provably identical results. Unless I'm grossly overestimating the capabilities of its AST processing and optimization engine.

> If instead we are talking about a Core 2 Duo running in x86_64 mode, we should be talking about the aes-x86_64 module instead of the aes-i586 module and that makes use of the complete instruction set of the Core 2, including SSE2.

FWIW, SSE2 is available on 32-bit processors; I have code in the field using SSE2 on Pentium 4s.

--
:wq
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 19:58, Florian Philipp wrote:
> Of course, GCC could make use of what it knows about the CPU, like number of parallel pipelines, pipeline depth, cache size, instructions added in i686 and so on. But even then I doubt it can outperform hand-tuned assembler, even if it is for a slightly older instruction set.

P.S.: I just looked up the differences in the instruction sets of i586 and i686. The only significant instruction added in i686 was a conditional move (CMOV). This helps to avoid conditional jumps. However, in the aes-i586 code there are only two conditional jumps, and they both just end the loop of encryption/decryption rounds for AES-128 and AES-256, respectively. My assembler isn't perfect, but I doubt you can optimize that away with a CMOV.

Regards,
Florian Philipp
Re: [gentoo-user] hard drive encryption
On 13.03.2012 at 20:13, Michael Mol wrote:
> I'm still not sure why. I'll posit that some badly-written C could place constraints on the compiler's optimizer, but GCC should have little problem handling well-written C, separating semantics from syntax and finding good transforms of the original code to get provably identical results. Unless I'm grossly overestimating the capabilities of its AST processing and optimization engine.

Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a profiling run to allow the compiler to predict loop and jump certainties and so on.

But, by all means, let's test it! It's not like we cannot. Unfortunately, I don't have a 32-bit Gentoo machine at hand where I could test it right now.

> FWIW, SSE2 is available on 32-bit processors; I have code in the field using SSE2 on Pentium 4s.

Um, yeah. I should have clarified that. I meant that for x86_64 machines, the compiler as well as the assembler programmer can safely assume that SSE2 is available. For generic i686 assembler code, you cannot.

Regards,
Florian Philipp
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 3:07 PM, Stroller strol...@stellar.eclipse.co.uk wrote:
> Do you have much experience of writing assembler?
>
> I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years.

Similar level of experience here. I can read it, even debug it from time to time. A few regular bloggers on the subject are like candy. And I used to have pagetable.org, Ars's Technopaedia and spec sheets for early x86 and Motorola processors memorized. For the past couple of years, I've been focusing on reading blogs of language and compiler authors, academics involved in proofing, testing and improving them, etc.

> What I've read often has the programmer looking at the compiled gcc bytecode and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks.

Sure; it's cheaper to null out a register by XORing it with itself than setting it to 0.

> Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate to be much more considerable.

Sure; optimize the hell out of the code where you spend most of your time. I wasn't aware that gcc passed up on safe optimization opportunities, though.

> The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines.

I disagree. With defined semantics for the source and target, a computer's cleverness is limited only by the computational and memory expense of its search algorithms. Humans get through this by making a habit of various optimizations, but those habits become less useful as additional paths and instructions are added. As system complexity increases, humans operate on personally cached techniques derived from simpler systems. I would expect very, very few people to be intimately familiar with the majority of optimization possibilities present on an amdfam10 processor or a core2. Compilers aren't necessarily familiar with them, either; they're just quicker at discovering them, given knowledge of the individual instructions and the rules of language semantics.

> Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one.

I'd be willing to posit that you're right here, though if there isn't a per-byte feedback mechanism, SIMD instructions would come into serious play. But I expect there's a per-byte feedback mechanism, so parallelization would likely come in the form of processing simultaneous streams.

> Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c. 25-60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile.

Absolutely. My position, though, is that compilers are quicker and more capable of discovering optimization
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 3:30 PM, Florian Philipp li...@binarywings.net wrote: On 13.03.2012 20:13, Michael Mol wrote: On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp li...@binarywings.net wrote: On 13.03.2012 19:18, Michael Mol wrote: On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net wrote: On 13.03.2012 18:45, Frank Steinmetzger wrote: On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote: Since I am planning to encrypt only home/ under LVM control, what kind of overhead should I expect? What do you mean by overhead? CPU utilization? In that case the overhead is minimal, especially when you run a 64-bit kernel with the optimized AES kernel module. Speaking of that... I always wondered what the exact difference was between AES and AES i586. I can gather myself that it's about optimisation for a specific architecture. But which one would be best for my i686 Core 2 Duo? From what I can see in the kernel sources, there is a generic AES implementation using nothing but portable C code and then there is aes-i586 assembler code with aes_glue C code. So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code. Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances. One sec. We are talking about a Core2 Duo running in 32bit mode, right? That's what the i686 reference in the question meant --- or at least, that's what I assumed. I think you're right; I missed that part. If we talk about 32bit mode, none of what you describe is available. 
Those additional registers and instructions are not accessible with i686 instructions. A Core 2 also has no AES instructions. Of course, GCC could make use of what it knows about the CPU, like number of parallel pipelines, pipeline depth, cache size, instructions added in i686 and so on. But even then I doubt it can outperform hand-tuned assembler, even if it is for a slightly older instruction set. I'm still not sure why. I'll posit that some badly-written C could place constraints on the compiler's optimizer, but GCC should have little problem handling well-written C, separating semantics from syntax and finding good transforms of the original code to get provably-same results. Unless I'm grossly overestimating the capabilities of its AST processing and optimization engine. Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a profiling run to allow the compiler to predict loop and jump certainties and so on. I was thinking more in the context of simple functions and mathematical operations. Loop probabilities? Yeah, that's a tough one. Nobody wants to stall a huge CPU pipeline. I remember when the NetBurst architecture came out. Intel cranked up the amount of die space dedicated to branch prediction... But, by all means, let's test it! It's not like we cannot. Unfortunately, I don't have a 32bit Gentoo machine at hand where I could test it right now. Now we're talking. :) Unfortunately, I don't have a 32-bit Gentoo environment available, either. Actually, I've never run Gentoo in a 32-bit environment. -- :wq
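Neither poster had a 32-bit machine to hand, so the thread contains no measurement. As a rough illustration only, here is the general shape such a comparison could take: first check that both implementations give identical output, then time each one and keep the best run. The two functions below are hypothetical Python stand-ins (a byte-wise XOR done with a plain loop and with a lookup table); they are not the kernel's generic or i586 AES code, and the numbers they produce say nothing about either.

```python
import timeit


def xor_loop(data: bytes, key: int) -> bytes:
    """Stand-in 'straightforward' version: XOR every byte in a plain loop."""
    out = bytearray(len(data))
    for i, b in enumerate(data):
        out[i] = b ^ key
    return bytes(out)


def xor_table(data: bytes, key: int) -> bytes:
    """Stand-in 'tuned' version: precompute a 256-entry table, then translate."""
    table = bytes(b ^ key for b in range(256))
    return data.translate(table)


def best_time(func, data: bytes, key: int, repeats: int = 3, number: int = 5) -> float:
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(data, key))
    return min(timer.repeat(repeat=repeats, number=number)) / number


# Hypothetical sample input; a real test would use representative data.
sample = bytes(range(256)) * 64

# Step 1: both versions must produce the same result before timing means anything.
same_output = xor_loop(sample, 0x5A) == xor_table(sample, 0x5A)

# Step 2: time each version and keep the minimum to reduce scheduling noise.
timings = {f.__name__: best_time(f, sample, 0x5A) for f in (xor_loop, xor_table)}
```

Taking the minimum over repeats is a common way to filter out interference from other processes; which version wins is left for the machine in question to decide.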
Re: [gentoo-user] hard drive encryption
On 13.03.2012 20:07, Stroller wrote: On 13 March 2012, at 18:18, Michael Mol wrote: ... So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code. Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances. Do you have much experience of writing assembler? I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years. What I've read often has the programmer looking at the compiled gcc bytecode and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks. You recall correctly about the frame pointer. Concerning the register usage: I'm no expert in this field, either, but I think the main issue is not simply register allocation but branch and exception prediction and so on. The compiler can either optimize for a seamless continuation if the jump happens or if it doesn't. A human or a just-in-time compiler can better handle these cases by predicting the outcome or -- in the case of a JIT -- analyzing the outcome of the first few iterations. OT: IIRC, register reuse is also the main performance problem of state-of-the-art javascript engines, at the moment. 
Concerning the code they compile at runtime, they are nearly as good as `gcc -O0` but they have the same problem concerning registers (GCC with -O0 produces code that works exactly as you describe above: Storing the result after every computation and loading it again). Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate to be much more considerable. The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines. Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one. Well, sort of. First off, you are right, AES was designed with hardware implementations in mind. The algorithm boils down to a number of substitution and permutation networks and XOR operations (I assume that's what you meant with byte swap). If you look at the portable C code (/usr/src/linux/crypto/aes_generic.c), you can see that it mostly consists of lookup tables and XORs. 
The thing about each byte goes in, each byte goes out, however, is a bit wrong. What you think of is a stream cipher like RC4. AES is a block cipher. These use an (in this case 128 bit long) input string and XOR it with the encryption (sub-)key and shuffle it around according to the exact algorithm. Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c 25 - 60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile. Stroller.
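Florian's distinction between a block cipher and a byte-at-a-time stream can be sketched in a few lines. The following is a minimal illustration, not AES: it only shows the input being cut into 128-bit blocks and each whole block being XORed with a 16-byte key, which corresponds to just the key-mixing step. The substitution and shuffling rounds are left out, and the key and message are made-up demo values.

```python
from typing import List

BLOCK_SIZE = 16  # AES operates on 128-bit (16-byte) blocks


def split_blocks(data: bytes) -> List[bytes]:
    """Split data into 16-byte blocks; the caller must pad beforehand."""
    if len(data) % BLOCK_SIZE != 0:
        raise ValueError("data length must be a multiple of 16 bytes")
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def add_round_key(block: bytes, round_key: bytes) -> bytes:
    """XOR one 16-byte block with a 16-byte (sub-)key. Key mixing only."""
    if len(block) != BLOCK_SIZE or len(round_key) != BLOCK_SIZE:
        raise ValueError("block and round key must both be 16 bytes")
    return bytes(b ^ k for b, k in zip(block, round_key))


# Hypothetical demo values, not real key material.
demo_key = bytes(range(16))
plaintext = b"sixteen byte msg" + b"another 16 bytes"

blocks = split_blocks(plaintext)                      # two 16-byte blocks
mixed = [add_round_key(b, demo_key) for b in blocks]  # key mixed into each block

# XOR is its own inverse, so applying the same key again restores the input.
restored = b"".join(add_round_key(b, demo_key) for b in mixed)
```

On its own this XOR step provides no security at all; the point is only that the unit of work is a 16-byte block rather than a single byte.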
Re: [gentoo-user] hard drive encryption
On 13.03.2012 20:38, Michael Mol wrote: On Tue, Mar 13, 2012 at 3:07 PM, Stroller strol...@stellar.eclipse.co.uk wrote: On 13 March 2012, at 18:18, Michael Mol wrote: ... So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code. Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances. Do you have much experience of writing assembler? I don't, and I'm not an expert on this, but I've read the odd blog article on this subject over the years. Similar level of experience here. I can read it, even debug it from time to time. A few regular bloggers on the subject are like candy. And I used to have pagetable.org, Ars's Technopaedia and specsheets for early x86 and motorola processors memorized. For the past couple years, I've been focusing on reading blogs of language and compiler authors, academics involved in proofing, testing and improving them, etc. What I've read often has the programmer looking at the compiled gcc bytecode and examining what it does. The compiler might not care how many registers it uses, and thus a variable might find itself frequently swapped back into RAM; the programmer does not have any control over the compiler, and IIRC some flags reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I think it's possible to use registers more efficiently by swapping them (??) or by using bitwise comparisons and other tricks. Sure; it's cheaper to null out a register by XORing it with itself than setting it to 0. 
Assembler optimisation is only used on sections of code that are at the core of a loop - that are called hundreds or thousands (even millions?) of times during the program's execution. It's not for code, such as reading the .config file or initialisation, which is only called once. Because the code in the core of the loop is called so often, you don't have to achieve much of an optimisation for the aggregate to be much more considerable. Sure; optimize the hell out of the code where you spend most of your time. I wasn't aware that gcc passed up on safe optimization opportunities, though. The operations in question may only constitute a few lines of C, or a handful of machine operations, so it boils down to an algorithm that a human programmer is capable of getting a grip on and comprehending. Whilst compilers are clearly more efficient for large programs, on this micro scale, humans are more clever and creative than machines. I disagree. With defined semantics for the source and target, a computer's cleverness is limited only by the computational and memory expense of its search algorithms. Humans get through this by making a habit of various optimizations, but those habits become less useful as additional paths and instructions are added. As system complexity increases, humans operate on personally cached techniques derived from simpler systems. I would expect very, very few people to be intimately familiar with the majority of optimization possibilities present on an amdfam10 processor or a core2. Compilers aren't necessarily familiar with them, either; they're just quicker at discovering them, given knowledge of the individual instructions and the rules of language semantics. Encryption / decryption is an example of code that lends itself to this kind of optimisation. In particular AES was designed, I believe, to be amenable to implementation in this way. The reason for that was that it was desirable to have it run on embedded devices and on dedicated chips. 
So it boils down to a simple bitswap operation (??) - the plaintext is modified by the encryption key, input and output as a fast stream. Each byte goes in, each byte goes out, the same function performed on each one. I'd be willing to posit that you're right here, though if there isn't a per-byte feedback mechanism, SIMD instructions would come into serious play. But I expect there's a per-byte feedback mechanism, so parallelization would likely come in the form of processing simultaneous streams. Another operation that lends itself to assembler optimisation is video decoding - the video is encoded only once, and then may be played back hundreds or millions of times by different people. The same operations must be repeated a number of times on each frame, then c 25 - 60 frames are decoded per second, so at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile. Absolutely. My position, though, is
Re: [gentoo-user] hard drive encryption
This thread is becoming ridiculously long. Just as a last side-note: One of the primary reasons that the IA64 architecture failed was that it relied on the compiler to optimize the code in order to exploit the massive instruction-level parallelism the CPU offered. Compilers never became good enough for the job. Of course, that happened in the nineties and we have much better compilers now (and x86 is easier to handle for compilers). But on the other hand: That was Intel's next big thing and if they couldn't make the compilers work, I have no reason to believe in their efficiency now. Regards, Florian Philipp Argh, just as I want to quit: I had the dates garbled up. IA64 came out in 2001 but the compiler design was of course a product of the late nineties and the design process started mid-nineties.
Re: [gentoo-user] hard drive encryption
On Tue, Mar 13, 2012 at 07:58:55PM +0100, Florian Philipp wrote: From what I can see in the kernel sources, there is a generic AES implementation using nothing but portable C code and then there is aes-i586 assembler code with aes_glue C code. So I assume the i586 version is better for you --- unless GCC suddenly got a lot better at optimizing code. Since when, exactly? GCC isn't the best compiler at optimization, but I fully expect current versions to produce better code for x86-64 than hand-tuned i586. Wider registers, more registers, crypto acceleration instructions and SIMD instructions are all very nice to have. I don't know the specifics of AES, though, or what kind of crypto algorithm it is, so it's entirely possible that one can't effectively parallelize it except in some relatively unique circumstances. One sec. We are talking about a Core2 Duo running in 32bit mode, right? That's what the i686 reference in the question meant --- or at least, that's what I assumed. Sorry, I forgot to mention that I'm running 32 bit, yes. I don't really see the benefit of 64 bit for my use case. For all I know, the executables get bigger and my poor old laptop will have to shuffle more bits around. :) However, hardware AES would be *the* reason for me to buy something with an i5, instead of a netbook, as my next laptop, some time in the distant future. -- Gruß | Greetings | Qapla' I forbid any use of my email addresses with Facebook services. A computer only crashes when the text has not been saved for a long time.
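Whether a given machine has the hardware AES instructions mentioned above can be read from the CPU flags. On x86 Linux, /proc/cpuinfo carries a "flags" line per core, and AES-NI shows up there as the flag "aes". A small sketch of that check follows; the helper name and the sample strings are made up for illustration, and other architectures lay this file out differently.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str = "aes") -> bool:
    """Return True if any 'flags' line in cpuinfo-style text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match whole whitespace-separated words, so 'aes' does not match 'vaes'.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


# Hypothetical excerpts in the x86 /proc/cpuinfo layout.
sample_with_aes = "model name\t: Example CPU\nflags\t\t: fpu sse2 ssse3 aes avx\n"
sample_without = "model name\t: Example CPU\nflags\t\t: fpu sse2 ssse3\n"

with_aes = has_cpu_flag(sample_with_aes)
without_aes = has_cpu_flag(sample_without)
```

On a real system the text would come from reading /proc/cpuinfo, for example `has_cpu_flag(open("/proc/cpuinfo").read())`.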
Re: [gentoo-user] hard drive encryption
On 11.03.2012 16:38, Valmor de Almeida wrote: Hello, I have not looked at encryption before and find myself in a situation that I have to encrypt my hard drive. I keep /, /boot, and swap outside LVM, everything else is under LVM. I think all I need to do is to encrypt /home which is under LVM. I use reiserfs. I would appreciate suggestions and pointers on what is practical and simple in order to accomplish this task with a minimum of downtime. Thanks, -- Valmor Is it acceptable for you to have a commandline prompt for the password when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt init script. /etc/conf.d/dmcrypt should contain some examples. As you want to encrypt an LVM volume, the lvm init script needs to be started before this. As I see it, there is no strict dependency between those two scripts. You can add this by adding this line to /etc/rc.conf: rc_dmcrypt_after=lvm For creating a LUKS-encrypted volume, look at http://en.gentoo-wiki.com/wiki/DM-Crypt You won't need most of what is written there; just section 9, Administering LUKS and the kernel config in section 2, Assumptions. Concerning downtime, I'm not aware of any solution that avoids copying the data over to the new volume. If downtime is absolutely critical, ask and we can work something out that minimizes the time. Regards, Florian Philipp