Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Valmor de Almeida
On 03/11/2012 02:29 PM, Florian Philipp wrote:
 Am 11.03.2012 16:38, schrieb Valmor de Almeida:

 Hello,

 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.

 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.

 Thanks,

 --
 Valmor

 
 
 Is it acceptable for you to have a commandline prompt for the password
 when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt

I think so.

 init script. /etc/conf.d/dmcrypt should contain some examples. As you
 want to encrypt an LVM volume, the lvm init script needs to be started
 before this. As I see it, there is no strict dependency between those
 two scripts. You can add this by adding this line to /etc/rc.conf:
 rc_dmcrypt_after=lvm
 
 For creating a LUKS-encrypted volume, look at
 http://en.gentoo-wiki.com/wiki/DM-Crypt

Currently looking at this.

 
 You won't need most of what is written there; just section 9,
 Administering LUKS and the kernel config in section 2, Assumptions.
 
 Concerning downtime, I'm not aware of any solution that avoids copying
 the data over to the new volume. If downtime is absolutely critical, ask
 and we can work something out that minimizes the time.
 
 Regards,
 Florian Philipp
 

Since I am planning to encrypt only /home, which is under LVM control, what kind
of overhead should I expect?

Thanks,

--
Valmor




Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 12:55, schrieb Valmor de Almeida:
 On 03/11/2012 02:29 PM, Florian Philipp wrote:
 Am 11.03.2012 16:38, schrieb Valmor de Almeida:

 Hello,

 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.

 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.

 Thanks,

 --
 Valmor



 Is it acceptable for you to have a commandline prompt for the password
 when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt
 
 I think so.
 
 init script. /etc/conf.d/dmcrypt should contain some examples. As you
 want to encrypt an LVM volume, the lvm init script needs to be started
 before this. As I see it, there is no strict dependency between those
 two scripts. You can add this by adding this line to /etc/rc.conf:
 rc_dmcrypt_after=lvm

 For creating a LUKS-encrypted volume, look at
 http://en.gentoo-wiki.com/wiki/DM-Crypt
 
 Currently looking at this.
 

 You won't need most of what is written there; just section 9,
 Administering LUKS and the kernel config in section 2, Assumptions.

 Concerning downtime, I'm not aware of any solution that avoids copying
 the data over to the new volume. If downtime is absolutely critical, ask
 and we can work something out that minimizes the time.

 Regards,
 Florian Philipp

 
 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?
 
 Thanks,
 

What do you mean by overhead? CPU utilization? In that case the
overhead is minimal, especially when you run a 64-bit kernel with the
optimized AES kernel module.

Measured on a Core i5:
time cat Video/*.* > /dev/null

real    0m42.918s
user    0m0.023s
sys     0m2.027s

That was a sequential read of roughly 3.5GB with empty caches. This
corresponds to the normal disk speed.
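
If you want an idea of the CPU cost of the cipher itself, independent of
the disk, something like this gives a rough upper bound (just a sketch,
not a rigorous benchmark; dm-crypt's actual cipher/mode may differ from
what you test here):

# rough user-space AES throughput check (illustrative)
openssl speed -evp aes-128-cbc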

Regards,
Florian Philipp





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 12:11 PM, Florian Philipp li...@binarywings.net wrote:
 Am 13.03.2012 12:55, schrieb Valmor de Almeida:
 On 03/11/2012 02:29 PM, Florian Philipp wrote:
 Am 11.03.2012 16:38, schrieb Valmor de Almeida:

 Hello,

 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.

 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.

 Thanks,

 --
 Valmor



 Is it acceptable for you to have a commandline prompt for the password
 when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt

 I think so.

 init script. /etc/conf.d/dmcrypt should contain some examples. As you
 want to encrypt an LVM volume, the lvm init script needs to be started
 before this. As I see it, there is no strict dependency between those
 two scripts. You can add this by adding this line to /etc/rc.conf:
 rc_dmcrypt_after=lvm

 For creating a LUKS-encrypted volume, look at
 http://en.gentoo-wiki.com/wiki/DM-Crypt

 Currently looking at this.


 You won't need most of what is written there; just section 9,
 Administering LUKS and the kernel config in section 2, Assumptions.

 Concerning downtime, I'm not aware of any solution that avoids copying
 the data over to the new volume. If downtime is absolutely critical, ask
 and we can work something out that minimizes the time.

 Regards,
 Florian Philipp


 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 Thanks,


 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

Rough guess: Latency. With encryption, you can't DMA disk data
directly into a process's address space, because you need the decrypt
hop.

Try running bonnie++ on encrypted vs non-encrypted volumes. (Or not; I
doubt you have the time and materials to do a good, meaningful set of
time trials)
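
For reference, a comparison run could look roughly like this (the
directories and user name are only placeholders):

bonnie++ -d /home/testdir -u someuser        # on the encrypted /home
bonnie++ -d /mnt/plain/testdir -u someuser   # on an unencrypted volume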

-- 
:wq



Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 17:26, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 12:11 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 12:55, schrieb Valmor de Almeida:
 On 03/11/2012 02:29 PM, Florian Philipp wrote:
 Am 11.03.2012 16:38, schrieb Valmor de Almeida:

 Hello,

 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.

 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.

 Thanks,

 --
 Valmor



 Is it acceptable for you to have a commandline prompt for the password
 when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt

 I think so.

 init script. /etc/conf.d/dmcrypt should contain some examples. As you
 want to encrypt an LVM volume, the lvm init script needs to be started
 before this. As I see it, there is no strict dependency between those
 two scripts. You can add this by adding this line to /etc/rc.conf:
 rc_dmcrypt_after=lvm

 For creating a LUKS-encrypted volume, look at
 http://en.gentoo-wiki.com/wiki/DM-Crypt

 Currently looking at this.


 You won't need most of what is written there; just section 9,
 Administering LUKS and the kernel config in section 2, Assumptions.

 Concerning downtime, I'm not aware of any solution that avoids copying
 the data over to the new volume. If downtime is absolutely critical, ask
 and we can work something out that minimizes the time.

 Regards,
 Florian Philipp


 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 Thanks,


 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.
 
 Rough guess: Latency. With encryption, you can't DMA disk data
 directly into a process's address space, because you need the decrypt
 hop.
 

Good call. Wouldn't have thought of that.

 Try running bonnie++ on encrypted vs non-encrypted volumes. (Or not; I
 doubt you have the time and materials to do a good, meaningful set of
 time trials)
 

Yeah, that sounds like something for which you need a very dull winter
day. Besides, I've already lost a poorly cooled HDD on a benchmark.

Regards,
Florian Philipp





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Neil Bothwick
On Tue, 13 Mar 2012 17:49:40 +0100, Florian Philipp wrote:

 Besides, I've already lost a poorly cooled HDD on a benchmark.

Better than losing it on real data.


-- 
Neil Bothwick

Why do they call it a TV set when you only get one?




Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 12:49 PM, Florian Philipp li...@binarywings.net wrote:
 Am 13.03.2012 17:26, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 12:11 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 12:55, schrieb Valmor de Almeida:
 On 03/11/2012 02:29 PM, Florian Philipp wrote:
 Am 11.03.2012 16:38, schrieb Valmor de Almeida:

 Hello,

 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.

 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.

 Thanks,

 --
 Valmor



 Is it acceptable for you to have a commandline prompt for the password
 when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt

 I think so.

 init script. /etc/conf.d/dmcrypt should contain some examples. As you
 want to encrypt an LVM volume, the lvm init script needs to be started
 before this. As I see it, there is no strict dependency between those
 two scripts. You can add this by adding this line to /etc/rc.conf:
 rc_dmcrypt_after=lvm

 For creating a LUKS-encrypted volume, look at
 http://en.gentoo-wiki.com/wiki/DM-Crypt

 Currently looking at this.


 You won't need most of what is written there; just section 9,
 Administering LUKS and the kernel config in section 2, Assumptions.

 Concerning downtime, I'm not aware of any solution that avoids copying
 the data over to the new volume. If downtime is absolutely critical, ask
 and we can work something out that minimizes the time.

 Regards,
 Florian Philipp


 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 Thanks,


 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Rough guess: Latency. With encryption, you can't DMA disk data
 directly into a process's address space, because you need the decrypt
 hop.


 Good call. Wouldn't have thought of that.

 Try running bonnie++ on encrypted vs non-encrypted volumes. (Or not; I
 doubt you have the time and materials to do a good, meaningful set of
 time trials)


 Yeah, that sounds like something for which you need a very dull winter
 day. Besides, I've already lost a poorly cooled HDD on a benchmark.

Sounds like something we can do at my LUG at one of our weekly
socials. The part I don't know is how to set this kind of thing up and
how to tune it; I don't want it to be like Microsoft's comparison of
SQL Server against MySQL from a decade or so ago, where they didn't
tune MySQL for their bench workload.

-- 
:wq



Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Frank Steinmetzger
On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

  Since I am planning to encrypt only home/ under LVM control, what kind
  of overhead should I expect?
 
 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

Speaking of that...
I always wondered what the exact difference is between AES and AES i586. I
can gather that it's about optimisation for a specific architecture,
but which one would be best for my i686 Core 2 Duo?
-- 
Gruß | Greetings | Qapla'
I forbid any use of my email addresses with Facebook services.

A pessimist is an optimist who's given it some thought.




Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:
 
 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.
 
 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. I
 can gather myself that it's about optimisation for a specific architecture.
 But which one would be best for my i686 Core 2 Duo?

From what I can see in the kernel sources, there is a generic AES
implementation using nothing but portable C code and then there is
aes-i586 assembler code with aes_glue C code. So I assume the i586
version is better for you --- unless GCC suddenly got a lot better at
optimizing code.
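
If you want to see which implementation your running kernel actually
uses, something like this should show it (the output format may vary a
bit between kernel versions):

lsmod | grep aes
grep -A3 '^name.*: aes$' /proc/crypto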





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. I
 can gather myself that it's about optimisation for a specific architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.


 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

Since when, exactly? GCC isn't the best compiler at optimization, but
I fully expect current versions to produce better code for x86-64 than
hand-tuned i586. Wider registers, more registers, crypto acceleration
instructions and SIMD instructions are all very nice to have. I don't
know the specifics of AES, though, or what kind of crypto algorithm it
is, so it's entirely possible that one can't effectively parallelize
it except in some relatively unique circumstances.

-- 
:wq



Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 19:18, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. I
 can gather myself that it's about optimisation for a specific architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.
 
 
 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.
 
 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.
 

One sec. We are talking about a Core 2 Duo running in 32-bit mode, right?
That's what the i686 reference in the question meant --- or at least,
that's what I assumed.

If we talk about 32bit mode, none of what you describe is available.
Those additional registers and instructions are not accessible with i686
instructions. A Core 2 also has no AES instructions.

Of course, GCC could make use of what it knows about the CPU, like
number of parallel pipelines, pipeline depth, cache size, instructions
added in i686 and so on. But even then I doubt it can outperform
hand-tuned assembler, even if it is for a slightly older instruction set.

If instead we are talking about a Core 2 Duo running in x86_64 mode, we
should be talking about the aes-x86_64 module instead of the aes-i586
module and that makes use of the complete instruction set of the Core 2,
including SSE2.
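
(For reference, the kernel options behind those modules are
CONFIG_CRYPTO_AES for the generic C version, CONFIG_CRYPTO_AES_586 for
the i586 assembler version and CONFIG_CRYPTO_AES_X86_64 for the 64-bit
one. If /proc/config.gz is enabled, you can check what your kernel was
built with:)

zgrep CONFIG_CRYPTO_AES /proc/config.gz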

Regards,
Florian Philipp





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Stroller

On 13 March 2012, at 18:18, Michael Mol wrote:
 ...
 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.
 
 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.

Do you have much experience of writing assembler?

I don't, and I'm not an expert on this, but I've read the odd blog article on 
this subject over the years.

What I've read often has the programmer looking at the compiled gcc bytecode 
and examining what it does. The compiler might not care how many registers it 
uses, and thus a variable might find itself frequently swapped back into RAM; 
the programmer does not have any control over the compiler, and IIRC some flags 
reserve a register for debugging (IIRC -fomit-frame-pointer disables this). I 
think it's possible to use registers more efficiently by swapping them (??) or 
by using bitwise comparisons and other tricks. 

Assembler optimisation is only used on sections of code that are at the core of 
a loop - that are called hundreds or thousands (even millions?) of times during 
the program's execution. It's not for code, such as reading the .config file or 
initialisation, which is only called once. Because the code in the core of the 
loop is called so often, you don't have to achieve much of an optimisation for 
the aggregate effect to be considerable.

The operations in question may only constitute a few lines of C, or a 
handful of machine operations, so it boils down to an algorithm that a human 
programmer is capable of getting a grip on and comprehending. Whilst compilers 
are clearly more efficient for large programs, on this micro scale, humans are 
more clever and creative than machines. 

Encryption / decryption is an example of code that lends itself to this kind of 
optimisation. In particular AES was designed, I believe, to be amenable to 
implementation in this way. The reason for that was that it was desirable to 
have it run on embedded devices and on dedicated chips. So it boils down to a 
simple bitswap operation (??) - the plaintext is modified by the encryption 
key, input and output as a fast stream. Each byte goes in, each byte goes out, 
the same function performed on each one.

Another operation that lends itself to assembler optimisation is video decoding 
- the video is encoded only once, and then may be played back hundreds or 
millions of times by different people. The same operations must be repeated a 
number of times on each frame, then c 25 - 60 frames are decoded per second, so 
at least 90,000 frames per hour. Again, the smallest optimisation is worthwhile.

Stroller.




Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp li...@binarywings.net wrote:
 Am 13.03.2012 19:18, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. I
 can gather myself that it's about optimisation for a specific architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.


 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.


 One sec. We are talking about an Core2 Duo running in 32bit mode, right?
 That's what the i686 reference in the question meant --- or at least,
 that's what I assumed.

I think you're right; I missed that part.


 If we talk about 32bit mode, none of what you describe is available.
 Those additional registers and instructions are not accessible with i686
 instructions. A Core 2 also has no AES instructions.

 Of course, GCC could make use of what it knows about the CPU, like
 number of parallel pipelines, pipeline depth, cache size, instructions
 added in i686 and so on. But even then I doubt it can outperform
 hand-tuned assembler, even if it is for a slightly older instruction set.

I'm still not sure why. I'll posit that some badly-written C could
place constraints on the compiler's optimizer, but GCC should have
little problem handling well-written C, separating semantics from
syntax and finding good transforms of the original code to get
provably-same results. Unless I'm grossly overestimating the
capabilities of its AST processing and optimization engine.


 If instead we are talking about an Core 2 Duo running in x86_64 mode, we
 should be talking about the aes-x86_64 module instead of the aes-i586
 module and that makes use of the complete instruction set of the Core 2,
 including SSE2.

FWIW, SSE2 is available on 32-bit processors; I have code in the field
using SSE2 on Pentium 4s.

-- 
:wq



Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 19:58, schrieb Florian Philipp:
 Am 13.03.2012 19:18, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. I
 can gather myself that it's about optimisation for a specific architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.


 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.

 
 One sec. We are talking about an Core2 Duo running in 32bit mode, right?
 That's what the i686 reference in the question meant --- or at least,
 that's what I assumed.
 
 If we talk about 32bit mode, none of what you describe is available.
 Those additional registers and instructions are not accessible with i686
 instructions. A Core 2 also has no AES instructions.
 
 Of course, GCC could make use of what it knows about the CPU, like
 number of parallel pipelines, pipeline depth, cache size, instructions
 added in i686 and so on. But even then I doubt it can outperform
 hand-tuned assembler, even if it is for a slightly older instruction set.
 

P.S: I just looked up the differences in the instruction sets of i586
and i686. The only significant instruction added in i686 was a
conditional move (CMOV). This helps to avoid conditional jumps. However,
in the aes-i586 code there are only two conditional jumps and they both
just end the loop of encryption/decryption rounds for AES-128 and
AES-256, respectively. My assembler isn't perfect but I doubt you can
optimize that away with a CMOV.

 If instead we are talking about an Core 2 Duo running in x86_64 mode, we
 should be talking about the aes-x86_64 module instead of the aes-i586
 module and that makes use of the complete instruction set of the Core 2,
 including SSE2.
 
 Regards,
 Florian Philipp






Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 20:13, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 19:18, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES i586. 
 I
 can gather myself that it's about optimisation for a specific 
 architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.


 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.


 One sec. We are talking about an Core2 Duo running in 32bit mode, right?
 That's what the i686 reference in the question meant --- or at least,
 that's what I assumed.
 
 I think you're right; I missed that part.
 

 If we talk about 32bit mode, none of what you describe is available.
 Those additional registers and instructions are not accessible with i686
 instructions. A Core 2 also has no AES instructions.

 Of course, GCC could make use of what it knows about the CPU, like
 number of parallel pipelines, pipeline depth, cache size, instructions
 added in i686 and so on. But even then I doubt it can outperform
 hand-tuned assembler, even if it is for a slightly older instruction set.
 
 I'm still not sure why. I'll posit that some badly-written C could
 place constraints on the compiler's optimizer, but GCC should have
 little problem handling well-written C, separating semantics from
 syntax and finding good transforms of the original code to get
 proofably-same results. Unless I'm grossly overestimating the
 capabilities of its AST processing and optimization engine.
 

Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a
profiling run to allow the compiler to predict loop and jump certainties
and so on.

But, by all means, let's test it! It's not like we cannot.
Unfortunately, I don't have a 32bit Gentoo machine at hand where I could
test it right now.
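
(For anyone who wants to compare the implementations without setting up
dm-crypt first: the kernel's tcrypt test module can benchmark the ciphers
directly. Sketch only -- the mode number comes from crypto/tcrypt.c and
may differ between kernel versions:)

modprobe tcrypt mode=200 sec=1   # AES speed test; tcrypt refuses to stay loaded by design
dmesg | tail -n 40               # the results end up in the kernel log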


 If instead we are talking about an Core 2 Duo running in x86_64 mode, we
 should be talking about the aes-x86_64 module instead of the aes-i586
 module and that makes use of the complete instruction set of the Core 2,
 including SSE2.
 
 FWIW, SSE2 is available on 32-bit processors; I have code in the field
 using SSE2 on Pentium 4s.
 

Um, yeah. I should have clarified that. I meant that for x86_64
machines, the compiler as well as the assembler programmer can safely
assume that SSE2 is available. For generic i686 assembler code, you cannot.

Regards,
Florian Philipp





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 3:07 PM, Stroller
strol...@stellar.eclipse.co.uk wrote:

 On 13 March 2012, at 18:18, Michael Mol wrote:
 ...
 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.

 Do you have much experience of writing assembler?

 I don't, and I'm not an expert on this, but I've read the odd blog article on 
 this subject over the years.

Similar level of experience here. I can read it, even debug it from
time to time. A few regular bloggers on the subject are like candy.
And I used to have pagetable.org, Ars's Technopaedia and specsheets
for early x86 and motorola processors memorized. For the past couple
years, I've been focusing on reading blogs of language and compiler
authors, academics involved in proofing, testing and improving them,
etc.


 What I've read often has the programmer looking at the compiled gcc bytecode 
 and examining what it does. The compiler might not care how many registers it 
 uses, and thus a variable might find itself frequently swapped back into RAM; 
 the programmer does not have any control over the compiler, and IIRC some 
 flags reserve a register for degugging (IIRC -fomit-frame-pointer disables 
 this). I think it's possible to use registers more efficiently by swapping 
 them (??) or by using bitwise comparisons and other tricks.

Sure; it's cheaper to null out a register by XORing it with itself
than setting it to 0.


 Assembler optimisation is only used on sections of code that are at the core 
 of a loop - that are called hundreds or thousands (even millions?) of times 
 during the program's execution. It's not for code, such as reading the 
 .config file or initialisation, which is only called once. Because the code 
 in the core of the loop is called so often, you don't have to achieve much of 
 an optimisation for the aggregate to be much more considerable.

Sure; optimize the hell out of the code where you spend most of your
time. I wasn't aware that gcc passed up on safe optimization
opportunities, though.


 The operations in question may only be constitute a few lines of C, or a 
 handful of machine operations, so it boils down to an algorithm that a human 
 programmer is capable of getting a grip on and comprehending. Whilst 
 compilers are clearly more efficient for large programs, on this micro scale, 
 humans are more clever and creative than machines.

I disagree. With defined semantics for the source and target, a
computer's cleverness is limited only by the computational and memory
expense of its search algorithms. Humans get through this by making
habit various optimizations, but those habits become less useful as
additional paths and instructions are added. As system complexity
increases, humans operate on personally cached techniques derived from
simpler systems. I would expect very, very few people to be intimately
familiar with the the majority of optimization possibilities present
on an amdfam10 processor or a core2. Compiler's aren't necessarily
familiar with them, either; they're just quicker at discovering them,
given knowledge of the individual instructions and the rules of
language semantics.


 Encryption / decryption is an example of code that lends itself to this kind 
 of optimisation. In particular AES was designed, I believe, to be amenable to 
 implementation in this way. The reason for that was that it was desirable to 
 have it run on embedded devices and on dedicated chips. So it boils down to a 
 simple bitswap operation (??) - the plaintext is modified by the encryption 
 key, input and output as a fast stream. Each byte goes in, each byte goes 
 out, the same function performed on each one.

I'd be willing to posit that you're right here, though if there isn't
a per-byte feedback mechanism, SIMD instructions would come into
serious play. But I expect there's a per-byte feedback mechanism, so
parallelization would likely come in the form of processing
simultaneous streams.


 Another operation that lends itself to assembler optimisation is video 
 decoding - the video is encoded only once, and then may be played back 
 hundreds or millions of times by different people. The same operations must 
 be repeated a number of times on each frame, then c 25 - 60 frames are 
 decoded per second, so at least 90,000 frames per hour. Again, the smallest 
 optimisation is worthwhile.

Absolutely. My position, though, is that compilers are quicker and
more capable of discovering optimization opportunities.

Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Michael Mol
On Tue, Mar 13, 2012 at 3:30 PM, Florian Philipp li...@binarywings.net wrote:
 Am 13.03.2012 20:13, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:58 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 19:18, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 2:06 PM, Florian Philipp li...@binarywings.net 
 wrote:
 Am 13.03.2012 18:45, schrieb Frank Steinmetzger:
 On Tue, Mar 13, 2012 at 05:11:47PM +0100, Florian Philipp wrote:

 Since I am planning to encrypt only home/ under LVM control, what kind
 of overhead should I expect?

 What do you mean with overhead? CPU utilization? In that case the
 overhead is minimal, especially when you run a 64-bit kernel with the
 optimized AES kernel module.

 Speaking of that...
 I always wondered what the exact difference was between AES and AES 
 i586. I
 can gather myself that it's about optimisation for a specific 
 architecture.
 But which one would be best for my i686 Core 2 Duo?

 From what I can see in the kernel sources, there is a generic AES
 implementation using nothing but portable C code and then there is
 aes-i586 assembler code with aes_glue C code.


 So I assume the i586
 version is better for you --- unless GCC suddenly got a lot better at
 optimizing code.

 Since when, exactly? GCC isn't the best compiler at optimization, but
 I fully expect current versions to produce better code for x86-64 than
 hand-tuned i586. Wider registers, more registers, crypto acceleration
 instructions and SIMD instructions are all very nice to have. I don't
 know the specifics of AES, though, or what kind of crypto algorithm it
 is, so it's entirely possible that one can't effectively parallelize
 it except in some relatively unique circumstances.


 One sec. We are talking about an Core2 Duo running in 32bit mode, right?
 That's what the i686 reference in the question meant --- or at least,
 that's what I assumed.

 I think you're right; I missed that part.


 If we talk about 32bit mode, none of what you describe is available.
 Those additional registers and instructions are not accessible with i686
 instructions. A Core 2 also has no AES instructions.

 Of course, GCC could make use of what it knows about the CPU, like
 number of parallel pipelines, pipeline depth, cache size, instructions
 added in i686 and so on. But even then I doubt it can outperform
 hand-tuned assembler, even if it is for a slightly older instruction set.

 I'm still not sure why. I'll posit that some badly-written C could
 place constraints on the compiler's optimizer, but GCC should have
 little problem handling well-written C, separating semantics from
 syntax and finding good transforms of the original code to get
 proofably-same results. Unless I'm grossly overestimating the
 capabilities of its AST processing and optimization engine.


 Well, it's not /that/ good. Otherwise the Firefox ebuild wouldn't need a
 profiling run to allow the compiler to predict loop and jump certainties
 and so on.

I was thinking more in the context of simple functions and
mathematical operations. Loop probabilities? Yeah, that's a tough one.
Nobody wants to stall a huge CPU pipeline. I remember when the
NetBurst architecture came out. Intel cranked up the amount of die
space dedicated to branch prediction...


 But, by all means, let's test it! It's not like we cannot.
 Unfortunately, I don't have a 32bit Gentoo machine at hand where I could
 test it right now.

Now we're talking. :)

Unfortunately, I don't have a 32-bit Gentoo environment available,
either. Actually, I've never run Gentoo in a 32-bit environment.

-- 
:wq



Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 20:07, schrieb Stroller:
 
 On 13 March 2012, at 18:18, Michael Mol wrote:
 ...
 So I assume the i586 version is better for you --- unless GCC
 suddenly got a lot better at optimizing code.
 
 Since when, exactly? GCC isn't the best compiler at optimization,
 but I fully expect current versions to produce better code for
 x86-64 than hand-tuned i586. Wider registers, more registers,
 crypto acceleration instructions and SIMD instructions are all very
 nice to have. I don't know the specifics of AES, though, or what
 kind of crypto algorithm it is, so it's entirely possible that one
 can't effectively parallelize it except in some relatively unique
 circumstances.
 
 Do you have much experience of writing assembler?
 
 I don't, and I'm not an expert on this, but I've read the odd blog
 article on this subject over the years.
 
 What I've read often has the programmer looking at the compiled gcc
 bytecode and examining what it does. The compiler might not care how
 many registers it uses, and thus a variable might find itself
 frequently swapped back into RAM; the programmer does not have any
 control over the compiler, and IIRC some flags reserve a register for
 degugging (IIRC -fomit-frame-pointer disables this). I think it's
 possible to use registers more efficiently by swapping them (??) or
 by using bitwise comparisons and other tricks.
 

You recall correctly about the frame pointer.

Concerning the register usage: I'm no expert in this field, either, but
I think the main issue is not simply register allocation but branch and
exception prediction and so on. The compiler can either optimize for a
seamless continuation if the jump happens or if it doesn't. A human or a
just-in-time compiler can better handle these cases by predicting the
outcome or -- in the case of a JIT -- analyzing the outcome of the first
few iterations.

OT: IIRC, register reuse is also the main performance problem of
state-of-the-art javascript engines, at the moment. Concerning the code
they compile at runtime, they are nearly as good as `gcc -O0` but they
have the same problem concerning registers (GCC with -O0 produces code
that works exactly as you describe above: Storing the result after every
computation and loading it again).

 Assembler optimisation is only used on sections of code that are at
 the core of a loop - that are called hundreds or thousands (even
 millions?) of times during the program's execution. It's not for
 code, such as reading the .config file or initialisation, which is
 only called once. Because the code in the core of the loop is called
 so often, you don't have to achieve much of an optimisation for the
 aggregate to be much more considerable.
 
 The operations in question may only be constitute a few lines of C,
 or a handful of machine operations, so it boils down to an algorithm
 that a human programmer is capable of getting a grip on and
 comprehending. Whilst compilers are clearly more efficient for large
 programs, on this micro scale, humans are more clever and creative
 than machines.
 
 Encryption / decryption is an example of code that lends itself to
 this kind of optimisation. In particular AES was designed, I believe,
 to be amenable to implementation in this way. The reason for that was
 that it was desirable to have it run on embedded devices and on
 dedicated chips. So it boils down to a simple bitswap operation (??)
 - the plaintext is modified by the encryption key, input and output
 as a fast stream. Each byte goes in, each byte goes out, the same
 function performed on each one.
 

Well, sort of. First off, you are right: AES was designed with hardware
implementations in mind.

The algorithm boils down to a number of substitution and permutation
networks and XOR operations (I assume that's what you meant with byte
swap). If you look at the portable C code
(/usr/src/linux/crypto/aes_generic.c), you can see that it mostly
consists of lookup tables and XORs.

The thing about "each byte goes in, each byte goes out", however, is a
bit wrong. What you are thinking of is a stream cipher like RC4. AES is a block
cipher: it takes an (in this case 128 bit long) input block, XORs it
with the encryption (sub-)key and shuffles it around according to the
exact algorithm.

 Another operation that lends itself to assembler optimisation is
 video decoding - the video is encoded only once, and then may be
 played back hundreds or millions of times by different people. The
 same operations must be repeated a number of times on each frame,
 then c 25 - 60 frames are decoded per second, so at least 90,000
 frames per hour. Again, the smallest optimisation is worthwhile.
 
 Stroller.
 
 






Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
Am 13.03.2012 20:38, schrieb Michael Mol:
 On Tue, Mar 13, 2012 at 3:07 PM, Stroller 
 strol...@stellar.eclipse.co.uk wrote:
 
 On 13 March 2012, at 18:18, Michael Mol wrote:
 ...
 So I assume the i586 version is better for you --- unless GCC
 suddenly got a lot better at optimizing code.
 
 Since when, exactly? GCC isn't the best compiler at optimization,
 but I fully expect current versions to produce better code for
 x86-64 than hand-tuned i586. Wider registers, more registers,
 crypto acceleration instructions and SIMD instructions are all
 very nice to have. I don't know the specifics of AES, though, or
 what kind of crypto algorithm it is, so it's entirely possible
 that one can't effectively parallelize it except in some
 relatively unique circumstances.
 
 Do you have much experience of writing assembler?
 
 I don't, and I'm not an expert on this, but I've read the odd blog
 article on this subject over the years.
 
 Similar level of experience here. I can read it, even debug it from 
 time to time. A few regular bloggers on the subject are like candy. 
 And I used to have pagetable.org, Ars's Technopaedia and specsheets 
 for early x86 and motorola processors memorized. For the past couple 
 years, I've been focusing on reading blogs of language and compiler 
 authors, academics involved in proofing, testing and improving them, 
 etc.
 
 
 What I've read often has the programmer looking at the compiled gcc
 bytecode and examining what it does. The compiler might not care
 how many registers it uses, and thus a variable might find itself
 frequently swapped back into RAM; the programmer does not have any
 control over the compiler, and IIRC some flags reserve a register
 for degugging (IIRC -fomit-frame-pointer disables this). I think
 it's possible to use registers more efficiently by swapping them
 (??) or by using bitwise comparisons and other tricks.
 
 Sure; it's cheaper to null out a register by XORing it with itself 
 than setting it to 0.
 
 
 Assembler optimisation is only used on sections of code that are at
 the core of a loop - that are called hundreds or thousands (even
 millions?) of times during the program's execution. It's not for
 code, such as reading the .config file or initialisation, which is
 only called once. Because the code in the core of the loop is
 called so often, you don't have to achieve much of an optimisation
 for the aggregate to be much more considerable.
 
 Sure; optimize the hell out of the code where you spend most of your 
 time. I wasn't aware that gcc passed up on safe optimization 
 opportunities, though.
 
 
 The operations in question may only be constitute a few lines of C,
 or a handful of machine operations, so it boils down to an
 algorithm that a human programmer is capable of getting a grip on
 and comprehending. Whilst compilers are clearly more efficient for
 large programs, on this micro scale, humans are more clever and
 creative than machines.
 
 I disagree. With defined semantics for the source and target, a 
 computer's cleverness is limited only by the computational and
 memory expense of its search algorithms. Humans get through this by
 making habit various optimizations, but those habits become less
 useful as additional paths and instructions are added. As system
 complexity increases, humans operate on personally cached techniques
 derived from simpler systems. I would expect very, very few people to
 be intimately familiar with the the majority of optimization
 possibilities present on an amdfam10 processor or a core2. Compiler's
 aren't necessarily familiar with them, either; they're just quicker
 at discovering them, given knowledge of the individual instructions
 and the rules of language semantics.
 
 
 Encryption / decryption is an example of code that lends itself to
 this kind of optimisation. In particular AES was designed, I
 believe, to be amenable to implementation in this way. The reason
 for that was that it was desirable to have it run on embedded
 devices and on dedicated chips. So it boils down to a simple
 bitswap operation (??) - the plaintext is modified by the
 encryption key, input and output as a fast stream. Each byte goes
 in, each byte goes out, the same function performed on each one.
 
 I'd be willing to posit that you're right here, though if there
 isn't a per-byte feedback mechanism, SIMD instructions would come
 into serious play. But I expect there's a per-byte feedback
 mechanism, so parallelization would likely come in the form of
 processing simultaneous streams.
 
 
 Another operation that lends itself to assembler optimisation is
 video decoding - the video is encoded only once, and then may be
 played back hundreds or millions of times by different people. The
 same operations must be repeated a number of times on each frame,
 then c 25 - 60 frames are decoded per second, so at least 90,000
 frames per hour. Again, the smallest optimisation is worthwhile.
 
 Absolutely. My position, though, is 

Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Florian Philipp
 
 This thread is becoming ridiculously long. Just as a last side-note:
 
 One of the primary reasons that the IA64 architecture failed was that it
 relied on the compiler to optimize the code in order to exploit the
 massive instruction-level parallelism the CPU offered. Compilers never
 became good enough for the job. Of course, that happened in the
 nineties and we have much better compilers now (and x86 is easier to
 handle for compilers). But on the other hand: That was Intel's next big
 thing and if they couldn't make the compilers work, I have no reason to
 believe in their efficiency now.
 
 Regards,
 Florian Philipp

Argh, just as I want to quit: I had the dates garbled up. IA64 came out
in 2001 but the compiler design was of course a product of the late
nineties and the design process started mid-nineties.





Re: [gentoo-user] hard drive encryption

2012-03-13 Thread Frank Steinmetzger
On Tue, Mar 13, 2012 at 07:58:55PM +0100, Florian Philipp wrote:

  From what I can see in the kernel sources, there is a generic AES
  implementation using nothing but portable C code and then there is
  aes-i586 assembler code with aes_glue C code.
  
  So I assume the i586
  version is better for you --- unless GCC suddenly got a lot better at
  optimizing code.
  
  Since when, exactly? GCC isn't the best compiler at optimization, but
  I fully expect current versions to produce better code for x86-64 than
  hand-tuned i586. Wider registers, more registers, crypto acceleration
  instructions and SIMD instructions are all very nice to have. I don't
  know the specifics of AES, though, or what kind of crypto algorithm it
  is, so it's entirely possible that one can't effectively parallelize
  it except in some relatively unique circumstances.
  
 
 One sec. We are talking about an Core2 Duo running in 32bit mode, right?
 That's what the i686 reference in the question meant --- or at least,
 that's what I assumed.

Sorry, I forgot to mention that I'm running 32 bit, yes. I don't really see
the benefit of 64 bit for my use case. For all I know, the executables get
bigger and my poor old laptop will have to shuffle more bits around. :)

However, hardware AES would be *the* reason for me to, instead of a netbook,
buy something with an i5 in my next laptop, some time in the distant future.
-- 
Gruß | Greetings | Qapla'
I forbid any use of my email addresses with Facebook services.

Ein Computer stürzt nur ab, wenn der Text lange nicht gespeichert wurde.




Re: [gentoo-user] hard drive encryption

2012-03-11 Thread Florian Philipp
Am 11.03.2012 16:38, schrieb Valmor de Almeida:
 
 Hello,
 
 I have not looked at encryption before and find myself in a situation
 that I have to encrypt my hard drive. I keep /, /boot, and swap outside
 LVM, everything else is under LVM. I think all I need to do is to
 encrypt /home which is under LVM. I use reiserfs.
 
 I would appreciate suggestion and pointers on what it is practical and
 simple in order to accomplish this task with a minimum of downtime.
 
 Thanks,
 
 --
 Valmor
 


Is it acceptable for you to have a commandline prompt for the password
when booting? In that case you can use LUKS with the /etc/init.d/dmcrypt
init script. /etc/conf.d/dmcrypt should contain some examples. As you
want to encrypt an LVM volume, the lvm init script needs to be started
before this. As I see it, there is no strict dependency between those
two scripts. You can add this by adding this line to /etc/rc.conf:
rc_dmcrypt_after=lvm
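
A minimal /etc/conf.d/dmcrypt entry for that could look roughly like the
following (the names are only an example -- adjust them to your volume
group and logical volume; without a key= line, the init script asks for
the passphrase at boot):

target=home
source='/dev/vg0/home'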

For creating a LUKS-encrypted volume, look at
http://en.gentoo-wiki.com/wiki/DM-Crypt

You won't need most of what is written there; just section 9,
Administering LUKS and the kernel config in section 2, Assumptions.
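
The creation step itself boils down to something like this (a sketch
only -- it destroys whatever is on the target LV, so run it on a fresh or
backed-up volume; the device names are examples):

cryptsetup luksFormat /dev/vg0/home      # write the LUKS header, set the passphrase
cryptsetup luksOpen /dev/vg0/home home   # map it to /dev/mapper/home
mkreiserfs /dev/mapper/home              # filesystem goes on the mapped device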

Concerning downtime, I'm not aware of any solution that avoids copying
the data over to the new volume. If downtime is absolutely critical, ask
and we can work something out that minimizes the time.
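
The copy itself is then the usual one-off rsync from the old /home onto
the mounted new volume, for example (paths are illustrative):

mount /dev/mapper/home /mnt/newhome
rsync -aHAX /home/ /mnt/newhome/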

Regards,
Florian Philipp


