On Thu, Sep 9, 2010 at 3:46 PM, Daniel Troeder <dan...@admin-box.com> wrote:
> On 09/09/2010 07:24 PM, Matt Neimeyer wrote:
>> My generic question is: When I'm using a pipe line series of commands
>> do I use up more/less space than doing things in sequence?
>>
>> For example, I have a development Gentoo VM that has a hard drive that
>> is too small... I wanted to move a database off of that onto another
>> machine but when I tried the following I filled my partition and 'evil
>> things' happened...
>>
>> mysqldump blah...
>> gzip blah...
>>
>> In this specific case I added another virtual drive, mounted that and
>> went on with life but I'm curious if I could have gotten away with the
>> pipe line instead. Will doing something like this still use "twice"
>> the space?
>>
>> mysqldump | gzip > file.sql.gz
>>
>> OR going back to my generic question if I pipe line like "type | sort
>> | unique > output" does that only use 1x or 3x the disk space?
>>
>> Thanks in advance!
>>
>> Matt
>>
>> P.S. If the answer is "it depends" how do know what it depends on?
>>
> Everyone already answered the disk space question. I want to add just
> this: It also saves you lots of i/o-bandwidth: only the compressed data
> gets written to disk. As i/o is the most common bottleneck, it is often
> an imperative to do as much as possible in a pipe. If you're lucky it
> can also mean, that multiple programs run at the same time, resulting in
> higher throughput. Lucky is, when consumer and producer (right and left
> of pipe) can work simultaneously because the buffer is big enough. You
> can see this every time you (un)pack a tar.gz.
And if you have a huge amount of data and compression makes the CPU the bottleneck, you can use something like pbzip2, which uses all CPUs/cores in parallel to speed up [de]compression. :)
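
For what it's worth, here is a rough sketch of the shape I mean. It is not a tested recipe for your setup. The produce function is a made-up stand-in for mysqldump (any command that writes to stdout would do), and the pbzip2 line is left as a comment because it assumes pbzip2 is installed and that your build reads stdin when no file is given.

```shell
#!/bin/sh
# Hypothetical stand-in for "mysqldump blah...": writes 100000 lines to stdout.
produce() { seq 1 100000; }

out=$(mktemp)

# The uncompressed stream only passes through the pipe buffer;
# only the compressed bytes are written to disk.
produce | gzip -c > "$out"

# Same shape with a parallel compressor (assumes pbzip2 is installed):
#   produce | pbzip2 -c > dump.sql.bz2

# Sanity check: decompress to stdout and count the lines, again without
# creating an uncompressed file.
gzip -dc < "$out" | wc -l
```

The point is that no intermediate file ever exists, so the disk only needs room for the compressed result.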