On Mon, 17 Feb 2003, Martin Baehr wrote:

> On Mon, Feb 17, 2003 at 12:48:59PM +1300, Wesley Parish wrote:
> > I've just checked, and there is a file mvs38j.tar.gz, and it clocks in at
> > 407.2 MB, but the mvs38j directory clocks in at 419.2 MB.
> >
> > So much for gzip's compression ratio - I am deeply disappointed.
>
> the compression ratio depends on the data being compressed.
> binaries don't compress as well as text.
> images and movies are already compressed in their native format,
> so any attempt to compress them again must fail.
Well, yes and no. For any heap of information, there is a theoretical compression limit. This limit can be calculated, at least with respect to a given statistical model of the data. However, practical compression algorithms follow quite different approaches, and many of them are tailored to specific applications, which makes them particularly efficient for their intended application, but less good for others. All this has to do with the "structure" of the data to be compressed.

Another practical issue is processing time - generally speaking, a higher compression ratio will require more processing time. Ultimately, when the limit is approached, the gains in compression may be negligible compared to the increase in processing time.

While many digital imaging formats use some form of compression, there are some that don't - on these, compression can help a lot. Even for some compressed formats, further compression is quite possible. One example is the gifblast application, which further compresses already compressed GIF images (without any loss of information). Gifblast normally gave me a size reduction between 10 and 30 percent - yes, well, not that much, but as I said, this is for already compressed data.

Re images, there is one more trick: true colour images need much more space than indexed ones. Converting to indexed format is quite useful in practice for images not requiring more than 256 different colours (you probably don't want to do this to your photos!). GIMP offers a hot-key (Alt-i) for converting RGB true colour to indexed format. From the point of view of information theory, this is also a form of data compression, and it is lossless if the number of colours you choose for the indexed file is equal to or greater than the number of colours in the original. ImageMagick can also convert to indexed format if you prefer batch processing from the CLI.

Back to gzip: it has some options to control the trade-off between speed and compression ratio, and so does bzip2. bzip2 often gives a somewhat higher compression ratio than gzip.
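To make the "theoretical limit" and the speed/ratio trade-off above a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: it estimates the limit with the simplest possible model (each byte treated as independent, so-called order-0 entropy), which is not the true limit for structured data, and it uses Python's zlib and bz2 modules as stand-ins for gzip and bzip2 (zlib uses the same DEFLATE method as gzip but a different header). The function names are made up for this example.

```python
import bz2
import math
import zlib
from collections import Counter


def order0_entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).

    Assumption: bytes are treated as independent symbols (order-0 model).
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over all byte values that occur in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def order0_bound_bytes(data: bytes) -> int:
    """Smallest size in bytes an order-0 coder could reach, ignoring headers."""
    return math.ceil(order0_entropy_bits_per_byte(data) * len(data) / 8)


def compressed_sizes(data: bytes) -> dict:
    """Compressed sizes in bytes for a fast and a thorough setting."""
    return {
        "zlib level 1": len(zlib.compress(data, 1)),
        "zlib level 9": len(zlib.compress(data, 9)),
        "bz2 level 9": len(bz2.compress(data, 9)),
    }


def report(data: bytes) -> str:
    """Build a small text report comparing the estimate with real compressors."""
    lines = [
        f"original size:        {len(data)} bytes",
        f"order-0 entropy:      {order0_entropy_bits_per_byte(data):.3f} bits/byte",
        f"order-0 size bound:   {order0_bound_bytes(data)} bytes",
    ]
    for name, size in compressed_sizes(data).items():
        lines.append(f"{name + ':':<22}{size} bytes")
    return "\n".join(lines)


# Small in-memory demo; for a real file, pass open(path, "rb").read() instead.
sample = b"the compression ratio depends on the data being compressed.\n" * 200
print(report(sample))
```

On repetitive input like the sample, zlib and bz2 will come out well below the order-0 figure, because they exploit repeated strings that a byte histogram cannot see. So the order-0 number is only a limit for coders that look at one byte at a time; a meaningful bound for real data needs a better model of its structure.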
Enough lecturing for now :-)

One final question: is anyone aware of a simple tool for Linux / Unix that calculates the theoretical compression limit for a file? It would just be interesting to check how well or badly the compression software really does...

Cheers,

Helmut Walle.

+----------------+
| Helmut Walle |
| [EMAIL PROTECTED] |
| 03 - 388 39 54 |
+----------------+
