On Mon, 17 Feb 2003, Martin Baehr wrote:

> On Mon, Feb 17, 2003 at 12:48:59PM +1300, Wesley Parish wrote:
> > I've just checked, and there is a file mvs38j.tar.gz, and it clocks in at
> > 407.2 MB, but the mvs38j directory clocks in at 419.2 MB.
> >
> > So much for gzip's compression ratio -  I am deeply disappointed.
>
> the compression ratio depends on the data being compressed.
> binaries don't compress as well as text.
> images and movies are already compressed in their native format,
> so any attempt to compress them again must fail.

Well, yes and no. For any given body of data there is a theoretical
compression limit, and that limit can be calculated. Practical
compression algorithms, however, follow quite different approaches,
and many of them are tailored to specific applications, which makes
them particularly efficient for their intended application but less
effective for others. All of this has to do with the "structure" of
the data being compressed. Another practical issue is processing time:
generally speaking, a higher compression ratio requires more
processing time, and as the limit is approached the gains in
compression may become negligible compared to the increase in
processing time.
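As a sketch of what such a calculation looks like: the zeroth-order
Shannon entropy of a file's byte histogram gives a rough floor on its
compressed size (real compressors exploit higher-order structure, so
this is only a first approximation, not the true limit). With standard
tools:

```shell
# Build a small demo file with a skewed byte distribution: seven 'a's, one 'b'.
printf 'aaaaaaab' > /tmp/entropy_demo
n=$(wc -c < /tmp/entropy_demo)

# Byte histogram, then H = -sum(p * log2 p); H*n/8 bytes is the size floor.
od -An -v -tu1 /tmp/entropy_demo | tr -s ' ' '\n' | grep -v '^$' |
sort -n | uniq -c |
awk -v n="$n" '{p = $1 / n; h -= p * log(p) / log(2)}
    END {printf "%.3f bits/byte, floor ~%.1f bytes of %d\n", h, h * n / 8, n}'
# prints: 0.544 bits/byte, floor ~0.5 bytes of 8
```

Point the pipeline at any file you like; the demo file above just makes
the numbers easy to check by hand.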

While many digital imaging formats use some form of compression, some
don't, and on those compression can help a lot. Even some compressed
formats can be compressed further. One example is the gifblast
application, which further compresses already compressed GIF images
without any loss of information. Gifblast normally gave me a size
reduction of between 10 and 30 percent - not that much, admittedly,
but as I said, this is for already compressed data.

Re images, there is one more trick: true colour images need much more
space than indexed ones. In practice this is quite useful for images
that don't require more than 256 different colours (you probably don't
want to do this to your photos!). GIMP offers a hot-key (Alt-i) for
converting RGB true colour to indexed format. From the point of view
of information theory this is also a form of data compression, and it
is lossless as long as the number of colours you choose for the
indexed file is equal to or greater than the number of colours in the
original. ImageMagick can also convert to indexed format if you prefer
batch processing from the CLI.
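For the batch route, a minimal sketch with ImageMagick (the gradient
test image and file names are just for illustration):

```shell
# Make a true-colour test image, then reduce it to a 64-entry palette.
convert -size 128x128 gradient:blue-red /tmp/true.png
convert /tmp/true.png -colors 64 -type Palette /tmp/indexed.png
identify -format '%f: %k colours\n' /tmp/true.png /tmp/indexed.png
```

If the image already has no more than the requested number of colours,
the conversion loses nothing; here the gradient gets quantized down to
64 entries, so this particular run is lossy.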

Back to gzip: it has options to control the trade-off between speed
and compression ratio, and so does bzip2. bzip2 often achieves a
somewhat higher compression ratio than gzip, at the cost of more
processing time.

Enough lecturing for now :-) One final question: is anyone aware of a
simple tool for Linux / Unix that calculates the theoretical
compression limit for a file? It would be interesting to check how
close to that limit the compression software really gets...


Cheers,

Helmut Walle.

+----------------+
| Helmut Walle   |
| [EMAIL PROTECTED] |
| 03 - 388 39 54 |
+----------------+