On Sat, Jun 09, 2018 at 10:40:05AM -0400, Philippe Cloutier wrote:
> Thank you Adam,
> I like your proposal, but have a few comments below.
> 
> First, "It has been since greatly outpaced" is odd when we don't
> specify a time.

Any new entrant is unlikely to be worse on the size-vs-speed graph, so
such a statement won't become untrue in the future; that's why naming a
specific time seems pointless to me.

> I think it would indeed be relevant to mention that bzip2 was developed
> between 1996 and 2000 (for machines with very limited RAM).

While xz in particular can take enormous amounts of RAM, at the levels where
the newer tools provide a slightly better compression ratio than bzip2 they
actually need less RAM, both for compressing and decompressing.  Thus, memory
use is not an advantage of bzip2.

> Second, there's a typo in "Thus, bzip2 shouldn't in be used in new
> designs, although you want it available to access historic data.", and
> it's a little simplistic, and perhaps a little strong, since there
> might be cases for which bzip2 is superior to all alternatives.

I think that's a true statement.  With most algorithms supporting multiple
levels, you can't speak of a single size-to-speed number but have to
measure the whole envelope across all supported levels, for various types of
files.  It's rare for an algorithm to be outclassed over its entire range
-- even lzop has cases where it wins against zstd -- but as far as I can
tell this is indeed the case for bzip2.

I'm stopping short of saying "strictly inferior", as it's possible one may
construct a file which bzip2 processes miraculously fast, but I know of no
real-world file type where bzip2 would win.

> I would favor "Thus, bzip2 should unlikely be used when there is no
> compatibility concern (for example, to decompress data previously
> compressed in bzip2's format).".
> 
> Finally, the existing description is problematic because it's based on
> tests which were once valid, but which are outdated. I am extremely
> far from being an expert of compression, but I am afraid that the
> proposal may be also somewhat over-reliant on different tests. I have
> no doubt that zstd was much better with the file you used for testing,
> but giving these numbers based on a single file is generalizing from
> little. I think it's very useful to provide data on performance, but
> unless a comprehensive comparison was made, I would still prefer
> providing no or little data to providing data which is inexact or
> which gets outdated.

I used a single data point because I'm more inclined to believe what I see
with my own eyes, but there are many comprehensive comparisons available on
the Net.

If you don't want specific numbers, something like "several times as fast"
could be good.

Research goes on, and in another decade or two we'll be talking about
replacing xz and zstd -- but bzip2's time is over.  Let's see what its
author comes up with next -- I don't think he's said his last word.

> Le mar. 5 juin 2018 à 15:45, Adam Borowski <[email protected]> a écrit :
> > > The extended description says of bzip2:
> > >
> > > > It typically compresses files to within 10% to 15% of the best available
> > > > techniques, whilst being around twice as fast at compression and six
> > > > times faster at decompression.

> >  bzip2 is a freely available, patent free, data compressor.  It has been
> >  since greatly outpaced by newer alternatives, for example zstd at
> >  equivalent shrinking ratio compresses thrice as fast while decompressing
> >  nearly 15 times faster than bzip2.  Thus, bzip2 shouldn't in be used in
> >  new designs, although you want it available to access historic data.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity.  You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so.  I recommend Skepticism
⠈⠳⣄⠀⠀⠀⠀ (funeral doom metal).
